2  Preliminaries

Introduction

This file establishes the foundational mathematical definitions and results that every subsequent SOM file depends upon. It defines continuous and discrete phase-type distributions, their moments and transforms, reward-transformed and multivariate extensions, and the weighted directed graph representation used throughout phasic. No algorithms or pseudocode appear here; those begin in later files (graph elimination, reward transformation, and beyond).

The definitions follow Bladt and Nielsen (2017) for the matrix-analytic foundations, Hobolth et al. (2024) for multivariate cross-moments, and Røikjer et al. (2022) for the graph representation. All notation conforms to docs/notation_standard.md.

Prerequisites: None.

Source files:

  • api/c/phasic.h (struct definitions: ptd_graph, ptd_vertex, ptd_edge)
  • src/c/phasic.c (functions: ptd_defect, ptd_graph_pdf, ptd_dph_pmf)

Definitions

Continuous-time foundations

Phase-type distributions arise as absorption times in Markov processes on finite state spaces. We begin by defining the underlying stochastic process, then derive the distribution of its absorption time.

Definition 2.1 (Continuous-time Markov jump process) A continuous-time Markov jump process on the finite state space \{1, 2, \ldots, p, p+1\} is a stochastic process \{X_t\}_{t \geq 0} characterized by a (p+1) \times (p+1) intensity matrix \boldsymbol{\Lambda} and an initial distribution. The process satisfies: (i) given X_t = i, the holding time in state i is exponentially distributed with rate -\Lambda_{ii}; (ii) upon leaving state i, the process jumps to state j \neq i with probability \Lambda_{ij} / (-\Lambda_{ii}); (iii) the process is time-homogeneous and satisfies the Markov property.

Intuition The process lives in one of finitely many states. It waits an exponential time in each state, then jumps to another state with probabilities determined by the intensity matrix. The special state p+1 will serve as the absorbing state.
Example A two-state process with states \{1, 2\} where state 2 is absorbing. The intensity matrix is \boldsymbol{\Lambda} = \begin{pmatrix} -\lambda & \lambda \\ 0 & 0 \end{pmatrix}. Starting in state 1, the process waits an \operatorname{Exp}(\lambda) time and then enters state 2 permanently.

Definition 2.2 (Continuous phase-type distribution) Let \{X_t\}_{t \geq 0} be a continuous-time Markov jump process on \{1, 2, \ldots, p, p+1\}, where states 1, \ldots, p are transient and state p+1 is absorbing. Let \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p) be the row vector of initial probabilities on the transient states, so that \alpha_i = \Pr(X_0 = i) for i = 1, \ldots, p. Define the defect \alpha_0 = 1 - \sum_{i=1}^{p} \alpha_i \geq 0, representing the probability of starting in the absorbing state. Let \mathbf{S} be the p \times p sub-intensity matrix governing transitions among transient states: off-diagonal entries s_{ij} \geq 0 for i \neq j, diagonal entries s_{ii} < 0, and row sums non-positive. Define the exit rate vector \mathbf{s} = -\mathbf{S}\mathbf{e}, where \mathbf{e} is the column vector of ones. The absorption time \tau = \inf\{t \geq 0 : X_t = p+1\} is said to follow a continuous phase-type distribution of order p, written \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) (Neuts 1981).

Intuition The sub-intensity matrix \mathbf{S} captures all dynamics among the transient states. The exit rate vector \mathbf{s} captures absorption: the i-th entry s_i = -\sum_{j=1}^{p} s_{ij} \geq 0 is the rate at which state i transitions to the absorbing state. When \alpha_0 > 0, the distribution has a point mass at \tau = 0.
Example An Erlang distribution \operatorname{Erlang}(k, \lambda) is \operatorname{PH}_k(\boldsymbol{\alpha}, \mathbf{S}) with \boldsymbol{\alpha} = (1, 0, \ldots, 0) and \mathbf{S} a k \times k bi-diagonal matrix with s_{ii} = -\lambda and s_{i,i+1} = \lambda for i = 1, \ldots, k-1.
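The Erlang construction above is easy to check numerically. The sketch below (NumPy only, independent of the phasic API; the helper name erlang_ph is chosen here for illustration) builds the bi-diagonal representation and verifies that only the last stage carries a positive exit rate.

```python
import numpy as np

def erlang_ph(k, lam):
    """PH_k representation of Erlang(k, lam): start in state 1, pass through
    k exponential stages of rate lam, then absorb."""
    alpha = np.zeros(k)
    alpha[0] = 1.0
    # Bi-diagonal sub-intensity: -lam on the diagonal, lam on the superdiagonal.
    S = np.diag([-lam] * k) + np.diag([lam] * (k - 1), 1)
    return alpha, S

alpha, S = erlang_ph(3, 2.0)
s = -S @ np.ones(3)                     # exit rate vector s = -S e
assert np.allclose(alpha, [1.0, 0.0, 0.0])
assert np.allclose(s, [0.0, 0.0, 2.0])  # only the last stage exits to absorption
```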

Definition 2.3 (PDF, CDF, and survival function of CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). The probability density function, cumulative distribution function, and survival function are

f_\tau(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s}, \quad t \geq 0, \tag{2.1}

F_\tau(t) = 1 - \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}, \quad t \geq 0, \tag{2.2}

\bar{F}_\tau(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}, \quad t \geq 0. \tag{2.3}

When \alpha_0 > 0, the distribution has a point mass \Pr(\tau = 0) = \alpha_0 at the origin (Neuts 1981; Bladt and Nielsen 2017).

Intuition The matrix exponential e^{\mathbf{S}t} propagates the initial distribution forward in time over the transient states. The entry (e^{\mathbf{S}t})_{ij} gives the probability of being in transient state j at time t given start in state i (conditional on not yet being absorbed). Multiplying by \mathbf{s} extracts the instantaneous absorption rate.
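Equations (2.1)–(2.3) can be evaluated directly with a dense matrix exponential. The sketch below uses SciPy's expm (this is not how phasic evaluates the PDF, which uses uniformization as noted in the Implementation Notes) and checks the formulas against the closed form for \operatorname{Exp}(\lambda).

```python
import numpy as np
from scipy.linalg import expm

def cph_pdf_cdf(alpha, S, t):
    """Equations (2.1)-(2.2): f(t) = alpha e^{St} s and F(t) = 1 - alpha e^{St} e."""
    e = np.ones(S.shape[0])
    s = -S @ e                 # exit rate vector
    P = expm(S * t)            # dense matrix exponential
    return alpha @ P @ s, 1.0 - alpha @ P @ e

# Sanity check against Exp(lam), i.e. PH_1((1), (-lam)).
lam, t = 2.0, 0.7
f, F = cph_pdf_cdf(np.array([1.0]), np.array([[-lam]]), t)
assert np.isclose(f, lam * np.exp(-lam * t))
assert np.isclose(F, 1.0 - np.exp(-lam * t))
```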

Definition 2.4 (Moments of CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{U} = (-\mathbf{S})^{-1} be the Green matrix. The n-th moment of \tau is

\mathbb{E}[\tau^n] = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e} = n! \, \boldsymbol{\alpha} \mathbf{U}^n \mathbf{e}. \tag{2.4}

Intuition The Green matrix entry U_{ij} equals \mathbb{E}[Z_j \mid X_0 = i], the expected total time spent in transient state j before absorption, given start in state i. The first moment \mathbb{E}[\tau] = \boldsymbol{\alpha} \mathbf{U} \mathbf{e} is the expected total time across all transient states, weighted by the initial distribution.
Example For \operatorname{Exp}(\lambda): \mathbf{S} = (-\lambda), \mathbf{U} = (1/\lambda), \boldsymbol{\alpha} = (1). Then \mathbb{E}[\tau] = 1 \cdot (1/\lambda) \cdot 1 = 1/\lambda and \mathbb{E}[\tau^2] = 2/\lambda^2.
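Equation (2.4) is a one-liner in linear algebra. This sketch (plain NumPy, not the phasic implementation) verifies it against the known Erlang moments \mathbb{E}[\tau] = k/\lambda and \mathbb{E}[\tau^2] = k(k+1)/\lambda^2.

```python
import numpy as np
from math import factorial

def cph_moment(alpha, S, n):
    """n-th moment via Equation (2.4): E[tau^n] = n! alpha U^n e, U = (-S)^{-1}."""
    U = np.linalg.inv(-S)
    return factorial(n) * alpha @ np.linalg.matrix_power(U, n) @ np.ones(S.shape[0])

# Erlang(3, 2): E[tau] = 3/2 = 1.5 and E[tau^2] = 3*4/4 = 3.
lam = 2.0
alpha = np.array([1.0, 0.0, 0.0])
S = np.diag([-lam] * 3) + np.diag([lam] * 2, 1)
assert np.isclose(cph_moment(alpha, S, 1), 1.5)
assert np.isclose(cph_moment(alpha, S, 2), 3.0)
```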

Definition 2.5 (Laplace transform of CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). The Laplace transform of \tau is

\mathcal{L}_\tau(u) = \mathbb{E}[e^{-u\tau}] = \boldsymbol{\alpha}(u\mathbf{I} - \mathbf{S})^{-1} \mathbf{s}, \quad u \geq 0. \tag{2.5}

Intuition The Laplace transform encodes the entire distribution in the frequency domain. It is the generating function for the moments: (-1)^n \frac{d^n}{du^n} \mathcal{L}_\tau(u) \big|_{u=0} = \mathbb{E}[\tau^n]. The matrix (u\mathbf{I} - \mathbf{S})^{-1} is the resolvent of \mathbf{S}.
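The resolvent in Equation (2.5) is a single linear solve. A quick numerical check (NumPy sketch, not the phasic API) against the exponential transform \lambda/(\lambda + u):

```python
import numpy as np

def cph_laplace(alpha, S, u):
    """Equation (2.5): L(u) = alpha (uI - S)^{-1} s, via a linear solve."""
    p = S.shape[0]
    s = -S @ np.ones(p)
    return alpha @ np.linalg.solve(u * np.eye(p) - S, s)

# For Exp(lam), the Laplace transform is lam / (lam + u).
lam, u = 3.0, 1.2
val = cph_laplace(np.array([1.0]), np.array([[-lam]]), u)
assert np.isclose(val, lam / (lam + u))
```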

Discrete-time foundations

The discrete-time analogue replaces exponential holding times with geometric holding times and continuous time with a discrete step counter.

Definition 2.6 (Discrete-time Markov chain on finite state space) A discrete-time Markov chain on the finite state space \{1, 2, \ldots, p, p+1\} is a stochastic process \{X_t\}_{t \in \mathbb{N}_0} satisfying the Markov property, characterized by a (p+1) \times (p+1) transition matrix and an initial distribution.

Intuition At each discrete time step, the chain transitions from its current state to a new state (possibly the same state) according to fixed transition probabilities.

Definition 2.7 (Discrete phase-type distribution) Let \{X_t\}_{t \in \mathbb{N}_0} be a discrete-time Markov chain on \{1, 2, \ldots, p, p+1\} with transient states 1, \ldots, p and absorbing state p+1. Let \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p) be the initial distribution on transient states, let \mathbf{T} be the p \times p sub-transition matrix with entries t_{ij} \geq 0 and row sums at most 1, and let \mathbf{t} = \mathbf{e} - \mathbf{T}\mathbf{e} be the exit probability vector. The number of steps until absorption \tau = \inf\{t \geq 1 : X_t = p+1\} follows a discrete phase-type distribution, written \tau \sim \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}) (Neuts 1981).

Intuition The sub-transition matrix \mathbf{T} plays the role of \mathbf{S} in the continuous case, but entries are probabilities rather than rates. The exit probability t_i = 1 - \sum_j T_{ij} is the probability of absorbing from state i at the next step.
Example A geometric distribution \operatorname{Geo}(1-q) (number of trials until the first success) is \operatorname{DPH}_1((1), (q)) with \mathbf{T} = (q) and \mathbf{t} = (1-q).

Definition 2.8 (PMF and moments of DPH) Let \tau \sim \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}). The probability mass function is

\Pr(\tau = k) = \boldsymbol{\alpha} \mathbf{T}^{k-1} \mathbf{t}, \quad k = 1, 2, \ldots \tag{2.6}

The first moment is

\mathbb{E}[\tau] = \boldsymbol{\alpha}(\mathbf{I} - \mathbf{T})^{-1}\mathbf{e}. \tag{2.7}

More generally, the n-th factorial moment is \mathbb{E}[\tau(\tau-1)\cdots(\tau-n+1)] = n! \, \boldsymbol{\alpha} (\mathbf{I} - \mathbf{T})^{-n} \mathbf{T}^{n-1} \mathbf{e}.

Intuition The PMF follows the same logic as the continuous PDF: \mathbf{T}^{k-1} propagates the initial distribution through k-1 steps among transient states, then \mathbf{t} extracts the absorption probability. The matrix (\mathbf{I} - \mathbf{T})^{-1} is the discrete analogue of the Green matrix \mathbf{U}.
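Equations (2.6) and (2.7) can be verified against the geometric special case from the example above (a NumPy sketch, independent of ptd_dph_pmf):

```python
import numpy as np

def dph_pmf(alpha, T, k):
    """Equation (2.6): Pr(tau = k) = alpha T^{k-1} t for k >= 1."""
    e = np.ones(T.shape[0])
    t = e - T @ e                                   # exit probability vector
    return alpha @ np.linalg.matrix_power(T, k - 1) @ t

# Geometric(1-q): Pr(tau = k) = q^{k-1} (1-q), mean 1/(1-q).
q = 0.6
alpha, T = np.array([1.0]), np.array([[q]])
for k in range(1, 6):
    assert np.isclose(dph_pmf(alpha, T, k), q ** (k - 1) * (1 - q))

# First moment via Equation (2.7): alpha (I - T)^{-1} e.
mean = alpha @ np.linalg.inv(np.eye(1) - T) @ np.ones(1)
assert np.isclose(mean, 1 / (1 - q))
```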

Reward transformations

Reward-transformed phase-type distributions generalize the basic PH family by assigning non-negative rewards to transient states and measuring accumulated reward rather than raw time.

Definition 2.9 (Reward-transformed CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{r} = (r_1, \ldots, r_p)^\top be a reward vector with r_i \geq 0 for all i. Let \triangle(\mathbf{r}) denote the p \times p diagonal matrix with \mathbf{r} on the diagonal. The reward-transformed random variable is

\tilde{\tau} = \int_0^{\tau} r(X_t) \, dt, \tag{2.8}

where r(X_t) = r_i when X_t = i. If all rewards are strictly positive (r_i > 0 for all i), then \tilde{\tau} is itself phase-type distributed (Bladt and Nielsen 2017):

\tilde{\tau} \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}). \tag{2.9}

The moments of the reward-transformed distribution are

\mathbb{E}[\tilde{\tau}^n] = n! \, \boldsymbol{\alpha} \left( (-\mathbf{S})^{-1} \triangle(\mathbf{r}) \right)^n \mathbf{e} = n! \, \boldsymbol{\alpha} \left( \mathbf{U} \triangle(\mathbf{r}) \right)^n \mathbf{e}. \tag{2.10}

Intuition A reward of r_i assigned to state i means that while the process occupies state i, reward accumulates at rate r_i per unit time. The total accumulated reward until absorption generalizes the absorption time (which corresponds to all rewards equal to 1). The closure property (Equation 2.9) is fundamental: the class of phase-type distributions is closed under positive reward transformation. The transformed sub-intensity matrix \triangle(\mathbf{r})^{-1}\mathbf{S} rescales the rates in each state by the inverse of the reward.
Example Let \tau \sim \operatorname{Exp}(\lambda) (a single transient state with rate \lambda) and r_1 = c > 0. Then \tilde{\tau} = c \cdot \tau \sim \operatorname{Exp}(\lambda/c), which is \operatorname{PH}_1((1), (-\lambda/c)). This is the standard scaling property of the exponential distribution.
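The closure property can be checked numerically: the moments of the transformed representation \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}) computed via Equation (2.4) must agree with Equation (2.10). A NumPy sketch on a randomly generated valid sub-intensity matrix:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(0)
p = 3
Off = rng.uniform(0.1, 1.0, (p, p))     # non-negative off-diagonal rates
np.fill_diagonal(Off, 0.0)
exit_rate = rng.uniform(0.5, 1.0, p)    # strictly positive exit rates
S = Off - np.diag(Off.sum(axis=1) + exit_rate)
alpha = np.array([0.5, 0.3, 0.2])
r = np.array([1.0, 2.0, 0.5])           # strictly positive rewards
D = np.diag(r)
U = np.linalg.inv(-S)
e = np.ones(p)

for n in (1, 2, 3):
    # Equation (2.10): n! alpha (U D)^n e.
    moment_direct = factorial(n) * alpha @ np.linalg.matrix_power(U @ D, n) @ e
    # Equation (2.4) applied to the transformed representation (2.9).
    U_tilde = np.linalg.inv(np.linalg.inv(D) @ (-S))
    moment_closed = factorial(n) * alpha @ np.linalg.matrix_power(U_tilde, n) @ e
    assert np.isclose(moment_direct, moment_closed)
```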

Multivariate phase-type distributions

When multiple reward vectors are applied simultaneously, the resulting collection of accumulated rewards forms a multivariate phase-type distribution.

Definition 2.10 (Multivariate phase-type distribution) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{R} be a p \times m reward matrix whose column j, denoted \mathbf{R}_{\cdot j}, is a reward vector for marginal j. The random vector \mathbf{Y} = (Y_1, \ldots, Y_m) defined by

Y_j = \int_0^{\tau} R_{X_t, j} \, dt, \quad j = 1, \ldots, m, \tag{2.11}

follows a multivariate phase-type distribution, written \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) (Kulkarni 1989; Bladt and Nielsen 2017).

Intuition Each marginal Y_j is a reward-transformed variable corresponding to column j of \mathbf{R}. The marginals share the same underlying Markov chain and hence are dependent. The joint distribution captures the correlation structure induced by the shared chain.
Example In population genetics, one might track both the total branch length (r_i = 1 for all i) and the branch length subtending a particular clade (r_i = 1 for states where the clade is present, r_i = 0 otherwise). The two accumulated quantities form a bivariate phase-type distribution.

Definition 2.11 (Cross-moments of MPH) Let \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) with Green matrix \mathbf{U} = (-\mathbf{S})^{-1}. The second-order cross-moment between marginals i and j is

\mathbb{E}[Y_i Y_j] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e} + \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e}. \tag{2.12}

Intuition The cross-moment decomposes into two terms corresponding to the two orderings of reward accumulation: first accumulating \mathbf{R}_{\cdot i} then \mathbf{R}_{\cdot j}, and vice versa. This symmetry arises from a first-step analysis of the occupation measures. The formula reduces to \mathbb{E}[Y_i^2] = 2 \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e} when i = j, consistent with the second moment formula (Equation 2.10 with n = 2).
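The two-term structure of Equation (2.12), and its reduction to the second moment when i = j, are straightforward to verify numerically (NumPy sketch; not the phasic implementation):

```python
import numpy as np

def mph_cross_moment(alpha, S, R, i, j):
    """Equation (2.12): sum over the two orderings of reward accumulation."""
    p = S.shape[0]
    U = np.linalg.inv(-S)
    Di, Dj = np.diag(R[:, i]), np.diag(R[:, j])
    e = np.ones(p)
    return alpha @ U @ Di @ U @ Dj @ e + alpha @ U @ Dj @ U @ Di @ e

S = np.array([[-3.0, 1.0, 0.5],
              [0.2, -2.0, 0.8],
              [0.1, 0.3, -1.5]])
alpha = np.array([0.6, 0.3, 0.1])
R = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])   # column 0: total time; column 1: a weighted time

# Symmetry in (i, j):
assert np.isclose(mph_cross_moment(alpha, S, R, 0, 1),
                  mph_cross_moment(alpha, S, R, 1, 0))

# i = j recovers the marginal second moment, Equation (2.10) with n = 2.
U, D1 = np.linalg.inv(-S), np.diag(R[:, 1])
second = 2 * alpha @ U @ D1 @ U @ D1 @ np.ones(3)
assert np.isclose(mph_cross_moment(alpha, S, R, 1, 1), second)
```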

Graph representation

The graph representation is the central data structure of phasic. It encodes phase-type distributions in a form that enables graph-based algorithms to operate in O(n + m) space, for a graph with n vertices and m edges, rather than the O(n^2) required by dense matrices.

Definition 2.12 (Weighted directed graph) A weighted directed graph is a triple G = (V, E, W) where V is a finite set of vertices, E \subseteq V \times V is a set of directed edges, and W \colon E \to \mathbb{R} is a weight function. We write (v \to z) \in E for a directed edge from v to z and w(v \to z) for its weight. The children and parents of v are \operatorname{children}(v) = \{z \in V : (v \to z) \in E\} and \operatorname{parents}(v) = \{u \in V : (u \to v) \in E\}. The total outgoing rate of v is \lambda_v = \sum_{z \in \operatorname{children}(v)} w(v \to z).

Intuition The graph is a sparse representation of the generator (for CPH) or transition matrix (for DPH). Only edges with non-zero weight are stored, which is why the representation is efficient for sparse Markov chains.

Definition 2.13 (Phase-type graph structure) A phase-type graph is a weighted directed graph G = (V, E, W) with the following distinguished structure:

  • A starting vertex v_0 \in V whose outgoing edges encode the initial distribution: w(v_0 \to v_i) = \alpha_i for each transient state i with \alpha_i > 0, and \lambda_{v_0} \leq 1.
  • Transient vertices v_1, \ldots, v_p \in V, one per transient state. The edge weights between transient vertices encode off-diagonal entries of the sub-intensity matrix (CPH) or sub-transition matrix (DPH).
  • Absorbing vertices: vertices v \in V with \operatorname{children}(v) = \emptyset.

The starting vertex v_0 is assigned reward zero. Each transient vertex v_i may be assigned a non-negative reward r_{v_i} \geq 0.

Intuition The starting vertex v_0 acts as a “dispatcher” that distributes probability mass to the transient states according to \boldsymbol{\alpha}. It is not itself a transient state of the Markov chain. Absorbing vertices are sinks with no outgoing edges; once probability mass reaches them, it is permanently captured.
Example An \operatorname{Exp}(\lambda) distribution is represented by three vertices: v_0 (starting), v_1 (transient), and v_a (absorbing), with edges w(v_0 \to v_1) = 1 and w(v_1 \to v_a) = \lambda.

Definition 2.14 (Graph-matrix correspondence) Given a phase-type graph G with starting vertex v_0 and transient vertices v_1, \ldots, v_p, define \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) by:

  • \alpha_i = w(v_0 \to v_i) for i = 1, \ldots, p (zero if no such edge exists);
  • s_{ij} = w(v_i \to v_j) for i \neq j (zero if no such edge exists);
  • s_{ii} = -\lambda_{v_i} (negative of the total outgoing rate from vertex v_i).

Conversely, given \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}), construct G by creating v_0, vertices v_1, \ldots, v_p, absorbing vertices as needed, and edges with the weights above.

Intuition This correspondence is the foundation of phasic: every matrix computation on (\boldsymbol{\alpha}, \mathbf{S}) has a graph counterpart on G. The diagonal entries of \mathbf{S} are not stored as self-loops; instead, they are implicitly determined by the total outgoing rate \lambda_{v_i} = \sum_{j \neq i} s_{ij} + s_i, where s_i is the exit rate to the absorbing state.
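The graph-to-matrix direction of Definition 2.14 can be sketched in a few lines. The edge-dictionary encoding below is purely illustrative (it is not the ptd_graph struct, which uses an AVL tree and linked edge lists): transient vertices are numbered 1..p, 'v0' is the starting vertex, and 'a' is a shared absorbing vertex.

```python
import numpy as np

# Hypothetical minimal encoding of the Exp-like graph: {(src, dst): weight}.
edges = {('v0', 1): 1.0, (1, 2): 2.0, (1, 'a'): 1.0, (2, 'a'): 3.0}
p = 2

alpha = np.zeros(p)
S = np.zeros((p, p))
for (src, dst), w in edges.items():
    if src == 'v0':
        alpha[dst - 1] = w               # alpha_i = w(v0 -> v_i)
    else:
        S[src - 1, src - 1] -= w         # s_ii accumulates -lambda_{v_i}
        if dst != 'a':
            S[src - 1, dst - 1] += w     # s_ij = w(v_i -> v_j) for i != j

assert np.allclose(alpha, [1.0, 0.0])
assert np.allclose(S, [[-3.0, 2.0], [0.0, -3.0]])
assert np.allclose(-S @ np.ones(p), [1.0, 3.0])   # recovered exit rates s
```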

Parameterized graphs

In many applications, edge weights depend on unknown parameters. Parameterized edges formalize this dependence.

Definition 2.15 (Parameterized edges) Let \boldsymbol{\theta} = (\theta_1, \ldots, \theta_k) be a parameter vector. A parameterized edge (v \to z) carries a coefficient vector \mathbf{c} = (c_1, \ldots, c_k) and computes its weight according to a weight mode:

| Mode | Weight computation |
|---|---|
| linear | w(v \to z) = \sum_{j=1}^{k} c_j \theta_j |
| log | w(v \to z) = \prod_{j=1}^{k} (c_j \theta_j) |
| callback | w(v \to z) = \phi(\boldsymbol{\theta}, \mathbf{c}) for a user-specified function \phi |

In linear mode, the weight is an affine combination of the parameters (affine because the coefficients c_j can encode constant terms). In log mode, the weight is a product computed in log-space for numerical stability: \log w = \sum_j \log(c_j \theta_j). In callback mode, the weight is the output of an arbitrary function \phi.

Intuition Parameterized edges allow phasic to represent families of phase-type distributions indexed by \boldsymbol{\theta}. This is essential for inference: given observed data, one seeks the parameter vector \boldsymbol{\theta} that best explains the data under the model encoded by the parameterized graph. The three weight modes cover additive rate models (linear), multiplicative factor models (log), and fully general nonlinear models (callback).
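The three weight modes can be sketched as a single dispatch function. This is an illustration of Definition 2.15 only; edge_weight is a hypothetical name, not part of the GraphBuilder API.

```python
import numpy as np

def edge_weight(mode, c, theta, phi=None):
    """Weight of a parameterized edge under the three modes of Definition 2.15."""
    c, theta = np.asarray(c), np.asarray(theta)
    if mode == 'linear':
        return float(c @ theta)                          # sum_j c_j theta_j
    if mode == 'log':
        return float(np.exp(np.sum(np.log(c * theta))))  # prod_j c_j theta_j, via log-space
    if mode == 'callback':
        return float(phi(theta, c))                      # arbitrary user function
    raise ValueError(f"unknown mode: {mode}")

theta = [2.0, 0.5]
assert np.isclose(edge_weight('linear', [3.0, 4.0], theta), 8.0)   # 3*2 + 4*0.5
assert np.isclose(edge_weight('log', [1.0, 4.0], theta), 4.0)      # (1*2)*(4*0.5)
assert np.isclose(edge_weight('callback', [1.0, 1.0], theta,
                              phi=lambda th, c: th[0] ** 2), 4.0)
```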

Definition 2.16 (Vertex scalar and normalized graph) Let G = (V, E, W) be a phase-type graph. The vertex scalar of v \in V is

x_v = \frac{1}{\lambda_v}, \tag{2.13}

where \lambda_v = \sum_{z \in \operatorname{children}(v)} w(v \to z) is the total outgoing rate. A normalized graph is one in which each edge weight w(v \to z) has been replaced by the transition probability w'(v \to z) = w(v \to z) / \lambda_v. After normalization, \lambda_v = 1 for all transient vertices; the original rate information survives in the vertex scalar x_v = 1/\lambda_v, the expected waiting time in v.

Intuition For a continuous phase-type distribution, x_v = 1/\lambda_v is the expected time spent during each visit to state v (the mean of the exponential holding time). Normalization separates the graph into two components: the embedded discrete-time Markov chain (the normalized edge weights, which are transition probabilities) and the holding time structure (the vertex scalars). This separation is exploited by graph elimination and moment computation algorithms in later files.
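Normalization separates rates into transition probabilities and holding times. A NumPy sketch on a dense weight matrix W (with W[v, z] = w(v \to z); the C implementation works on the sparse graph instead):

```python
import numpy as np

W = np.array([[0.0, 2.0, 1.0],    # vertex 1: edges to vertex 2 and to absorbing
              [0.0, 0.0, 3.0],    # vertex 2: edge to absorbing
              [0.0, 0.0, 0.0]])   # absorbing vertex: no outgoing edges

lam = W.sum(axis=1)                      # total outgoing rates lambda_v
safe = np.where(lam > 0, lam, 1.0)       # avoid division by zero on absorbing rows
x = np.where(lam > 0, 1.0 / safe, 0.0)   # vertex scalars x_v = 1/lambda_v  (Eq. 2.13)
Wn = W / safe[:, None]                   # normalized weights w'(v -> z)

assert np.allclose(x, [1/3, 1/3, 0.0])           # expected holding times
assert np.allclose(Wn.sum(axis=1)[:2], 1.0)      # transient rows now sum to 1
assert np.allclose(Wn[0], [0.0, 2/3, 1/3])       # embedded-chain probabilities
```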

Theorems and Proofs

Theorem 2.1 (PDF of a continuous phase-type distribution) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). Then

f_\tau(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s}, \quad t > 0.

Proof. Let \mathbf{p}(t) = (\Pr(X_t = 1), \ldots, \Pr(X_t = p)) be the row vector of state probabilities at time t, restricted to the transient states. The Kolmogorov forward equation for the transient states gives

\frac{d}{dt} \mathbf{p}(t) = \mathbf{p}(t) \mathbf{S},

with initial condition \mathbf{p}(0) = \boldsymbol{\alpha}. The unique solution is \mathbf{p}(t) = \boldsymbol{\alpha} e^{\mathbf{S}t}.

The survival function is the probability of remaining in the transient states:

\bar{F}_\tau(t) = \Pr(\tau > t) = \mathbf{p}(t) \mathbf{e} = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}.

The PDF is obtained by differentiating the CDF:

f_\tau(t) = -\frac{d}{dt} \bar{F}_\tau(t) = -\boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{S} \mathbf{e} = \boldsymbol{\alpha} e^{\mathbf{S}t} (-\mathbf{S}\mathbf{e}) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s},

where we used the identity \frac{d}{dt} e^{\mathbf{S}t} = e^{\mathbf{S}t} \mathbf{S} = \mathbf{S} e^{\mathbf{S}t} and the definition \mathbf{s} = -\mathbf{S}\mathbf{e}. \square

Theorem 2.2 (Moment formula for CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). Then for all n \geq 1,

\mathbb{E}[\tau^n] = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e}.

Proof. The Laplace transform of \tau is

\mathcal{L}_\tau(u) = \mathbb{E}[e^{-u\tau}] = \int_0^\infty e^{-ut} \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s} \, dt = \boldsymbol{\alpha} \left( \int_0^\infty e^{(\mathbf{S} - u\mathbf{I})t} \, dt \right) \mathbf{s}.

Since all eigenvalues of \mathbf{S} have strictly negative real parts (the transient states are indeed transient), the integral converges for u \geq 0:

\int_0^\infty e^{(\mathbf{S} - u\mathbf{I})t} \, dt = -({\mathbf{S} - u\mathbf{I}})^{-1} = (u\mathbf{I} - \mathbf{S})^{-1}.

Thus \mathcal{L}_\tau(u) = \boldsymbol{\alpha}(u\mathbf{I} - \mathbf{S})^{-1}\mathbf{s}.

The n-th moment is recovered by differentiation:

\mathbb{E}[\tau^n] = (-1)^n \frac{d^n}{du^n} \mathcal{L}_\tau(u) \bigg|_{u=0}.

We compute \frac{d^n}{du^n} (u\mathbf{I} - \mathbf{S})^{-1} = (-1)^n n! \, (u\mathbf{I} - \mathbf{S})^{-(n+1)}, which follows by induction using the identity \frac{d}{du}(u\mathbf{I} - \mathbf{S})^{-1} = -(u\mathbf{I} - \mathbf{S})^{-2}. Evaluating at u = 0:

\mathbb{E}[\tau^n] = (-1)^n \cdot (-1)^n n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-(n+1)} \mathbf{s} = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-(n+1)} \mathbf{s}.

Since \mathbf{s} = -\mathbf{S}\mathbf{e} = (-\mathbf{S})\mathbf{e}, substituting gives

n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-(n+1)} (-\mathbf{S}) \mathbf{e} = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e}. \quad \square

Theorem 2.3 (Closure under reward transformation) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{r} = (r_1, \ldots, r_p)^\top with r_i > 0 for all i. Then the reward-transformed variable \tilde{\tau} = \int_0^{\tau} r(X_t) \, dt satisfies

\tilde{\tau} \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}).

Proof. Define a time change by \tilde{t}(t) = \int_0^t r(X_s) \, ds. Since r_i > 0 for all i, this is strictly increasing almost surely, and hence invertible. Let t(\tilde{t}) denote the inverse. Define the time-changed process \tilde{X}_{\tilde{t}} = X_{t(\tilde{t})}.

We verify that \tilde{X} is a Markov jump process with sub-intensity matrix \triangle(\mathbf{r})^{-1}\mathbf{S}. When \tilde{X}_{\tilde{t}} = i, the original process X is in state i, which has holding rate -s_{ii} (i.e., X leaves state i at rate -s_{ii}). In the time-changed clock, time runs at rate r_i relative to the original clock, so the holding rate in the new clock is -s_{ii}/r_i. Similarly, the transition rate from i to j in the new clock is s_{ij}/r_i. The sub-intensity matrix of the time-changed process is therefore \triangle(\mathbf{r})^{-1}\mathbf{S}.

The initial distribution is unchanged because the time change does not affect the initial state. The absorption time in the new clock is \tilde{\tau} = \tilde{t}(\tau) = \int_0^\tau r(X_s) \, ds, which is the reward-transformed variable. Therefore \tilde{\tau} \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}). \square

Theorem 2.4 (Cross-moment formula for MPH) Let \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) with Green matrix \mathbf{U} = (-\mathbf{S})^{-1}. Then

\mathbb{E}[Y_i Y_j] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e} + \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e}.

Proof. We follow the first-step analysis approach of Hobolth et al. (2024). Write Y_i = \int_0^\tau R_{X_t, i} \, dt and Y_j = \int_0^\tau R_{X_t, j} \, dt. Then

Y_i Y_j = \int_0^\tau \int_0^\tau R_{X_s, i} \, R_{X_t, j} \, ds \, dt.

Split the double integral over the square [0, \tau]^2 into the two triangles \{s < t\} and \{s > t\} (the diagonal \{s = t\} has measure zero):

Y_i Y_j = \int_0^\tau \int_0^t R_{X_s, i} \, R_{X_t, j} \, ds \, dt + \int_0^\tau \int_0^t R_{X_s, j} \, R_{X_t, i} \, ds \, dt.

Consider the first term. Conditioning on the chain trajectory, the inner integral \int_0^t R_{X_s, i} \, ds accumulates reward R_{\cdot, i} over [0, t], and R_{X_t, j} weights this by the instantaneous reward at time t. By the Markov property and the occupation time interpretation of \mathbf{U}, conditioning on starting state and applying iterated expectations gives

\mathbb{E}\left[\int_0^\tau \int_0^t R_{X_s, i} \, R_{X_t, j} \, ds \, dt\right] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e}.

This can be verified by expanding \mathbf{U} as the occupation time matrix. The entry (\mathbf{U})_{kl} is the expected time in state l starting from state k. The product \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) applies reward weighting, and the second \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) accounts for the second accumulation period. By symmetry of the argument under exchange of i and j, the second triangle contributes the second term. \square

Theorem 2.5 (Graph-matrix correspondence) Let G = (V, E, W) be a phase-type graph with starting vertex v_0 and transient vertices v_1, \ldots, v_p as in Definition 2.13. Define (\boldsymbol{\alpha}, \mathbf{S}) as in Definition 2.14. Then G encodes exactly the distribution \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}), and the correspondence is a bijection between (i) phase-type graphs with p transient vertices (up to relabeling of non-starting, non-absorbing vertices) and (ii) phase-type representations \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}).

Proof. We verify both directions of the correspondence.

(Graph to matrix.) Given G, construct \boldsymbol{\alpha} and \mathbf{S} as in Definition 2.14. We verify that \mathbf{S} is a valid sub-intensity matrix:

  • Off-diagonal entries: s_{ij} = w(v_i \to v_j) \geq 0 for i \neq j, since edge weights in phase-type graphs are non-negative.
  • Diagonal entries: s_{ii} = -\lambda_{v_i} < 0, since \lambda_{v_i} > 0 for transient vertices (they must have at least one outgoing edge; otherwise they would be absorbing).
  • Row sums: \sum_j s_{ij} = \sum_{j \neq i} w(v_i \to v_j) - \lambda_{v_i} = \sum_{j \neq i} w(v_i \to v_j) - \sum_{z \in \operatorname{children}(v_i)} w(v_i \to z) \leq 0, because \operatorname{children}(v_i) may include absorbing vertices, so \lambda_{v_i} \geq \sum_{j \neq i} s_{ij}.

The initial distribution satisfies \alpha_i = w(v_0 \to v_i) \geq 0 and \sum_i \alpha_i = \lambda_{v_0} \leq 1.

(Matrix to graph.) Given \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}), construct G: create v_0 with edges w(v_0 \to v_i) = \alpha_i for each i with \alpha_i > 0. Create v_1, \ldots, v_p with edges w(v_i \to v_j) = s_{ij} for each i \neq j with s_{ij} > 0. If s_i = -\sum_j s_{ij} > 0 (i.e., state i has positive exit rate), create an edge from v_i to an absorbing vertex with weight s_i. The resulting graph satisfies the conditions of Definition 2.13.

(Bijection.) The two constructions are inverses of each other up to relabeling: different orderings of the transient vertices yield the same (\boldsymbol{\alpha}, \mathbf{S}) up to a simultaneous permutation of the rows and columns of \mathbf{S} and the entries of \boldsymbol{\alpha}, and different absorbing-vertex structures (one shared absorbing vertex versus several) yield the same \mathbf{s}. \square

Implementation Notes

Source code mapping:

| Concept | File | Function/Struct | Notes |
|---|---|---|---|
| Graph structure | api/c/phasic.h | struct ptd_graph, struct ptd_vertex, struct ptd_edge | Vertices stored in AVL tree; edges as linked lists |
| Starting vertex | api/c/phasic.h | ptd_graph.root | The starting vertex v_0 |
| Edge weights | api/c/phasic.h | ptd_edge.weight | Single double per edge |
| Vertex scalar | api/c/phasic.h | ptd_vertex.state | Stores x_v; set during normalization |
| PDF computation | src/c/phasic.c | ptd_graph_pdf | Implements Equation 2.1 via uniformization (documented separately) |
| PMF computation | src/c/phasic.c | ptd_dph_pmf | Implements Equation 2.6 via forward algorithm (documented separately) |
| Defect | src/c/phasic.c | ptd_defect | Returns \alpha_0 |
| Parameterized edges | src/cpp/parameterized/graph_builder.cpp | GraphBuilder | Coefficient vectors and weight modes |

Deviations from formalization:

  • The notation standard defines the reward matrix \mathbf{R} as p \times m (columns are marginals), consistent with Bladt and Nielsen (2017). The Python API (as of v0.22.22) uses the transposed convention with shape (m, p) (rows are marginals) for more intuitive indexing. The mathematical formulas in this file follow the p \times m convention.

Remark (Suspected code issue: ptd_defect). The function ptd_defect in src/c/phasic.c (line 1295) is a stub that unconditionally returns 0.0, regardless of the actual initial distribution. This means the defect \alpha_0 = 1 - \boldsymbol{\alpha}\mathbf{e} is never correctly computed by this function. Code that relies on ptd_defect will silently treat all distributions as defect-free (\alpha_0 = 0). This does not affect PDF/PMF computation (which uses \boldsymbol{\alpha} directly) but would affect any code path that needs to handle point masses at the origin.
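A corrected defect computation is a one-liner over the starting vertex's outgoing weights. The sketch below is in Python for illustration (the C fix inside ptd_defect would be analogous; defect here is a hypothetical name, not the phasic API):

```python
import numpy as np

def defect(alpha):
    """alpha_0 = 1 - alpha e, clipped at zero to absorb floating-point noise.
    Hypothetical corrected counterpart of ptd_defect, which currently
    returns 0.0 unconditionally."""
    return max(0.0, 1.0 - float(np.sum(alpha)))

assert defect(np.array([0.5, 0.3, 0.2])) == 0.0       # proper distribution
assert np.isclose(defect(np.array([0.6, 0.3])), 0.1)  # point mass at the origin
```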

Symbol Index

| Symbol | Name | First appearance |
|---|---|---|
| \alpha_0 | Defect | Definition 2.2 |
| \boldsymbol{\alpha} | Initial distribution vector | Definition 2.2 |
| \bar{F}_\tau(t) | Survival function | Definition 2.3 |
| \mathbf{c} | Coefficient vector | Definition 2.15 |
| \operatorname{children}(v) | Children of vertex v | Definition 2.12 |
| E | Set of edges | Definition 2.12 |
| \mathbf{e} | Column vector of ones | Definition 2.2 |
| f_\tau(t) | Probability density function | Definition 2.3 |
| F_\tau(t) | Cumulative distribution function | Definition 2.3 |
| G = (V, E, W) | Weighted directed graph | Definition 2.12 |
| \mathbf{I} | Identity matrix | Definition 2.5 |
| \boldsymbol{\Lambda} | Full intensity matrix | Definition 2.1 |
| \lambda_v | Total outgoing rate of vertex v | Definition 2.12 |
| \mathcal{L}_\tau(u) | Laplace transform | Definition 2.5 |
| p | Number of transient states | Definition 2.2 |
| \operatorname{parents}(v) | Parents of vertex v | Definition 2.12 |
| \phi | Callback weight function | Definition 2.15 |
| \mathbf{r} | Reward vector | Definition 2.9 |
| r_v | Vertex reward | Definition 2.13 |
| \mathbf{R} | Reward matrix | Definition 2.10 |
| \mathbf{s} | Exit rate vector | Definition 2.2 |
| \mathbf{S} | Sub-intensity matrix | Definition 2.2 |
| \mathbf{t} | Exit probability vector | Definition 2.7 |
| \mathbf{T} | Sub-transition matrix | Definition 2.7 |
| \tau | Absorption time | Definition 2.2 |
| \tilde{\tau} | Reward-transformed absorption time | Definition 2.9 |
| \boldsymbol{\theta} | Parameter vector | Definition 2.15 |
| \triangle(\mathbf{r}) | Diagonal reward matrix | Definition 2.9 |
| \mathbf{U} | Green matrix | Definition 2.4 |
| V | Set of vertices | Definition 2.12 |
| v_0 | Starting vertex | Definition 2.13 |
| W | Weight function | Definition 2.12 |
| w(v \to z) | Edge weight | Definition 2.12 |
| w'(v \to z) | Normalized edge weight | Definition 2.16 |
| x_v | Vertex scalar | Definition 2.16 |
| \{X_t\}_{t \geq 0} | Continuous-time Markov jump process | Definition 2.1 |
| \{X_t\}_{t \in \mathbb{N}_0} | Discrete-time Markov chain | Definition 2.6 |
| \mathbf{Y} | MPH distributed random vector | Definition 2.10 |