2 Preliminaries
Introduction
This file establishes the foundational mathematical definitions and results that every subsequent SOM file depends upon. It defines continuous and discrete phase-type distributions, their moments and transforms, reward-transformed and multivariate extensions, and the weighted directed graph representation used throughout phasic. No algorithms or pseudocode appear here; those begin in later files (graph elimination, reward transformation, and beyond).
The definitions follow Bladt and Nielsen (2017) for the matrix-analytic foundations, Hobolth et al. (2024) for multivariate cross-moments, and Røikjer et al. (2022) for the graph representation. All notation conforms to docs/notation_standard.md.
Prerequisites: None.
Source files:
- api/c/phasic.h (struct definitions: ptd_graph, ptd_vertex, ptd_edge)
- src/c/phasic.c (functions: ptd_defect, ptd_graph_pdf, ptd_dph_pmf)
Definitions
Continuous-time foundations
Phase-type distributions arise as absorption times in Markov processes on finite state spaces. We begin by defining the underlying stochastic process, then derive the distribution of its absorption time.
Definition 2.1 (Continuous-time Markov jump process) A continuous-time Markov jump process on the finite state space \{1, 2, \ldots, p, p+1\} is a stochastic process \{X_t\}_{t \geq 0} characterized by a (p+1) \times (p+1) intensity matrix \boldsymbol{\Lambda} and an initial distribution. The process satisfies: (i) given X_t = i, the holding time in state i is exponentially distributed with rate -\Lambda_{ii}; (ii) upon leaving state i, the process jumps to state j \neq i with probability \Lambda_{ij} / (-\Lambda_{ii}); (iii) the process is time-homogeneous and satisfies the Markov property.
Intuition
The process lives in one of finitely many states. It waits an exponential time in each state, then jumps to another state with probabilities determined by the intensity matrix. The special state p+1 will serve as the absorbing state.
Example
A two-state process with states \{1, 2\} where state 2 is absorbing. The intensity matrix is \boldsymbol{\Lambda} = \begin{pmatrix} -\lambda & \lambda \\ 0 & 0 \end{pmatrix}. Starting in state 1, the process waits an \operatorname{Exp}(\lambda) time and then enters state 2 permanently.
Definition 2.2 (Continuous phase-type distribution) Let \{X_t\}_{t \geq 0} be a continuous-time Markov jump process on \{1, 2, \ldots, p, p+1\}, where states 1, \ldots, p are transient and state p+1 is absorbing. Let \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p) be the row vector of initial probabilities on the transient states, so that \alpha_i = \Pr(X_0 = i) for i = 1, \ldots, p. Define the defect \alpha_0 = 1 - \sum_{i=1}^{p} \alpha_i \geq 0, representing the probability of starting in the absorbing state. Let \mathbf{S} be the p \times p sub-intensity matrix governing transitions among transient states: off-diagonal entries s_{ij} \geq 0 for i \neq j, diagonal entries s_{ii} < 0, and row sums non-positive. Define the exit rate vector \mathbf{s} = -\mathbf{S}\mathbf{e}, where \mathbf{e} is the column vector of ones. The absorption time \tau = \inf\{t \geq 0 : X_t = p+1\} is said to follow a continuous phase-type distribution of order p, written \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) (Neuts 1981).
Intuition
The sub-intensity matrix \mathbf{S} captures all dynamics among the transient states. The exit rate vector \mathbf{s} captures absorption: the i-th entry s_i = -\sum_{j=1}^{p} s_{ij} \geq 0 is the rate at which state i transitions to the absorbing state. When \alpha_0 > 0, the distribution has a point mass at \tau = 0.
Example
An Erlang distribution \operatorname{Erlang}(k, \lambda) is \operatorname{PH}_k(\boldsymbol{\alpha}, \mathbf{S}) with \boldsymbol{\alpha} = (1, 0, \ldots, 0) and \mathbf{S} a k \times k bi-diagonal matrix with s_{ii} = -\lambda and s_{i,i+1} = \lambda for i = 1, \ldots, k-1.
Definition 2.3 (PDF, CDF, and survival function of CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). The probability density function, cumulative distribution function, and survival function are
f_\tau(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s}, \quad t \geq 0, \tag{2.1}
F_\tau(t) = 1 - \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}, \quad t \geq 0, \tag{2.2}
\bar{F}_\tau(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}, \quad t \geq 0. \tag{2.3}
When \alpha_0 > 0, the distribution has a point mass \Pr(\tau = 0) = \alpha_0 at the origin (Neuts 1981; Bladt and Nielsen 2017).
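Equations (2.1) and (2.2) can be sanity-checked numerically. The sketch below (plain NumPy/SciPy, independent of the phasic codebase) evaluates the PDF and CDF for the Erlang(2, \lambda) representation from the example above and compares them with the closed-form Erlang expressions:

```python
import numpy as np
from scipy.linalg import expm

# Erlang(2, lam) as PH_2(alpha, S): alpha = (1, 0), bi-diagonal sub-intensity matrix.
lam = 1.5
alpha = np.array([1.0, 0.0])
S = np.array([[-lam, lam],
              [0.0, -lam]])
e = np.ones(2)
s = -S @ e                                  # exit rate vector s = -S e

t = 0.7
pdf = alpha @ expm(S * t) @ s               # Equation (2.1)
cdf = 1.0 - alpha @ expm(S * t) @ e         # Equation (2.2)

# Closed forms for Erlang(2, lam): f(t) = lam^2 t e^{-lam t}
pdf_closed = lam**2 * t * np.exp(-lam * t)
cdf_closed = 1.0 - np.exp(-lam * t) * (1.0 + lam * t)
```

Here \mathbf{s} has a single non-zero entry \lambda in the last state, reflecting the bi-diagonal Erlang structure.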
Intuition
The matrix exponential e^{\mathbf{S}t} propagates the initial distribution forward in time over the transient states. The entry (e^{\mathbf{S}t})_{ij} gives the probability of being in transient state j at time t given start in state i (conditional on not yet being absorbed). Multiplying by \mathbf{s} extracts the instantaneous absorption rate.
Definition 2.4 (Moments of CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{U} = (-\mathbf{S})^{-1} be the Green matrix. The n-th moment of \tau is
\mathbb{E}[\tau^n] = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e} = n! \, \boldsymbol{\alpha} \mathbf{U}^n \mathbf{e}. \tag{2.4}
Intuition
The Green matrix entry U_{ij} equals \mathbb{E}[Z_j \mid X_0 = i], the expected total time spent in transient state j before absorption, given start in state i. The first moment \mathbb{E}[\tau] = \boldsymbol{\alpha} \mathbf{U} \mathbf{e} is the expected total time across all transient states, weighted by the initial distribution.
Example
For \operatorname{Exp}(\lambda): \mathbf{S} = (-\lambda), \mathbf{U} = (1/\lambda), \boldsymbol{\alpha} = (1). Then \mathbb{E}[\tau] = 1 \cdot (1/\lambda) \cdot 1 = 1/\lambda and \mathbb{E}[\tau^2] = 2/\lambda^2.
Definition 2.5 (Laplace transform of CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). The Laplace transform of \tau is
\mathcal{L}_\tau(u) = \mathbb{E}[e^{-u\tau}] = \boldsymbol{\alpha}(u\mathbf{I} - \mathbf{S})^{-1} \mathbf{s}, \quad u \geq 0. \tag{2.5}
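A quick numerical check of the moment formula (2.4) and the Laplace transform (2.5), again using the Erlang(2, \lambda) representation, for which \mathbb{E}[\tau] = 2/\lambda, \mathbb{E}[\tau^2] = 6/\lambda^2, and \mathcal{L}_\tau(u) = (\lambda/(\lambda+u))^2. This is an illustrative sketch, not phasic code:

```python
import numpy as np

lam = 2.0
alpha = np.array([1.0, 0.0])
S = np.array([[-lam, lam],
              [0.0, -lam]])
e = np.ones(2)
s = -S @ e
U = np.linalg.inv(-S)                       # Green matrix U = (-S)^{-1}

# Equation (2.4): E[tau^n] = n! alpha U^n e
m1 = 1 * alpha @ U @ e                      # first moment, n = 1
m2 = 2 * alpha @ U @ U @ e                  # second moment, n = 2

# Equation (2.5): Laplace transform at a test point u
u = 0.9
lap = alpha @ np.linalg.solve(u * np.eye(2) - S, s)
```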
Intuition
The Laplace transform encodes the entire distribution in the frequency domain. It is the generating function for the moments: (-1)^n \frac{d^n}{du^n} \mathcal{L}_\tau(u) \big|_{u=0} = \mathbb{E}[\tau^n]. The matrix (u\mathbf{I} - \mathbf{S})^{-1} is the resolvent of \mathbf{S}.
Discrete-time foundations
The discrete-time analogue replaces exponential holding times with geometric holding times and continuous time with a discrete step counter.
Definition 2.6 (Discrete-time Markov chain on finite state space) A discrete-time Markov chain on the finite state space \{1, 2, \ldots, p, p+1\} is a stochastic process \{X_t\}_{t \in \mathbb{N}_0} satisfying the Markov property, characterized by a (p+1) \times (p+1) transition matrix and an initial distribution.
Intuition
At each discrete time step, the chain transitions from its current state to a new state (possibly the same state) according to fixed transition probabilities.
Definition 2.7 (Discrete phase-type distribution) Let \{X_t\}_{t \in \mathbb{N}_0} be a discrete-time Markov chain on \{1, 2, \ldots, p, p+1\} with transient states 1, \ldots, p and absorbing state p+1. Let \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p) be the initial distribution on transient states, let \mathbf{T} be the p \times p sub-transition matrix with entries t_{ij} \geq 0 and row sums at most 1, and let \mathbf{t} = \mathbf{e} - \mathbf{T}\mathbf{e} be the exit probability vector. The number of steps until absorption \tau = \inf\{t \geq 1 : X_t = p+1\} follows a discrete phase-type distribution, written \tau \sim \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}) (Neuts 1981).
Intuition
The sub-transition matrix \mathbf{T} plays the role of \mathbf{S} in the continuous case, but entries are probabilities rather than rates. The exit probability t_i = 1 - \sum_j T_{ij} is the probability of absorbing from state i at the next step.
Example
A geometric distribution \operatorname{Geo}(1-q) (number of trials until the first success) is \operatorname{DPH}_1((1), (q)) with \mathbf{T} = (q) and \mathbf{t} = (1-q).
Definition 2.8 (PMF and moments of DPH) Let \tau \sim \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}). The probability mass function is
\Pr(\tau = k) = \boldsymbol{\alpha} \mathbf{T}^{k-1} \mathbf{t}, \quad k = 1, 2, \ldots \tag{2.6}
The first moment is
\mathbb{E}[\tau] = \boldsymbol{\alpha}(\mathbf{I} - \mathbf{T})^{-1}\mathbf{e}. \tag{2.7}
More generally, the n-th factorial moment is \mathbb{E}[\tau(\tau-1)\cdots(\tau-n+1)] = n! \, \boldsymbol{\alpha} (\mathbf{I} - \mathbf{T})^{-n} \mathbf{T}^{n-1} \mathbf{e}.
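Equations (2.6) and (2.7) can be illustrated with the geometric example above; a hedged NumPy sketch, not the phasic implementation:

```python
import numpy as np

# Geo(1 - q) as DPH_1((1), (q)): T = (q), exit probability t = 1 - q.
q = 0.3
alpha = np.array([1.0])
T = np.array([[q]])
t_vec = np.ones(1) - T @ np.ones(1)          # exit probability vector t = e - T e

# Equation (2.6): Pr(tau = k) = alpha T^{k-1} t; geometric closed form q^{k-1}(1-q)
k = 4
pmf = alpha @ np.linalg.matrix_power(T, k - 1) @ t_vec

# Equation (2.7): E[tau] = alpha (I - T)^{-1} e; geometric closed form 1/(1-q)
mean = alpha @ np.linalg.solve(np.eye(1) - T, np.ones(1))
```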
Intuition
The PMF follows the same logic as the continuous PDF: \mathbf{T}^{k-1} propagates the initial distribution through k-1 steps among transient states, then \mathbf{t} extracts the absorption probability. The matrix (\mathbf{I} - \mathbf{T})^{-1} is the discrete analogue of the Green matrix \mathbf{U}.
Reward transformations
Reward-transformed phase-type distributions generalize the basic PH family by assigning non-negative rewards to transient states and measuring accumulated reward rather than raw time.
Definition 2.9 (Reward-transformed CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{r} = (r_1, \ldots, r_p)^\top be a reward vector with r_i \geq 0 for all i. Let \triangle(\mathbf{r}) denote the p \times p diagonal matrix with \mathbf{r} on the diagonal. The reward-transformed random variable is
\tilde{\tau} = \int_0^{\tau} r(X_t) \, dt, \tag{2.8}
where r(X_t) = r_i when X_t = i. If all rewards are strictly positive (r_i > 0 for all i), then \tilde{\tau} is itself phase-type distributed (Bladt and Nielsen 2017):
\tilde{\tau} \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}). \tag{2.9}
The moments of the reward-transformed distribution are
\mathbb{E}[\tilde{\tau}^n] = n! \, \boldsymbol{\alpha} \left( (-\mathbf{S})^{-1} \triangle(\mathbf{r}) \right)^n \mathbf{e} = n! \, \boldsymbol{\alpha} \left( \mathbf{U} \triangle(\mathbf{r}) \right)^n \mathbf{e}. \tag{2.10}
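Equations (2.9) and (2.10) can be checked numerically. For the Erlang(2, \lambda) representation with rewards (c_1, c_2), the reward-transformed variable decomposes as \tilde{\tau} = c_1 Z_1 + c_2 Z_2 for independent Z_1, Z_2 \sim \operatorname{Exp}(\lambda), which gives closed-form moments to compare against. An illustrative sketch, independent of phasic:

```python
import numpy as np

lam, c1, c2 = 2.0, 3.0, 0.5
alpha = np.array([1.0, 0.0])
S = np.array([[-lam, lam],
              [0.0, -lam]])
e = np.ones(2)
U = np.linalg.inv(-S)
D = np.diag([c1, c2])                        # Delta(r)

# Equation (2.9): transformed sub-intensity matrix Delta(r)^{-1} S
S_tilde = np.linalg.inv(D) @ S

# Equation (2.10): E[tilde_tau^n] = n! alpha (U Delta(r))^n e
m1 = 1 * alpha @ (U @ D) @ e
m2 = 2 * alpha @ np.linalg.matrix_power(U @ D, 2) @ e
```

The closed forms are \mathbb{E}[\tilde{\tau}] = (c_1 + c_2)/\lambda and \mathbb{E}[\tilde{\tau}^2] = 2(c_1^2 + c_1 c_2 + c_2^2)/\lambda^2.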
Intuition
A reward of r_i assigned to state i means that while the process occupies state i, reward accumulates at rate r_i per unit time. The total accumulated reward until absorption generalizes the absorption time (which corresponds to all rewards equal to 1). The closure property (Equation 2.9) is fundamental: the class of phase-type distributions is closed under positive reward transformation. The transformed sub-intensity matrix \triangle(\mathbf{r})^{-1}\mathbf{S} rescales the rates in each state by the inverse of the reward.
Example
Let \tau \sim \operatorname{Exp}(\lambda) (a single transient state with rate \lambda) and r_1 = c > 0. Then \tilde{\tau} = c \cdot \tau \sim \operatorname{Exp}(\lambda/c), which is \operatorname{PH}_1((1), (-\lambda/c)). This is the standard scaling property of the exponential distribution.
Multivariate phase-type distributions
When multiple reward vectors are applied simultaneously, the resulting collection of accumulated rewards forms a multivariate phase-type distribution.
Definition 2.10 (Multivariate phase-type distribution) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{R} be a p \times m reward matrix whose column j, denoted \mathbf{R}_{\cdot j}, is a reward vector for marginal j. The random vector \mathbf{Y} = (Y_1, \ldots, Y_m) defined by
Y_j = \int_0^{\tau} R_{X_t, j} \, dt, \quad j = 1, \ldots, m, \tag{2.11}
follows a multivariate phase-type distribution, written \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) (Kulkarni 1989; Bladt and Nielsen 2017).
Intuition
Each marginal Y_j is a reward-transformed variable corresponding to column j of \mathbf{R}. The marginals share the same underlying Markov chain and hence are dependent. The joint distribution captures the correlation structure induced by the shared chain.
Example
In population genetics, one might track both the total branch length (r_i = 1 for all i) and the branch length subtending a particular clade (r_i = 1 for states where the clade is present, r_i = 0 otherwise). The two accumulated quantities form a bivariate phase-type distribution.
Definition 2.11 (Cross-moments of MPH) Let \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) with Green matrix \mathbf{U} = (-\mathbf{S})^{-1}. The second-order cross-moment between marginals i and j is
\mathbb{E}[Y_i Y_j] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e} + \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e}. \tag{2.12}
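Equation (2.12) can be checked on the Erlang(2, \lambda) chain with a reward matrix whose columns pick out the two states: then Y_1 = Z_1 and Y_2 = Z_2 are the independent exponential holding times, so \mathbb{E}[Y_1 Y_2] = 1/\lambda^2. A numerical sketch, independent of phasic:

```python
import numpy as np

lam = 2.0
alpha = np.array([1.0, 0.0])
S = np.array([[-lam, lam],
              [0.0, -lam]])
e = np.ones(2)
U = np.linalg.inv(-S)

# Column j of R is the reward vector for marginal j.
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
D1, D2 = np.diag(R[:, 0]), np.diag(R[:, 1])

# Equation (2.12): sum over the two orderings of reward accumulation
cross = alpha @ U @ D1 @ U @ D2 @ e + alpha @ U @ D2 @ U @ D1 @ e

# Sanity check: with both reward columns equal to ones, the formula
# reduces to E[tau^2] = 2 alpha U^2 e (Equation 2.10 with n = 2).
I2 = np.eye(2)
second = alpha @ U @ I2 @ U @ I2 @ e + alpha @ U @ I2 @ U @ I2 @ e
```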
Intuition
The cross-moment decomposes into two terms corresponding to the two orderings of reward accumulation: first accumulating \mathbf{R}_{\cdot i} then \mathbf{R}_{\cdot j}, and vice versa. This symmetry arises from a first-step analysis of the occupation measures. The formula reduces to \mathbb{E}[Y_i^2] = 2 \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e} when i = j, consistent with the second moment formula (Equation 2.10 with n = 2).
Graph representation
The graph representation is the central data structure of phasic. It encodes phase-type distributions in a form that enables graph-based algorithms to operate in O(n + m) space rather than the O(n^2) required by dense matrices.
Definition 2.12 (Weighted directed graph) A weighted directed graph is a triple G = (V, E, W) where V is a finite set of vertices, E \subseteq V \times V is a set of directed edges, and W \colon E \to \mathbb{R} is a weight function. We write (v \to z) \in E for a directed edge from v to z and w(v \to z) for its weight. The children and parents of v are \operatorname{children}(v) = \{z \in V : (v \to z) \in E\} and \operatorname{parents}(v) = \{u \in V : (u \to v) \in E\}. The total outgoing rate of v is \lambda_v = \sum_{z \in \operatorname{children}(v)} w(v \to z).
Intuition
The graph is a sparse representation of the generator (for CPH) or transition matrix (for DPH). Only edges with non-zero weight are stored, which is why the representation is efficient for sparse Markov chains.
Definition 2.13 (Phase-type graph structure) A phase-type graph is a weighted directed graph G = (V, E, W) with the following distinguished structure:
- A starting vertex v_0 \in V whose outgoing edges encode the initial distribution: w(v_0 \to v_i) = \alpha_i for each transient state i with \alpha_i > 0, and \lambda_{v_0} \leq 1.
- Transient vertices v_1, \ldots, v_p \in V, one per transient state. The edge weights between transient vertices encode off-diagonal entries of the sub-intensity matrix (CPH) or sub-transition matrix (DPH).
- Absorbing vertices: vertices v \in V with \operatorname{children}(v) = \emptyset.
The starting vertex v_0 is assigned reward zero. Each transient vertex v_i may be assigned a non-negative reward r_{v_i} \geq 0.
Intuition
The starting vertex v_0 acts as a “dispatcher” that distributes probability mass to the transient states according to \boldsymbol{\alpha}. It is not itself a transient state of the Markov chain. Absorbing vertices are sinks with no outgoing edges; once probability mass reaches them, it is permanently captured.
Example
An \operatorname{Exp}(\lambda) distribution is represented by three vertices: v_0 (starting), v_1 (transient), and v_a (absorbing), with edges w(v_0 \to v_1) = 1 and w(v_1 \to v_a) = \lambda.
Definition 2.14 (Graph-matrix correspondence) Given a phase-type graph G with starting vertex v_0 and transient vertices v_1, \ldots, v_p, define \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) by:
- \alpha_i = w(v_0 \to v_i) for i = 1, \ldots, p (zero if no such edge exists);
- s_{ij} = w(v_i \to v_j) for i \neq j (zero if no such edge exists);
- s_{ii} = -\lambda_{v_i} (negative of the total outgoing rate from vertex v_i).
Conversely, given \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}), construct G by creating v_0, vertices v_1, \ldots, v_p, absorbing vertices as needed, and edges with the weights above.
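The graph-to-matrix direction of Definition 2.14 can be sketched on a hypothetical dict-of-dicts graph. This is illustrative only: the names graph and transient and the dict layout are not the phasic C structures, which use AVL trees and linked edge lists. The example encodes the \operatorname{Exp}(\lambda) graph from the example above:

```python
import numpy as np

# Hypothetical dict-of-dicts graph for Exp(lam): v0 -> v1 -> va (absorbing).
lam = 2.0
graph = {
    "v0": {"v1": 1.0},        # starting vertex: outgoing weights are the alpha_i
    "v1": {"va": lam},        # transient vertex
    "va": {},                 # absorbing vertex: no children
}
transient = ["v1"]

p = len(transient)
idx = {v: i for i, v in enumerate(transient)}
alpha = np.zeros(p)
S = np.zeros((p, p))
for z, w in graph["v0"].items():
    alpha[idx[z]] = w                        # alpha_i = w(v0 -> v_i)
for v in transient:
    i = idx[v]
    for z, w in graph[v].items():
        if z in idx:
            S[i, idx[z]] = w                 # s_ij = w(v_i -> v_j), i != j
    S[i, i] = -sum(graph[v].values())        # s_ii = -lambda_{v_i}
```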
Intuition
This correspondence is the foundation of phasic: every matrix computation on (\boldsymbol{\alpha}, \mathbf{S}) has a graph counterpart on G. The diagonal entries of \mathbf{S} are not stored as self-loops; instead, they are implicitly determined by the total outgoing rate \lambda_{v_i} = \sum_{j \neq i} s_{ij} + s_i, where s_i is the exit rate to the absorbing state.
Parameterized graphs
In many applications, edge weights depend on unknown parameters. Parameterized edges formalize this dependence.
Definition 2.15 (Parameterized edges) Let \boldsymbol{\theta} = (\theta_1, \ldots, \theta_k) be a parameter vector. A parameterized edge (v \to z) carries a coefficient vector \mathbf{c} = (c_1, \ldots, c_k) and computes its weight according to a weight mode:
| Mode | Weight computation |
|---|---|
| linear | w(v \to z) = \sum_{j=1}^{k} c_j \theta_j |
| log | w(v \to z) = \prod_{j=1}^{k} (c_j \theta_j) |
| callback | w(v \to z) = \phi(\boldsymbol{\theta}, \mathbf{c}) for a user-specified function \phi |
In linear mode, the weight is an affine combination of the parameters (affine because the coefficients c_j can encode constant terms). In log mode, the weight is a product computed in log-space for numerical stability: \log w = \sum_j \log(c_j \theta_j). In callback mode, the weight is the output of an arbitrary function \phi.
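A hedged sketch of how the three weight modes might be evaluated. The function edge_weight is hypothetical (it is not part of the phasic API); it simply implements the table above, with the log mode computed in log-space as described:

```python
import numpy as np

def edge_weight(mode, c, theta, phi=None):
    """Hypothetical evaluation of the three weight modes of Definition 2.15."""
    c, theta = np.asarray(c, float), np.asarray(theta, float)
    if mode == "linear":
        return float(np.dot(c, theta))       # w = sum_j c_j theta_j
    if mode == "log":
        # product prod_j (c_j theta_j), accumulated in log-space for stability
        return float(np.exp(np.sum(np.log(c * theta))))
    if mode == "callback":
        return phi(theta, c)                 # arbitrary user-specified function
    raise ValueError(f"unknown weight mode: {mode}")

theta = [0.5, 2.0]
c = [1.0, 3.0]
w_lin = edge_weight("linear", c, theta)      # 1*0.5 + 3*2.0 = 6.5
w_log = edge_weight("log", c, theta)         # (1*0.5) * (3*2.0) = 3.0
w_cb = edge_weight("callback", c, theta, phi=lambda th, cc: th[0] * cc[1])
```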
Intuition
Parameterized edges allow phasic to represent families of phase-type distributions indexed by \boldsymbol{\theta}. This is essential for inference: given observed data, one seeks the parameter vector \boldsymbol{\theta} that best explains the data under the model encoded by the parameterized graph. The three weight modes cover additive rate models (linear), multiplicative factor models (log), and fully general nonlinear models (callback).
Definition 2.16 (Vertex scalar and normalized graph) Let G = (V, E, W) be a phase-type graph. The vertex scalar of v \in V is
x_v = \frac{1}{\lambda_v}, \tag{2.13}
where \lambda_v = \sum_{z \in \operatorname{children}(v)} w(v \to z) is the total outgoing rate. A normalized graph is a graph in which each edge weight w(v \to z) has been replaced by the transition probability w'(v \to z) = w(v \to z) / \lambda_v. After normalization, \lambda_v = 1 for all transient vertices, and the vertex scalar x_v = 1/\lambda_v records the expected waiting time (the reciprocal of the original total outgoing rate).
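Normalization can be sketched on a hypothetical dict-based graph (illustrative only; not the phasic data structures). Each transient vertex's edge weights are divided by its total outgoing rate, and the vertex scalar x_v = 1/\lambda_v is recorded:

```python
# Two transient vertices v1, v2 and one absorbing vertex va.
graph = {
    "v1": {"v2": 3.0, "va": 1.0},   # lambda_{v1} = 4.0
    "v2": {"va": 2.0},              # lambda_{v2} = 2.0
    "va": {},                       # absorbing: no outgoing edges
}

scalars = {}
normalized = {}
for v, edges in graph.items():
    lam_v = sum(edges.values())
    if lam_v > 0:
        scalars[v] = 1.0 / lam_v                          # Equation (2.13)
        normalized[v] = {z: w / lam_v for z, w in edges.items()}
    else:
        normalized[v] = {}                                # absorbing vertex
```

After normalization the outgoing weights of each transient vertex sum to 1 (the embedded discrete-time chain), while scalars holds the expected holding times.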
Intuition
For a continuous phase-type distribution, x_v = 1/\lambda_v is the expected time spent during each visit to state v (the mean of the exponential holding time). Normalization separates the graph into two components: the embedded discrete-time Markov chain (the normalized edge weights, which are transition probabilities) and the holding time structure (the vertex scalars). This separation is exploited by graph elimination and moment computation algorithms in later files.
Theorems and Proofs
Theorem 2.1 (PDF of a continuous phase-type distribution) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). Then
f_\tau(t) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s}, \quad t > 0.
Proof. Let \mathbf{p}(t) = (\Pr(X_t = 1), \ldots, \Pr(X_t = p)) be the row vector of state probabilities at time t, restricted to the transient states. The Kolmogorov forward equation for the transient states gives
\frac{d}{dt} \mathbf{p}(t) = \mathbf{p}(t) \mathbf{S},
with initial condition \mathbf{p}(0) = \boldsymbol{\alpha}. The unique solution is \mathbf{p}(t) = \boldsymbol{\alpha} e^{\mathbf{S}t}.
The survival function is the probability of remaining in the transient states:
\bar{F}_\tau(t) = \Pr(\tau > t) = \mathbf{p}(t) \mathbf{e} = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{e}.
The PDF is obtained by differentiating the CDF:
f_\tau(t) = -\frac{d}{dt} \bar{F}_\tau(t) = -\boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{S} \mathbf{e} = \boldsymbol{\alpha} e^{\mathbf{S}t} (-\mathbf{S}\mathbf{e}) = \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s},
where we used the identity \frac{d}{dt} e^{\mathbf{S}t} = e^{\mathbf{S}t} \mathbf{S} = \mathbf{S} e^{\mathbf{S}t} and the definition \mathbf{s} = -\mathbf{S}\mathbf{e}. \square
Theorem 2.2 (Moment formula for CPH) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). Then for all n \geq 1,
\mathbb{E}[\tau^n] = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e}.
Proof. The Laplace transform of \tau is
\mathcal{L}_\tau(u) = \mathbb{E}[e^{-u\tau}] = \int_0^\infty e^{-ut} \boldsymbol{\alpha} e^{\mathbf{S}t} \mathbf{s} \, dt = \boldsymbol{\alpha} \left( \int_0^\infty e^{(\mathbf{S} - u\mathbf{I})t} \, dt \right) \mathbf{s}.
Since all eigenvalues of \mathbf{S} have strictly negative real parts (the transient states are indeed transient), the integral converges for u \geq 0:
\int_0^\infty e^{(\mathbf{S} - u\mathbf{I})t} \, dt = -({\mathbf{S} - u\mathbf{I}})^{-1} = (u\mathbf{I} - \mathbf{S})^{-1}.
Thus \mathcal{L}_\tau(u) = \boldsymbol{\alpha}(u\mathbf{I} - \mathbf{S})^{-1}\mathbf{s}.
The n-th moment is recovered by differentiation:
\mathbb{E}[\tau^n] = (-1)^n \frac{d^n}{du^n} \mathcal{L}_\tau(u) \bigg|_{u=0}.
We compute \frac{d^n}{du^n} (u\mathbf{I} - \mathbf{S})^{-1} = (-1)^n n! \, (u\mathbf{I} - \mathbf{S})^{-(n+1)}, which follows by induction using the identity \frac{d}{du}(u\mathbf{I} - \mathbf{S})^{-1} = -(u\mathbf{I} - \mathbf{S})^{-2}. Evaluating at u = 0:
\mathbb{E}[\tau^n] = (-1)^n \cdot (-1)^n n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-(n+1)} \mathbf{s} = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-(n+1)} \mathbf{s}.
Since \mathbf{s} = -\mathbf{S}\mathbf{e} = (-\mathbf{S})\mathbf{e}, substitution gives
\mathbb{E}[\tau^n] = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-(n+1)} (-\mathbf{S}) \mathbf{e} = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e}. \quad \square
Theorem 2.3 (Closure under reward transformation) Let \tau \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let \mathbf{r} = (r_1, \ldots, r_p)^\top with r_i > 0 for all i. Then the reward-transformed variable \tilde{\tau} = \int_0^{\tau} r(X_t) \, dt satisfies
\tilde{\tau} \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}).
Proof. Define a time change by \tilde{t}(t) = \int_0^t r(X_s) \, ds. Since r_i > 0 for all i, this is strictly increasing almost surely, and hence invertible. Let t(\tilde{t}) denote the inverse. Define the time-changed process \tilde{X}_{\tilde{t}} = X_{t(\tilde{t})}.
We verify that \tilde{X} is a Markov jump process with sub-intensity matrix \triangle(\mathbf{r})^{-1}\mathbf{S}. When \tilde{X}_{\tilde{t}} = i, the original process X is in state i, which has holding rate -s_{ii} (i.e., X leaves state i at rate -s_{ii}). In the time-changed clock, time runs at rate r_i relative to the original clock, so the holding rate in the new clock is -s_{ii}/r_i. Similarly, the transition rate from i to j in the new clock is s_{ij}/r_i. The sub-intensity matrix of the time-changed process is therefore \triangle(\mathbf{r})^{-1}\mathbf{S}.
The initial distribution is unchanged because the time change does not affect the initial state. The absorption time in the new clock is \tilde{\tau} = \tilde{t}(\tau) = \int_0^\tau r(X_s) \, ds, which is the reward-transformed variable. Therefore \tilde{\tau} \sim \operatorname{PH}_p(\boldsymbol{\alpha}, \triangle(\mathbf{r})^{-1}\mathbf{S}). \square
Theorem 2.4 (Cross-moment formula for MPH) Let \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) with Green matrix \mathbf{U} = (-\mathbf{S})^{-1}. Then
\mathbb{E}[Y_i Y_j] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e} + \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e}.
Proof. We follow the first-step analysis approach of Hobolth et al. (2024). Write Y_i = \int_0^\tau R_{X_t, i} \, dt and Y_j = \int_0^\tau R_{X_t, j} \, dt. Then
Y_i Y_j = \int_0^\tau \int_0^\tau R_{X_s, i} \, R_{X_t, j} \, ds \, dt.
Split the double integral over the square [0, \tau]^2 into the two triangles \{s < t\} and \{s > t\} (the diagonal \{s = t\} has measure zero):
Y_i Y_j = \int_0^\tau \int_0^t R_{X_s, i} \, R_{X_t, j} \, ds \, dt + \int_0^\tau \int_0^t R_{X_s, j} \, R_{X_t, i} \, ds \, dt.
Consider the first term. Conditioning on the chain trajectory, the inner integral \int_0^t R_{X_s, i} \, ds accumulates reward R_{\cdot, i} over [0, t], and R_{X_t, j} weights this by the instantaneous reward at time t. By the Markov property and the occupation time interpretation of \mathbf{U}, conditioning on starting state and applying iterated expectations gives
\mathbb{E}\left[\int_0^\tau \int_0^t R_{X_s, i} \, R_{X_t, j} \, ds \, dt\right] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e}.
This can be verified by expanding \mathbf{U} as the occupation time matrix. The entry (\mathbf{U})_{kl} is the expected time in state l starting from state k. The product \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) applies reward weighting, and the second \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) accounts for the second accumulation period. By symmetry of the argument under exchange of i and j, the second triangle contributes the second term. \square
Theorem 2.5 (Graph-matrix correspondence) Let G = (V, E, W) be a phase-type graph with starting vertex v_0 and transient vertices v_1, \ldots, v_p as in Definition 2.13. Define (\boldsymbol{\alpha}, \mathbf{S}) as in Definition 2.14. Then G encodes exactly the distribution \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}), and the correspondence is a bijection between (i) phase-type graphs with p transient vertices (up to relabeling of non-starting, non-absorbing vertices) and (ii) phase-type representations \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}).
Proof. We verify both directions of the correspondence.
(Graph to matrix.) Given G, construct \boldsymbol{\alpha} and \mathbf{S} as in Definition 2.14. We verify that \mathbf{S} is a valid sub-intensity matrix:
- Off-diagonal: s_{ij} = w(v_i \to v_j) \geq 0 for i \neq j, since edge weights in phase-type graphs are non-negative.
- Diagonal: s_{ii} = -\lambda_{v_i} < 0, since \lambda_{v_i} > 0 for transient vertices (a transient vertex must have at least one outgoing edge; otherwise it would be absorbing).
- Row sums: \sum_j s_{ij} = \sum_{j \neq i} w(v_i \to v_j) - \lambda_{v_i} \leq 0, because \operatorname{children}(v_i) may include absorbing vertices, so \lambda_{v_i} = \sum_{z \in \operatorname{children}(v_i)} w(v_i \to z) \geq \sum_{j \neq i} w(v_i \to v_j).
The initial distribution satisfies \alpha_i = w(v_0 \to v_i) \geq 0 and \sum_i \alpha_i = \lambda_{v_0} \leq 1.
(Matrix to graph.) Given \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}), construct G: create v_0 with edges w(v_0 \to v_i) = \alpha_i for each i with \alpha_i > 0. Create v_1, \ldots, v_p with edges w(v_i \to v_j) = s_{ij} for each i \neq j with s_{ij} > 0. If s_i = -\sum_j s_{ij} > 0 (i.e., state i has positive exit rate), create an edge from v_i to an absorbing vertex with weight s_i. The resulting graph satisfies the conditions of Definition 2.13.
(Bijection.) The two constructions are inverses of each other up to relabeling: different orderings of the transient vertices yield the same (\boldsymbol{\alpha}, \mathbf{S}) up to simultaneous permutation of rows, columns of \mathbf{S} and entries of \boldsymbol{\alpha}; different absorbing vertex structures (one shared absorbing vertex vs. multiple) yield the same \mathbf{s}. \square
Implementation Notes
Source code mapping:
| Concept | File | Function/Struct | Notes |
|---|---|---|---|
| Graph structure | api/c/phasic.h | struct ptd_graph, struct ptd_vertex, struct ptd_edge | Vertices stored in AVL tree; edges as linked lists |
| Starting vertex | api/c/phasic.h | ptd_graph.root | The starting vertex v_0 |
| Edge weights | api/c/phasic.h | ptd_edge.weight | Single double per edge |
| Vertex scalar | api/c/phasic.h | ptd_vertex.state | Stores x_v; set during normalization |
| PDF computation | src/c/phasic.c | ptd_graph_pdf | Implements Equation 2.1 via uniformization (documented separately) |
| PMF computation | src/c/phasic.c | ptd_dph_pmf | Implements Equation 2.6 via forward algorithm (documented separately) |
| Defect | src/c/phasic.c | ptd_defect | Returns \alpha_0 |
| Parameterized edges | src/cpp/parameterized/graph_builder.cpp | GraphBuilder | Coefficient vectors and weight modes |
Deviations from formalization:
- The notation standard defines the reward matrix \mathbf{R} as p \times m (columns are marginals), consistent with Bladt and Nielsen (2017). The Python API (as of v0.22.22) uses the transposed convention with shape (m, p) (rows are marginals) for more intuitive indexing. The mathematical formulas in this file follow the p \times m convention.
Remark (Suspected code issue: ptd_defect). The function ptd_defect in src/c/phasic.c (line 1295) is a stub that unconditionally returns 0.0, regardless of the actual initial distribution. This means the defect \alpha_0 = 1 - \boldsymbol{\alpha}\mathbf{e} is never correctly computed by this function. Code that relies on ptd_defect will silently treat all distributions as defect-free (\alpha_0 = 0). This does not affect PDF/PMF computation (which uses \boldsymbol{\alpha} directly) but would affect any code path that needs to handle point masses at the origin.
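For contrast, the intended computation is a one-liner over the starting vertex's outgoing weights. The following is a hedged Python sketch of the behavior the function would need (not a patch to src/c/phasic.c, and the function name defect is illustrative):

```python
def defect(start_edge_weights):
    """Defect alpha_0 = 1 - sum_i alpha_i (Definition 2.2).

    start_edge_weights: iterable of the weights w(v0 -> v_i), i.e. the alpha_i.
    """
    return 1.0 - sum(start_edge_weights)
```

A distribution with full initial mass on the transient states has defect 0; any shortfall is the point mass at the origin.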
Symbol Index
| Symbol | Name | First appearance |
|---|---|---|
| \alpha_0 | Defect | Definition 2.2 |
| \boldsymbol{\alpha} | Initial distribution vector | Definition 2.2 |
| \bar{F}_\tau(t) | Survival function | Definition 2.3 |
| \mathbf{c} | Coefficient vector | Definition 2.15 |
| \operatorname{children}(v) | Children of vertex v | Definition 2.12 |
| E | Set of edges | Definition 2.12 |
| \mathbf{e} | Column vector of ones | Definition 2.2 |
| f_\tau(t) | Probability density function | Definition 2.3 |
| F_\tau(t) | Cumulative distribution function | Definition 2.3 |
| G = (V, E, W) | Weighted directed graph | Definition 2.12 |
| \mathbf{I} | Identity matrix | Definition 2.5 |
| \boldsymbol{\Lambda} | Full intensity matrix | Definition 2.1 |
| \lambda_v | Total outgoing rate of vertex v | Definition 2.12 |
| \mathcal{L}_\tau(u) | Laplace transform | Definition 2.5 |
| p | Number of transient states | Definition 2.2 |
| \operatorname{parents}(v) | Parents of vertex v | Definition 2.12 |
| \phi | Callback weight function | Definition 2.15 |
| \mathbf{r} | Reward vector | Definition 2.9 |
| r_v | Vertex reward | Definition 2.13 |
| \mathbf{R} | Reward matrix | Definition 2.10 |
| \mathbf{s} | Exit rate vector | Definition 2.2 |
| \mathbf{S} | Sub-intensity matrix | Definition 2.2 |
| \mathbf{t} | Exit probability vector | Definition 2.7 |
| \mathbf{T} | Sub-transition matrix | Definition 2.7 |
| \tau | Absorption time | Definition 2.2 |
| \tilde{\tau} | Reward-transformed absorption time | Definition 2.9 |
| \boldsymbol{\theta} | Parameter vector | Definition 2.15 |
| \triangle(\mathbf{r}) | Diagonal reward matrix | Definition 2.9 |
| \mathbf{U} | Green matrix | Definition 2.4 |
| V | Set of vertices | Definition 2.12 |
| v_0 | Starting vertex | Definition 2.13 |
| W | Weight function | Definition 2.12 |
| w(v \to z) | Edge weight | Definition 2.12 |
| w'(v \to z) | Normalized edge weight | Definition 2.16 |
| x_v | Vertex scalar | Definition 2.16 |
| \{X_t\}_{t \geq 0} | Continuous-time Markov jump process | Definition 2.1 |
| \{X_t\}_{t \in \mathbb{N}_0} | Discrete-time Markov chain | Definition 2.6 |
| \mathbf{Y} | MPH distributed random vector | Definition 2.10 |