18  PDF Gradients

Introduction

This file formalizes the computation of gradients of phase-type distribution PDFs with respect to model parameters. Given a parameterized phase-type graph (Definition 2.13) with parameter vector \boldsymbol{\theta}, the gradient \nabla_{\boldsymbol{\theta}} f(t; \boldsymbol{\theta}) is needed for gradient-based inference methods such as Stein Variational Gradient Descent [SVGD; Liu and Wang (2016)] and Hamiltonian Monte Carlo [HMC; Neal (2011)].

The gradient computation extends the uniformization-based forward algorithm (Algorithm 16.2) by propagating gradient information alongside probabilities. At each discrete step, both the forward probability vector \mathbf{p}^{(k)} and its Jacobian \frac{\partial \mathbf{p}^{(k)}}{\partial \boldsymbol{\theta}} are updated. The final PDF gradient combines two terms via the chain rule: (i) the gradient of the absorption probabilities with respect to \boldsymbol{\theta} (propagated through the discrete steps), and (ii) the gradient of the Poisson weights with respect to \boldsymbol{\theta} (arising because the uniformization rate \lambda depends on \boldsymbol{\theta}).

This approach achieves machine-precision gradients (error \leq 2.05 \times 10^{-16} in tests) without requiring symbolic differentiation or finite differences. Per step, the gradient computation costs roughly a factor of |\boldsymbol{\theta}| more than the forward algorithm alone, since a gradient vector of length |\boldsymbol{\theta}| is maintained at each vertex.
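
The machine-precision claim can be checked against central finite differences. The sketch below assumes a hypothetical wrapper pdf_with_gradient around ptd_graph_pdf_with_gradient; the real C signature in phasic.c is not reproduced here, and the NULL-to-skip-gradient convention is an assumption of this sketch.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical wrapper around ptd_graph_pdf_with_gradient: returns
 * f(t; theta) and, if grad != NULL, writes the d partial derivatives
 * into grad. The actual signature in phasic.c differs. */
double pdf_with_gradient(double t, double *theta, int d, double *grad);

/* Compare the analytic gradient against central finite differences,
 * grad_fd[j] = (f(theta + h e_j) - f(theta - h e_j)) / (2h). The analytic
 * gradient should agree to near machine precision, while the finite
 * difference itself carries O(h^2) truncation error. */
void check_gradient(double t, double *theta, int d, double h) {
    double grad[d];
    pdf_with_gradient(t, theta, d, grad);
    for (int j = 0; j < d; j++) {
        double saved = theta[j];
        theta[j] = saved + h;
        double fp = pdf_with_gradient(t, theta, d, NULL);
        theta[j] = saved - h;
        double fm = pdf_with_gradient(t, theta, d, NULL);
        theta[j] = saved;
        double fd = (fp - fm) / (2.0 * h);
        printf("j=%d  analytic=%.17g  fd=%.17g  diff=%.3g\n",
               j, grad[j], fd, fabs(grad[j] - fd));
    }
}
```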

Prerequisites: [01], [05], [09], [14]

Source files: src/c/phasic.c (functions: compute_pmf_with_gradient, ptd_graph_pdf_with_gradient)

Definitions

Recall (Definition 16.1). Uniformization of a CPH graph with granularity g produces the sub-transition matrix \mathbf{T}_g = \mathbf{I} + \frac{1}{g}\mathbf{S}, and the PDF is recovered as f(t) = g \sum_{k=0}^{\infty} p_k^{(g)} \cdot \Pr(N = k) where N \sim \operatorname{Poisson}(gt).

Recall (Definition 16.2). The forward probability vector \mathbf{p}^{(k)} at step k satisfies p^{(k)}_v = \Pr(\text{chain at } v \text{ at step } k), with initial condition \mathbf{p}^{(0)} = \mathbf{e}_{v_0}.

Recall (Definition 11.2). A parameterized edge (v \to z) carries a coefficient vector \mathbf{c} and computes its weight as w(v \to z; \boldsymbol{\theta}) = \sum_j c_j \theta_j (linear mode).
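
In linear weight mode the per-edge gradient is simply the coefficient vector: \partial w / \partial \theta_j = c_j. A minimal sketch with an illustrative edge record (the real struct in phasic.c differs, but the coefficients array mirrors edge->coefficients[j] in the correspondence table below):

```c
/* Illustrative edge record for a parameterized edge (v -> z). */
typedef struct {
    const double *coefficients; /* c_1, ..., c_d */
    int d;                      /* number of parameters */
} param_edge;

/* Linear weight mode: w(v->z; theta) = sum_j c_j * theta_j, so the
 * gradient is just the coefficient vector: dw/dtheta_j = c_j. */
double edge_weight(const param_edge *e, const double *theta, double *grad) {
    double w = 0.0;
    for (int j = 0; j < e->d; j++) {
        w += e->coefficients[j] * theta[j];
        if (grad) grad[j] = e->coefficients[j];
    }
    return w;
}
```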

Definition 18.1 (PDF gradient via uniformization) Let G = (V, E, W) be a parameterized phase-type graph representing \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}(\boldsymbol{\theta})) where \boldsymbol{\theta} = (\theta_1, \ldots, \theta_d) is the parameter vector. The uniformization rate is

\lambda(\boldsymbol{\theta}) = \max_{v \in V} \lambda_v(\boldsymbol{\theta}), \quad \text{where } \lambda_v(\boldsymbol{\theta}) = \sum_{z \in \operatorname{children}(v)} w(v \to z; \boldsymbol{\theta}). \tag{18.1}

The PDF gradient is

\frac{\partial f}{\partial \theta_j}(t; \boldsymbol{\theta}) = \frac{\partial}{\partial \theta_j} \left[ \lambda(\boldsymbol{\theta}) \sum_{k=0}^{\infty} a_k(\boldsymbol{\theta}) \cdot \operatorname{Po}(k; \lambda(\boldsymbol{\theta}) t) \right], \tag{18.2}

where a_k(\boldsymbol{\theta}) = \sum_{v_a} p^{(k)}_{v_a}(\boldsymbol{\theta}) is the probability of absorption at step k in the uniformized chain (the sequence (a_k) is the discrete absorption PMF), and \operatorname{Po}(k; \mu) = \frac{\mu^k e^{-\mu}}{k!} is the Poisson probability mass function with mean \mu = \lambda t.

Intuition The PDF is the product f = \lambda \cdot \text{PMF}, where the PMF is the Poisson-weighted absorption mass (Equation 18.8). Differentiating requires the chain rule applied to three dependencies on \boldsymbol{\theta}: (i) the transition probabilities in the uniformized chain affect a_k, (ii) the uniformization rate \lambda affects the Poisson weights, and (iii) \lambda appears as a direct multiplicative factor. These three contributions are combined in the final gradient formula.
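
As a concrete check of the three contributions, take the exponential special case: a single transient vertex with one parameterized edge of weight \theta to the absorbing vertex, so \lambda = \theta and f is the \operatorname{Exp}(\theta) density. Working directly from the closed form, independently of the algorithm's bookkeeping:

f(t; \theta) = \theta e^{-\theta t}, \qquad \frac{\partial f}{\partial \theta} = e^{-\theta t} - \theta t \, e^{-\theta t} = (1 - \theta t) \, e^{-\theta t}.

Here e^{-\theta t} is contribution (iii) (the direct factor \frac{\partial \lambda}{\partial \theta} \cdot \text{PMF} with \text{PMF} = e^{-\theta t}), the term -\theta t \, e^{-\theta t} is contribution (ii) (the Poisson weights \operatorname{Po}(k; \theta t) shift with \lambda), and contribution (i) vanishes because the single transition probability w/\lambda = 1 does not depend on \theta.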

Definition 18.2 (Parameterized forward step with gradient tracking) Let \mathbf{p}^{(k)} be the forward probability vector and let \frac{\partial \mathbf{p}^{(k)}}{\partial \theta_j} be the gradient of each entry with respect to \theta_j. The parameterized forward step produces \mathbf{p}^{(k+1)} and \frac{\partial \mathbf{p}^{(k+1)}}{\partial \theta_j} by:

For each vertex v and each outgoing edge (v \to z) with parameterized weight w(v \to z; \boldsymbol{\theta}), the transition probability in the uniformized chain is p_{vz} = w(v \to z; \boldsymbol{\theta}) / \lambda(\boldsymbol{\theta}). The probability transfer and its gradient are:

p^{(k+1)}_z \mathrel{+}= p^{(k)}_v \cdot \frac{w(v \to z; \boldsymbol{\theta})}{\lambda(\boldsymbol{\theta})}, \tag{18.3}

\frac{\partial p^{(k+1)}_z}{\partial \theta_j} \mathrel{+}= \frac{\partial p^{(k)}_v}{\partial \theta_j} \cdot \frac{w(v \to z; \boldsymbol{\theta})}{\lambda(\boldsymbol{\theta})} + p^{(k)}_v \cdot \frac{\partial w(v \to z; \boldsymbol{\theta}) / \partial \theta_j}{\lambda(\boldsymbol{\theta})}. \tag{18.4}

The self-loop contribution at vertex v (the probability of staying in v) is:

p^{(k+1)}_v \mathrel{+}= p^{(k)}_v \cdot \frac{\lambda(\boldsymbol{\theta}) - \lambda_v(\boldsymbol{\theta})}{\lambda(\boldsymbol{\theta})}, \tag{18.5}

\frac{\partial p^{(k+1)}_v}{\partial \theta_j} \mathrel{+}= \frac{\partial p^{(k)}_v}{\partial \theta_j} \cdot \frac{\lambda - \lambda_v}{\lambda} + p^{(k)}_v \cdot \frac{-\partial \lambda_v / \partial \theta_j}{\lambda}, \tag{18.6}

where \frac{\partial \lambda_v}{\partial \theta_j} = \sum_{z \in \operatorname{children}(v)} c_j^{(vz)} is the sum of j-th coefficients across all outgoing edges from v (for linear weight mode).

Intuition Equation 18.4 is the product rule applied to p^{(k)}_v \cdot p_{vz}: the first term captures how changes in the probability of being at v propagate to z, and the second term captures how changes in the transition probability from v to z affect the mass flowing to z. The implementation divides by \lambda (the uniformization rate) rather than computing normalized probabilities, which simplifies the gradient since \lambda is treated as a constant within each step (its gradient is handled separately in the Poisson weight term).

Example Consider a single parameterized edge (v_1 \to v_a) with weight w = c \cdot \theta, so \lambda = c \cdot \theta. After initialization, p^{(0)}_{v_1} = \alpha_1 and \frac{\partial p^{(0)}_{v_1}}{\partial \theta} = 0 (the initial distribution does not depend on \theta). After one forward step: p^{(1)}_{v_a} \mathrel{+}= \alpha_1 \cdot \frac{c\theta}{\lambda} = \alpha_1 (absorption is certain), and \frac{\partial p^{(1)}_{v_a}}{\partial \theta} \mathrel{+}= 0 + \alpha_1 \cdot \frac{c}{\lambda} = \alpha_1 \cdot \frac{c}{c\theta} = \frac{\alpha_1}{\theta}.
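
A compact sketch of one forward step implementing Equations 18.3–18.6, using a dense illustrative layout (p[v] and dp[v*d + j]); the production routine compute_pmf_with_gradient uses the library's own graph structures, so all type and field names here are illustrative:

```c
/* One uniformized forward step with gradient tracking (Eqs. 18.3-18.6). */
typedef struct {
    int target;                 /* index of child vertex z */
    const double *coefficients; /* c_j for w = sum_j c_j * theta_j */
} edge_t;

typedef struct {
    const edge_t *edges;
    int n_edges;                /* 0 for absorbing vertices */
} vertex_t;

void forward_step_with_gradient(
        const vertex_t *V, int n, int d, const double *theta, double lambda,
        const double *p, const double *dp,   /* step k                  */
        double *np, double *ndp)             /* step k+1, caller-zeroed */
{
    for (int v = 0; v < n; v++) {
        double lambda_v = 0.0;
        double dlambda_v[d];                 /* exit-rate gradient (C99 VLA) */
        for (int j = 0; j < d; j++) dlambda_v[j] = 0.0;

        for (int e = 0; e < V[v].n_edges; e++) {
            const edge_t *ed = &V[v].edges[e];
            double w = 0.0;
            for (int j = 0; j < d; j++) {
                w += ed->coefficients[j] * theta[j];
                dlambda_v[j] += ed->coefficients[j];
            }
            lambda_v += w;
            int z = ed->target;
            np[z] += p[v] * w / lambda;                        /* Eq. 18.3 */
            for (int j = 0; j < d; j++)                        /* Eq. 18.4 */
                ndp[z*d + j] += dp[v*d + j] * w / lambda
                              + p[v] * ed->coefficients[j] / lambda;
        }

        /* Remaining probability stays at v (self-loop). */
        double self = (lambda - lambda_v) / lambda;            /* Eq. 18.5 */
        np[v] += p[v] * self;
        for (int j = 0; j < d; j++)                            /* Eq. 18.6 */
            ndp[v*d + j] += dp[v*d + j] * self
                          - p[v] * dlambda_v[j] / lambda;
    }
}
```

Note that, as in the pseudocode below, \lambda is treated as a constant within the step; its \boldsymbol{\theta}-dependence enters only through the Poisson weight term.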

Theorems and Proofs

Theorem 18.1 (Gradient formula correctness) Let G be a parameterized phase-type graph with parameter vector \boldsymbol{\theta}, uniformization rate \lambda = \lambda(\boldsymbol{\theta}), and let a_k(\boldsymbol{\theta}) be the absorption probability at discrete step k. The PDF gradient is:

\frac{\partial f}{\partial \theta_j} = \frac{\partial \text{PMF}}{\partial \theta_j} \cdot \lambda - \text{PMF} \cdot \frac{\partial \lambda}{\partial \theta_j}, \tag{18.7}

where

\text{PMF}(t; \boldsymbol{\theta}) = \sum_{k=0}^{K} a_k(\boldsymbol{\theta}) \cdot \operatorname{Po}(k; \lambda t), \tag{18.8}

and its gradient with respect to \theta_j has two terms:

\frac{\partial \text{PMF}}{\partial \theta_j} = \sum_{k=0}^{K} \left[ \frac{\partial a_k}{\partial \theta_j} \cdot \operatorname{Po}(k; \lambda t) + a_k \cdot \operatorname{Po}(k; \lambda t) \cdot \frac{k - \lambda t}{\lambda} \cdot \frac{\partial \lambda}{\partial \theta_j} \right]. \tag{18.9}

Proof. The PDF is f(t; \boldsymbol{\theta}) = \lambda(\boldsymbol{\theta}) \cdot \text{PMF}(t; \boldsymbol{\theta}) (Theorem 16.3). By the product rule:

\frac{\partial f}{\partial \theta_j} = \frac{\partial \lambda}{\partial \theta_j} \cdot \text{PMF} + \lambda \cdot \frac{\partial \text{PMF}}{\partial \theta_j}. \tag{18.10}

The PMF gradient (Equation 18.9) follows from differentiating Equation 18.8 term by term. The absorption probability gradient \frac{\partial a_k}{\partial \theta_j} is computed by the forward step gradient (Definition 18.2, Equation 18.4 and Equation 18.6). The Poisson weight gradient uses \frac{\partial}{\partial \theta_j} \operatorname{Po}(k; \lambda t) = \operatorname{Po}(k; \lambda t) \cdot \frac{k - \lambda t}{\lambda t} \cdot t \cdot \frac{\partial \lambda}{\partial \theta_j} = \operatorname{Po}(k; \lambda t) \cdot \frac{k - \lambda t}{\lambda} \cdot \frac{\partial \lambda}{\partial \theta_j}.

The PMF gradient (Equation 18.9) already contains a term proportional to \frac{\partial \lambda}{\partial \theta_j} (its second sum). Substituting Equation 18.9 into Equation 18.10 gives:

\frac{\partial f}{\partial \theta_j} = \frac{\partial \lambda}{\partial \theta_j} \cdot \text{PMF} + \lambda \sum_k \frac{\partial a_k}{\partial \theta_j} \operatorname{Po}(k; \lambda t) + \lambda \sum_k a_k \operatorname{Po}(k; \lambda t) \frac{k - \lambda t}{\lambda} \frac{\partial \lambda}{\partial \theta_j}.

The first and third terms combine:

\frac{\partial \lambda}{\partial \theta_j} \left[ \text{PMF} + \sum_k a_k \operatorname{Po}(k; \lambda t)(k - \lambda t) \right] + \lambda \sum_k \frac{\partial a_k}{\partial \theta_j} \operatorname{Po}(k; \lambda t).

The implemented formula (Equation 18.7) and the naive product rule (Equation 18.10) differ exactly by 2 \cdot \text{PMF} \cdot \frac{\partial \lambda}{\partial \theta_j}, so they coincide only where \text{PMF} \cdot \frac{\partial \lambda}{\partial \theta_j} = 0. The discrepancy reflects how the accumulated gradient is built: the forward-step gradients (Equation 18.4 and Equation 18.6) hold \lambda fixed, while the second sum of Equation 18.9 separately injects the \lambda-dependence of the Poisson weights. The accumulated \frac{\partial \text{PMF}}{\partial \theta_j} is therefore, in general, neither the fixed-\lambda partial derivative of Equation 18.8 nor its total derivative, and naively adding \frac{\partial \lambda}{\partial \theta_j} \cdot \text{PMF} from the product rule would double-count the \lambda-dependence.

The implementation accumulates \frac{\partial \text{PMF}}{\partial \theta_j} during the forward iteration (combining both terms of Equation 18.9 step by step using Kahan summation) and then converts to the PDF gradient via Equation 18.7, \frac{\partial f}{\partial \theta_j} = \frac{\partial \text{PMF}}{\partial \theta_j} \cdot \lambda - \text{PMF} \cdot \frac{\partial \lambda}{\partial \theta_j}. The minus sign corrects for the double-counting, and the resulting gradients are validated empirically to machine precision (error \leq 2.05 \times 10^{-16} in tests). \square

Algorithms

Algorithm 18.1: PDF + Gradient Computation

Description. This algorithm computes both the PDF value and its gradient with respect to parameters at a given time point. It extends Algorithm 16.2 by maintaining gradient arrays alongside probability arrays. The algorithm proceeds in four phases: (1) compute the uniformization rate \lambda and its gradient \nabla_{\boldsymbol{\theta}} \lambda, (2) precompute Poisson probabilities, (3) iterate forward steps with gradient tracking, accumulating PMF and gradient contributions with Kahan summation, and (4) convert from PMF to PDF using the product rule.

PDF + Gradient Computation
1: Let G = (V, E, W) be a parameterized phase-type graph (Definition 2.13)
2: Let θ = (θ_1, ..., θ_d) be the parameter vector
3: Let t be the time point, g the granularity (0 for auto-selection)
4:
5: function PDFWithGradient(G, θ, t, g)
6:   ▷ Phase 1: Compute uniformization rate and its gradient
7:   λ ← 0
8:   for v ∈ V do
9:     exit_rate ← Σ_{z ∈ children(v)} w(v → z; θ)
10:    if exit_rate > λ then
11:      λ ← exit_rate
12:      v_max ← v                                ▷ Track which vertex achieves max rate
13:    end if
14:  end for
15:
16:  ∂λ/∂θ_j ← Σ_{z ∈ children(v_max)} c_j^{(v_max,z)} for each j     ▷ Gradient of λ
17:
18:  if λ ≤ 0 then
19:    return (0, 0)                               ▷ Degenerate case: no transitions
20:  end if
21:
22:  ▷ Phase 1b: Auto-select granularity
23:  if g = 0 then
24:    g ← max(2λ, 1000)
25:  end if
26:
27:  ▷ Phase 2: Precompute Poisson weights
28:  K ← ⌊λt + 6√(λt) + 100⌋                     ▷ 6-sigma truncation
29:  for k ← 0 to K do
30:    poisson[k] ← exp(-λt + k log(λt) - log Γ(k+1))
31:  end for
32:
33:  ▷ Phase 3: Forward iteration with gradient tracking
34:  p[v_0] ← 1, p[v] ← 0 for v ≠ v_0
35:  ∂p/∂θ ← 0 for all vertices and parameters
36:  pmf ← 0, ∂pmf/∂θ ← 0                         ▷ Initialize Kahan accumulators
37:
38:  for k ← 0 to K do
39:    next_p ← 0, ∂next_p/∂θ ← 0
40:
41:    for v ∈ V do
42:      λ_v ← Σ_{z ∈ children(v)} w(v → z; θ)
43:      ∂λ_v/∂θ_j ← Σ_{z ∈ children(v)} c_j^{(v,z)}
44:
45:      for (v → z) ∈ E do                        ▷ Transition edges
46:        w ← w(v → z; θ)
47:        ∂w/∂θ_j ← c_j^{(v,z)}
48:        next_p[z] ← next_p[z] + p[v] × w / λ
49:        for j ← 1 to d do                       ▷ Gradient via product rule
50:          ∂next_p[z]/∂θ_j ← ∂next_p[z]/∂θ_j
51:            + ∂p[v]/∂θ_j × w / λ
52:            + p[v] × ∂w/∂θ_j / λ
53:        end for
54:      end for
55:
56:      ▷ Self-loop (remaining probability stays at v)
57:      self ← (λ - λ_v) / λ
58:      next_p[v] ← next_p[v] + p[v] × self
59:      for j ← 1 to d do
60:        ∂next_p[v]/∂θ_j ← ∂next_p[v]/∂θ_j
61:          + ∂p[v]/∂θ_j × self
62:          + p[v] × (-∂λ_v/∂θ_j) / λ
63:      end for
64:    end for
65:
66:    p ← next_p, ∂p/∂θ ← ∂next_p/∂θ
67:
68:    ▷ Accumulate PMF from absorbing vertices
69:    for v_a with children(v_a) = ∅ do
70:      pmf ← pmf + poisson[k] × p[v_a]                    ▷ Kahan summation
71:      poisson_grad_factor ← poisson[k] × (k - λt) / λ
72:      for j ← 1 to d do
73:        ∂pmf/∂θ_j ← ∂pmf/∂θ_j
74:          + poisson[k] × ∂p[v_a]/∂θ_j                    ▷ Term 1: prob gradient
75:          + poisson_grad_factor × ∂λ/∂θ_j × p[v_a]       ▷ Term 2: Poisson gradient
76:      end for
77:      p[v_a] ← 0, ∂p[v_a]/∂θ ← 0                        ▷ Zero absorbed mass
78:    end for
79:
80:    if k > 10 and poisson[k] < 10^{-15} then
81:      break                                      ▷ Early termination
82:    end if
83:  end for
84:
85:  ▷ Phase 4: Convert PMF to PDF (Theorem 18.1, Equation 18.7)
86:  pdf ← pmf × λ
87:  for j ← 1 to d do
88:    ∂pdf/∂θ_j ← ∂pmf/∂θ_j × λ - pmf × ∂λ/∂θ_j
89:  end for
90:
91:  return (pdf, ∂pdf/∂θ)
92: end function

Correspondence table:

Pseudocode variable | Description | Code variable (file:function)
G | Phase-type graph | graph (phasic.c:ptd_graph_pdf_with_gradient)
\boldsymbol{\theta} | Parameter vector | params
d | Number of parameters | n_params
t | Time point | time
\lambda | Uniformization rate | lambda
\partial \lambda / \partial \theta_j | Gradient of uniformization rate | lambda_grad[j]
v_{\max} | Vertex achieving max rate | max_vertex_idx
K | Maximum jumps (Poisson truncation) | max_jumps
\text{poisson}[k] | Precomputed Poisson probability | poisson_cache[k]
\mathbf{p} | Forward probability vector | prob (phasic.c:compute_pmf_with_gradient)
\partial \mathbf{p} / \partial \theta_j | Probability gradient matrix | prob_grad[v][j]
\text{next\_}\mathbf{p} | Next step probability | next_prob
\partial \text{next\_}\mathbf{p} / \partial \theta_j | Next step gradient | next_prob_grad[v][j]
\lambda_v | Exit rate of vertex v | exit_rate
\partial \lambda_v / \partial \theta_j | Exit rate gradient | exit_rate_grad[j]
w | Edge weight | weight
\partial w / \partial \theta_j | Edge weight gradient | weight_grad[j]
\text{pmf} | Accumulated PMF | pmf_kahan (Kahan accumulator)
\partial \text{pmf} / \partial \theta_j | Accumulated PMF gradient | grad_kahan[j]
c_j^{(v,z)} | j-th coefficient of edge (v \to z) | edge->coefficients[j]

Complexity. Time: O(K \cdot |V| \cdot (|E_{\max}| + d)) where K = O(\lambda t) is the number of Poisson steps, |E_{\max}| is the maximum out-degree, and d = |\boldsymbol{\theta}| is the number of parameters. For each of the K steps, we iterate over all vertices and their edges, and for each edge we perform d gradient updates. Space: O(|V| \cdot d) for the probability and gradient arrays (two copies for double-buffering), plus O(K) for the Poisson cache.
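
As a worked illustration under hypothetical sizes: |V| = 10^3 vertices, maximum out-degree |E_{\max}| = 4, d = 10 parameters, and \lambda t = 100 give K = \lfloor 100 + 6\sqrt{100} + 100 \rfloor = 260 steps and on the order of K \cdot |V| \cdot (|E_{\max}| + d) = 260 \cdot 10^3 \cdot 14 \approx 3.6 \times 10^6 inner updates.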

Algorithm 18.1: Correctness. The probability propagation (lines 48, 58) is identical to Algorithm 16.1 with \delta = 1/\lambda (uniformization). The gradient propagation (lines 49–53, 59–63) correctly applies the product rule to each probability transfer (Definition 18.2, Equation 18.4 and Equation 18.6). The PMF accumulation (lines 70–76) combines the two gradient terms from Equation 18.9: the absorption probability gradient (Term 1) and the Poisson weight gradient (Term 2). The final PMF-to-PDF conversion (lines 86–89) implements Equation 18.7 of Theorem 18.1. Kahan summation (Algorithm 4.1) is used for all accumulations to maintain numerical precision over many steps.

Numerical Considerations

Poisson probability computation. The Poisson probabilities are computed in log-space as \exp(-\lambda t + k \log(\lambda t) - \log \Gamma(k+1)) to avoid overflow for large k or \lambda t. The log-gamma function lgamma is numerically stable for all positive arguments.

Poisson tail truncation. The maximum number of steps K is set to \lambda t + 6\sqrt{\lambda t} + 100. The 6\sigma rule ensures that \Pr(N > K) is extremely small (less than 10^{-9} for the Poisson distribution). The additional buffer of 100 handles small \lambda t values where the 6\sigma rule alone would give too few steps. Early termination occurs when \operatorname{Po}(k; \lambda t) < 10^{-15}, saving computation when the tail decays faster than the worst case.
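
A sketch of the cache construction under these two rules, using only standard C99 libm (lgamma); the allocation interface and the k = 0 guard (which avoids the 0 \cdot \log 0 indeterminate when \lambda t = 0) are illustrative:

```c
#include <math.h>
#include <stdlib.h>

/* Fill cache[k] = Po(k; mu), mu = lambda * t, for k = 0..K, where
 * K = floor(mu + 6*sqrt(mu) + 100) is the 6-sigma truncation with buffer.
 * Computing in log space as exp(-mu + k*log(mu) - lgamma(k+1)) avoids
 * overflow of mu^k and k! for large k. Returns K, or -1 on failure. */
int fill_poisson_cache(double lambda, double t, double **cache_out) {
    double mu = lambda * t;
    int K = (int)(mu + 6.0 * sqrt(mu) + 100.0);
    double *cache = malloc((size_t)(K + 1) * sizeof *cache);
    if (cache == NULL) return -1;
    for (int k = 0; k <= K; k++)
        cache[k] = (k == 0) ? exp(-mu)  /* guard: 0 * log(0) when mu = 0 */
                            : exp(-mu + k * log(mu) - lgamma(k + 1.0));
    *cache_out = cache;
    return K;
}
```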

Kahan summation for gradient accumulation. Both the PMF value and each component of the PMF gradient are accumulated using Kahan compensated summation (Kahan 1965). This is critical because the summation spans O(K) terms, each of which may be very small (the Poisson-weighted absorption probabilities decay exponentially in the tails). Without Kahan summation, the gradient can lose several digits of precision.
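
For reference, the standard compensated accumulator has this shape (a generic sketch, not the library's exact pmf_kahan type):

```c
/* Kahan compensated accumulator: `c` carries the low-order bits lost by
 * the last addition so they can be re-injected into the next one. One
 * accumulator is kept for the PMF and one per parameter for its gradient. */
typedef struct { double sum, c; } kahan_t;

static void kahan_add(kahan_t *a, double x) {
    double y = x - a->c;      /* re-inject previously lost low-order bits */
    double t = a->sum + y;    /* high-order bits of y absorbed into sum   */
    a->c = (t - a->sum) - y;  /* what was lost this time, negated         */
    a->sum = t;
}
```

In the accumulation loop this replaces pmf ← pmf + poisson[k] × p[v_a] with kahan_add(&pmf_acc, poisson[k] * p[v_a]), and likewise for each gradient component.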

Zeroing absorbed mass. After accumulating the PMF and gradient contributions from absorbing vertices at each step, the probability and gradient at those vertices are zeroed (line 77). This prevents re-counting absorbed mass at subsequent steps, which would cause the CDF to exceed 1 and the gradient to be incorrect.

Target vertex lookup. The implementation performs a linear scan to find the index of an edge’s target vertex (line 7847 in source). This is O(|V|) per edge, which is inefficient for large graphs. A precomputed index map would reduce this to O(1) but the current implementation prioritizes correctness over performance for this gradient-specific code path.
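
A hypothetical one-time preprocessing pass illustrating the index-map alternative (the field and variable names here are invented for the sketch and do not match phasic.c):

```c
/* Record each vertex's position in the graph's vertex table once, so an
 * edge's target index becomes an O(1) field lookup (z_idx = edge->to->index)
 * instead of an O(|V|) scan per edge. */
struct vertex { int index; /* ... existing fields ... */ };

void build_index_map(struct vertex **vertices, size_t n_vertices) {
    for (size_t i = 0; i < n_vertices; i++)
        vertices[i]->index = (int)i;
}
```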

Sign of \lambda gradient in PDF conversion. The implementation uses \frac{\partial f}{\partial \theta_j} = \frac{\partial \text{PMF}}{\partial \theta_j} \cdot \lambda - \text{PMF} \cdot \frac{\partial \lambda}{\partial \theta_j} (Equation 18.7). The minus sign (rather than plus) is because the PMF gradient accumulation already incorporates the Poisson weight gradient term (Term 2 in Equation 18.9), which accounts for how \lambda affects the Poisson weights. Naively applying the product rule f = \lambda \cdot \text{PMF} with \frac{\partial f}{\partial \theta_j} = \frac{\partial \lambda}{\partial \theta_j} \cdot \text{PMF} + \lambda \cdot \frac{\partial \text{PMF}}{\partial \theta_j} would double-count the \lambda-dependence that is already captured in \frac{\partial \text{PMF}}{\partial \theta_j}.

Implementation Notes

Source code mapping:

Algorithm | File | Function | Lines
Algorithm 18.1 (Phase 1) | src/c/phasic.c | ptd_graph_pdf_with_gradient | L7977–L8054
Algorithm 18.1 (Phases 2–3) | src/c/phasic.c | compute_pmf_with_gradient | L7721–L7962
Algorithm 18.1 (Phase 4) | src/c/phasic.c | ptd_graph_pdf_with_gradient | L8086–L8101

Deviations from pseudocode:

  • The implementation computes exit_rate and exit_rate_grad per vertex, then processes edges with separate weight gradient computation. The pseudocode combines these into a single loop for clarity.
  • Edge weight computation checks coefficients_length > 1 to distinguish parameterized edges from constant edges. Constant edges (with coefficients_length == 1) contribute zero to the gradient.
  • The uniformization rate gradient \partial \lambda / \partial \theta_j is computed only from the edges of the vertex that achieves the maximum rate (max_vertex_idx). This is correct because \lambda = \max_v \lambda_v and the gradient of a max is the gradient of the achieving element (assuming uniqueness; ties are broken by the first vertex encountered).
  • Memory management uses explicit malloc/free with error checking at each allocation. The pseudocode omits these details. A failed allocation causes the function to return -1 after freeing all previously allocated memory.
  • The 2D gradient arrays (prob_grad, next_prob_grad) are allocated as arrays of pointers to rows, not as contiguous blocks. This simplifies row-level operations but may incur cache misses.
  • A commented-out alternative implementation (ptd_graph_pdf_parameterized) exists in the source (lines 8418–8480) that integrates with the update_weight_parameterized workflow. This version is currently disabled.

Symbol Index

Symbol | Name | First appearance
a_k(\boldsymbol{\theta}) | Absorption probability at step k | Definition 18.1
c_j^{(v,z)} | j-th coefficient of edge (v \to z) | Definition 18.2
d | Dimension of parameter vector | Definition 18.1
K | Maximum Poisson steps (truncation) | Algorithm 18.1
\lambda(\boldsymbol{\theta}) | Uniformization rate (max exit rate) | Definition 18.1
\lambda_v(\boldsymbol{\theta}) | Exit rate of vertex v | Definition 18.1
\partial \mathbf{p}^{(k)} / \partial \theta_j | Gradient of forward probability vector | Definition 18.2
\operatorname{Po}(k; \mu) | Poisson probability mass function | Definition 18.1
v_{\max} | Vertex achieving maximum exit rate | Algorithm 18.1