17  Moment Computation

Introduction

This file formalizes the computation of all-order moments and cross-moments for phase-type distributions using the elimination command sequence (reward compute graph) from [06]. The key insight is that the same cached sequence of elimination commands (Definition 7.6) can be replayed with different reward vectors to compute moments of any order, expected sojourn times, and cross-moments for multivariate phase-type (MPH) distributions.

The first moment (expected absorption time) is computed by Algorithm 7.3 with unit rewards. Higher moments are obtained by an iterated procedure: the n-th moment requires n successive replays, where each replay uses a reward vector derived from the previous replay's output. Cross-moments for MPH distributions are computed by chaining two replays per term, using the two marginal reward vectors, and summing the two symmetric terms.

These moment computations operate on the elimination commands (the acyclic representation) and do not require the original graph. This is a fundamental advantage of the graph-based approach: once the N \leq |V|^2 elimination commands are computed, each moment evaluation costs only O(N) time (or O(N \cdot k) for the sojourn times of a subset of k target states), regardless of the complexity of the original graph structure.

Prerequisites: [01], [06]

Source files: - src/c/phasic.c (functions: ptd_expected_waiting_time, ptd_expected_sojourn_time, ptd_expected_sojourn_time_subset)

Definitions

Recall (Definition 7.6). The reward compute graph is a cached sequence of N commands C = (c_0, c_1, \ldots, c_{N-1}), where each command c_j is a triple (\text{from}, \text{to}, \text{multiplier}) representing the operation \text{result}[\text{from}] \mathrel{+}= \text{result}[\text{to}] \times \text{multiplier}.
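
To make the replay concrete, here is a minimal, self-contained C sketch. The command list encodes the three-state Erlang example used below; the self-referencing command (v, v, x_v - 1), which rescales a vertex's entry by its vertex scalar x_v = 1/\lambda_v, is an illustrative encoding chosen for compactness, not necessarily the encoding that phasic.c emits.

#include <stdio.h>

/* One elimination command: result[from] += result[to] * multiplier. */
struct cmd { int from, to; double multiplier; };

/* Replay a command sequence over an initial per-vertex reward vector.
 * On return, result[v] holds the expected accumulated reward starting
 * from vertex v (the semantics of Algorithm 7.3). */
static void replay(const struct cmd *c, int n_cmds, double *result) {
    for (int j = 0; j < n_cmds; j++)
        result[c[j].from] += result[c[j].to] * c[j].multiplier;
}

int main(void) {
    /* Chain v0 -> v1 -> v2 -> absorbing, all rates 2 (x_v = 1/2).
     * Reverse topological order: each vertex first rescales by x_v
     * (self-command, multiplier x_v - 1), then adds its child. */
    struct cmd C[] = {
        {2, 2, -0.5},
        {1, 1, -0.5}, {1, 2, 1.0},
        {0, 0, -0.5}, {0, 1, 1.0},
    };
    int n_cmds = sizeof C / sizeof C[0];

    double r1[3] = {1.0, 1.0, 1.0};            /* unit rewards */
    replay(C, n_cmds, r1);
    printf("E[tau]       = %g\n", r1[0]);      /* prints 1.5 */

    double r2[3] = {0.0, 1.0, 0.0};            /* time in v1 only */
    replay(C, n_cmds, r2);
    printf("E[tau-tilde] = %g\n", r2[0]);      /* prints 0.5 */
    return 0;
}

Replaying the same five commands with two different reward vectors yields two different expectations; no part of the original graph is consulted.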

Definition 17.1 (Moment computation via elimination replay) Let G = (V, E, W) represent \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) (Definition 2.13) with elimination command sequence C (Definition 7.6). The first moment (mean) of the reward-transformed absorption time (Neuts 1981; Bladt and Nielsen 2017) is computed by replaying C with reward vector \mathbf{r}:

\mathbb{E}[\tilde{\tau}] = \text{ReplayCommands}(C, \mathbf{r})[v_0], \tag{17.1}

where \text{ReplayCommands} is Algorithm 7.3 and v_0 is the starting vertex. With \mathbf{r} = \mathbf{e} (the all-ones vector), this gives the expected absorption time \mathbb{E}[\tau]. With an arbitrary reward vector \mathbf{r} = (r_1, \ldots, r_p)^\top, this gives \mathbb{E}[\tilde{\tau}] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{r}) \mathbf{e} (Definition 2.9).

Intuition The elimination commands encode the structure of the Green matrix \mathbf{U} = (-\mathbf{S})^{-1} (Neuts 1981) in a sparse, sequential form. Replaying them with a reward vector computes the matrix-vector product \mathbf{U} \triangle(\mathbf{r}) \mathbf{e} without explicitly forming \mathbf{U}. Changing the reward vector changes which “weighted sojourn time” is computed, without re-running the elimination.
Example Consider a three-state Erlang process v_0 \to v_1 \to v_2 \to v_a with equal rates \lambda = 2. With unit rewards, \text{ReplayCommands}(C, \mathbf{e})[v_0] = \mathbb{E}[\tau] = 3/2. With reward \mathbf{r} = (0, 1, 0)^\top (only counting time in state v_1), the result is \mathbb{E}[\tilde{\tau}] = 1/2.
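
For this example the Green matrix can be written out explicitly (taking \boldsymbol{\alpha} = (1, 0, 0)):

\mathbf{U} = (-\mathbf{S})^{-1} = \frac{1}{2} \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \mathbf{U} \mathbf{e} = \begin{pmatrix} 3/2 \\ 1 \\ 1/2 \end{pmatrix}, \qquad \mathbf{U} \triangle(\mathbf{r}) \mathbf{e} = \begin{pmatrix} 1/2 \\ 1/2 \\ 0 \end{pmatrix} \text{ for } \mathbf{r} = (0, 1, 0)^\top,

so the replay reproduces the entries of \mathbf{U} \mathbf{e} and \mathbf{U} \triangle(\mathbf{r}) \mathbf{e} at every vertex without ever forming \mathbf{U}.
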
Figure 17.1: Computing expected absorption time in reverse topological order. Starting from an acyclic graph (top left) with vertex scalars (blue numbers), the algorithm processes vertices bottom-up. At each step, a vertex accumulates the weighted contributions of its children (edge probability times child’s accumulated value) and adds its own vertex scalar. The final value at the starting vertex (bottom right) gives the expected absorption time \mathbb{E}[\tau].

Definition 17.2 (Higher-order moments via iterated replay) Let G, C, and \mathbf{r} be as in Definition 17.1. The n-th moment of the reward-transformed absorption time is computed by the iteration:

  1. Initialize: \mathbf{r}^{(0)} = \mathbf{e} (the all-ones vector).
  2. For i = 1, \ldots, n: compute \mathbf{r}^{(i)} = \text{ReplayCommands}(C, \mathbf{r} \circ \mathbf{r}^{(i-1)}), where \circ denotes the Hadamard (element-wise) product, retaining the full result vector (not just the starting vertex entry).
  3. The n-th moment is \mathbb{E}[\tilde{\tau}^n] = n! \cdot \mathbf{r}^{(n)}[v_0].

Equivalently, each replay computes the vector \mathbf{U} \triangle(\mathbf{r}) \mathbf{r}^{(i-1)} at every vertex: the previous result is re-weighted element-wise by the base reward before being propagated through \mathbf{U}. For unit rewards (\mathbf{r} = \mathbf{e}) the Hadamard product is the identity and the iteration simplifies to \mathbf{r}^{(i)} = \text{ReplayCommands}(C, \mathbf{r}^{(i-1)}).

Intuition The n-th moment formula \mathbb{E}[\tau^n] = n! \, \boldsymbol{\alpha} (-\mathbf{S})^{-n} \mathbf{e} = n! \, \boldsymbol{\alpha} \mathbf{U}^n \mathbf{e} (Definition 2.4) involves the n-th power of \mathbf{U}. Rather than computing \mathbf{U}^n directly, we compute \mathbf{U} \mathbf{e}, then \mathbf{U} \cdot (\mathbf{U} \mathbf{e}), then \mathbf{U} \cdot (\mathbf{U}^2 \mathbf{e}), and so on. Each step is a replay of the elimination commands with the previous result as the new reward vector. The factorial n! accounts for the n! orderings of the n sojourn time contributions. For reward-transformed distributions, the reward matrix \triangle(\mathbf{r}) is inserted between the \mathbf{U} factors, \mathbb{E}[\tilde{\tau}^n] = n! \, \boldsymbol{\alpha} (\mathbf{U} \triangle(\mathbf{r}))^n \mathbf{e}, which the iteration realizes by element-wise multiplying each intermediate result by \mathbf{r} before the next replay.
Example For the Erlang(3, 2) distribution with unit rewards: \mathbf{r}^{(0)} = (1, 1, 1)^\top. After the first replay, \mathbf{r}^{(1)}[v_i] = \mathbb{E}_{v_i}[\tau] gives the expected absorption time starting from each vertex. After the second replay, \mathbb{E}[\tau^2] = 2 \cdot \mathbf{r}^{(2)}[v_0].
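
A minimal C sketch of the iteration (the command encoding repeats the illustrative one from the replay sketch above so that the program is self-contained; it is not the encoding phasic.c emits):

#include <stdio.h>
#include <string.h>

struct cmd { int from, to; double multiplier; };

static void replay(const struct cmd *c, int n_cmds, double *result) {
    for (int j = 0; j < n_cmds; j++)
        result[c[j].from] += result[c[j].to] * c[j].multiplier;
}

/* k-th moment by iterated replay: h = Replay(C, r .* h) repeated
 * k times (".*" is the element-wise product), then the starting
 * vertex entry is scaled by k!. */
static double kth_moment(const struct cmd *c, int n_cmds, int n,
                         const double *r, int k) {
    double h[16], reward[16], fact = 1.0;      /* assumes n <= 16 */
    for (int v = 0; v < n; v++) h[v] = 1.0;    /* h = e */
    for (int i = 1; i <= k; i++) {
        for (int v = 0; v < n; v++)
            reward[v] = r[v] * h[v];           /* Hadamard product */
        replay(c, n_cmds, reward);
        memcpy(h, reward, n * sizeof(double));
        fact *= i;
    }
    return fact * h[0];                        /* v0 is index 0 */
}

int main(void) {
    struct cmd C[] = {
        {2, 2, -0.5},
        {1, 1, -0.5}, {1, 2, 1.0},
        {0, 0, -0.5}, {0, 1, 1.0},
    };
    double e[3] = {1.0, 1.0, 1.0};
    /* Erlang(3, 2): E[tau] = 3/2, E[tau^2] = 3, E[tau^3] = 15/2. */
    for (int k = 1; k <= 3; k++)
        printf("E[tau^%d] = %g\n", k, kth_moment(C, 5, 3, e, k));
    return 0;
}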

Definition 17.3 (Expected sojourn times and cross-moments for MPH) Let G represent \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) with |V| vertices and elimination commands C. The expected sojourn time vector is the |V|-dimensional vector

\mathbf{z} = (z_1, \ldots, z_{|V|}), \quad z_j = \boldsymbol{\alpha} \mathbf{U} \mathbf{e}_j = \mathbb{E}\left[\int_0^\tau \mathbb{1}_{\{X_t = j\}} \, dt\right], \tag{17.2}

where \mathbf{e}_j is the j-th standard basis vector and z_j is the expected total time spent in state j before absorption.
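
For the Erlang(3, 2) example above, every trajectory visits each of the three states exactly once, each with mean holding time 1/2, so

\mathbf{z} = \boldsymbol{\alpha} \mathbf{U} = (1/2, \; 1/2, \; 1/2),

the first row of the Green matrix, consistent with z_j = \boldsymbol{\alpha} \mathbf{U} \mathbf{e}_j.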

For a multivariate phase-type distribution \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) (Definition 2.10) (Kulkarni 1989; Hobolth et al. 2024), the j-th marginal mean is \mathbb{E}[Y_j] = \sum_v z_v \cdot R_{vj}, and the second-order cross-moment between marginals i and j is

\mathbb{E}[Y_i Y_j] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e} + \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e}, \tag{17.3}

where \mathbf{R}_{\cdot i} denotes column i of the reward matrix \mathbf{R} (Definition 2.11).

Intuition The expected sojourn time z_j answers: “on average, how long does the process spend in state j?” Computing all |V| sojourn times simultaneously requires replaying the elimination commands with |V| reward vectors (one basis vector per state) in parallel, which is equivalent to computing the full Green matrix row \boldsymbol{\alpha} \mathbf{U}. Cross-moments for MPH distributions require two chained replays per term: the first, with reward vector \mathbf{R}_{\cdot j}, produces the intermediate vector \mathbf{h}_j; the second, with reward vector \mathbf{R}_{\cdot i} element-wise multiplied by \mathbf{h}_j, produces one of the two terms in Equation 17.3. The symmetrized sum accounts for both orderings of the two reward accumulations.
Example Consider an MPH with two marginals and two transient states. If z_1 = 0.5, z_2 = 0.3, and \mathbf{R}_{\cdot 1} = (1, 0)^\top, \mathbf{R}_{\cdot 2} = (0, 1)^\top (each marginal measures time in one state), then \mathbb{E}[Y_1] = z_1 = 0.5 and \mathbb{E}[Y_2] = z_2 = 0.3. The cross-moment requires two chained replays for each of \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot 1}) \mathbf{U} \triangle(\mathbf{R}_{\cdot 2}) \mathbf{e} and its symmetric counterpart (four replays in total).
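
To make this example fully concrete, suppose the two transient states form the chain v_1 \to v_2 \to \text{absorbing} with \boldsymbol{\alpha} = (1, 0) and mean holding times 0.5 and 0.3 (one model consistent with the stated z). Then

\mathbf{U} = \begin{pmatrix} 0.5 & 0.3 \\ 0 & 0.3 \end{pmatrix}, \qquad \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot 1}) \mathbf{U} \triangle(\mathbf{R}_{\cdot 2}) \mathbf{e} = 0.15, \qquad \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot 2}) \mathbf{U} \triangle(\mathbf{R}_{\cdot 1}) \mathbf{e} = 0,

so \mathbb{E}[Y_1 Y_2] = 0.15 = \mathbb{E}[Y_1] \mathbb{E}[Y_2]: for this chain the two sojourn times are independent exponentials and the covariance vanishes, as expected.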

Theorems and Proofs

Theorem 17.1 (Higher moments by iterated replay) Let G represent \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) with elimination commands C and Green matrix \mathbf{U} = (-\mathbf{S})^{-1}. Let \mathbf{r} be a reward vector with r_v \geq 0 for all transient vertices. Define the iteration \mathbf{r}^{(0)} = \mathbf{e} and \mathbf{r}^{(i)} = \text{ReplayCommands}(C, \mathbf{r} \circ \mathbf{r}^{(i-1)}) for i \geq 1, where \circ denotes the Hadamard (element-wise) product. Then:

\mathbf{r}^{(n)}[v_0] = \boldsymbol{\alpha} \left(\mathbf{U} \triangle(\mathbf{r})\right)^n \mathbf{e}, \tag{17.4}

and consequently

\mathbb{E}[\tilde{\tau}^n] = n! \cdot \mathbf{r}^{(n)}[v_0], \tag{17.5}

where \tilde{\tau} = \int_0^\tau r(X_t) \, dt is the reward-transformed absorption time (Definition 2.9).

Proof. We first record what a single replay computes. By the correctness of Algorithm 7.3 (established in [06]), a replay with any reward vector \tilde{\mathbf{r}} computes at each vertex v

\text{ReplayCommands}(C, \tilde{\mathbf{r}})_v = \mathbf{e}_v^\top \mathbf{U} \triangle(\tilde{\mathbf{r}}) \mathbf{e} = (\mathbf{U} \tilde{\mathbf{r}})_v,

the inner product of row v of the Green matrix with \tilde{\mathbf{r}}, where we use \triangle(\tilde{\mathbf{r}}) \mathbf{e} = \tilde{\mathbf{r}} (a diagonal matrix applied to the all-ones vector recovers the vector itself). At the starting vertex the result is \boldsymbol{\alpha} \mathbf{U} \tilde{\mathbf{r}}, since the starting vertex distributes mass according to \boldsymbol{\alpha}. The commands in C already incorporate the vertex scalars from the original graph normalization (which, for unit rewards on the original graph, are x_v = 1/\lambda_v), so no additional scaling is needed at replay time. We prove the vertex-wise identity \mathbf{r}^{(n)} = (\mathbf{U} \triangle(\mathbf{r}))^n \mathbf{e} by induction on n; Equation 17.4 is its starting-vertex entry.

Base case (n = 1): \mathbf{r}^{(1)} = \text{ReplayCommands}(C, \mathbf{r} \circ \mathbf{e}) = \text{ReplayCommands}(C, \mathbf{r}), which computes entry-wise \mathbf{r}^{(1)}_v = \mathbf{e}_v^\top \mathbf{U} \triangle(\mathbf{r}) \mathbf{e} = \mathbb{E}_v[\tilde{\tau}], i.e. \mathbf{r}^{(1)} = \mathbf{U} \triangle(\mathbf{r}) \mathbf{e}.

Inductive step: Assume \mathbf{r}^{(n-1)} = (\mathbf{U} \triangle(\mathbf{r}))^{n-1} \mathbf{e}. The next replay input is the Hadamard product \mathbf{r} \circ \mathbf{r}^{(n-1)} = \triangle(\mathbf{r}) \, \mathbf{r}^{(n-1)}, so

\mathbf{r}^{(n)} = \mathbf{U} \left(\mathbf{r} \circ \mathbf{r}^{(n-1)}\right) = \mathbf{U} \triangle(\mathbf{r}) \left(\mathbf{U} \triangle(\mathbf{r})\right)^{n-1} \mathbf{e} = \left(\mathbf{U} \triangle(\mathbf{r})\right)^n \mathbf{e},

completing the induction. At the starting vertex, \mathbf{r}^{(n)}[v_0] = \boldsymbol{\alpha} (\mathbf{U} \triangle(\mathbf{r}))^n \mathbf{e}, which is Equation 17.4. Equation 17.5 then follows from the moment formula \mathbb{E}[\tilde{\tau}^n] = n! \, \boldsymbol{\alpha} (\mathbf{U} \triangle(\mathbf{r}))^n \mathbf{e} for reward-transformed phase-type distributions (Definition 2.9; Bladt and Nielsen 2017). \square

Remark. The Hadamard product with the base reward is essential for non-unit rewards. The naive iteration \mathbf{r}^{(i)} = \text{ReplayCommands}(C, \mathbf{r}^{(i-1)}) started from \mathbf{r}^{(0)} = \mathbf{r} computes \mathbf{r}^{(n)} = \mathbf{U}^n \triangle(\mathbf{r}) \mathbf{e} instead: a second replay applied to \mathbf{U} \triangle(\mathbf{r}) \mathbf{e} yields \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{U} \triangle(\mathbf{r}) \mathbf{e}) \mathbf{e} at v_0, and \triangle(\mathbf{U} \triangle(\mathbf{r}) \mathbf{e}) \neq \triangle(\mathbf{r}) \mathbf{U} \triangle(\mathbf{r}) in general. The two iterations coincide for unit rewards (\mathbf{r} = \mathbf{e}), where the Hadamard product is the identity.

Theorem 17.2 (Cross-moment computation for MPH) Let \mathbf{Y} \sim \operatorname{MPH}(\boldsymbol{\alpha}, \mathbf{S}, \mathbf{R}) with Green matrix \mathbf{U} and elimination commands C. The second-order cross-moment \mathbb{E}[Y_i Y_j] is computed by:

\mathbb{E}[Y_i Y_j] = \text{ReplayCommands}(C, \mathbf{R}_{\cdot i} \circ \mathbf{h}_j)[v_0] + \text{ReplayCommands}(C, \mathbf{R}_{\cdot j} \circ \mathbf{h}_i)[v_0], \tag{17.6}

where \mathbf{h}_j = \text{ReplayCommands}(C, \mathbf{R}_{\cdot j}) is the intermediate vector from the first replay with reward \mathbf{R}_{\cdot j}, and similarly for \mathbf{h}_i.

Proof. From Definition 2.11:

\mathbb{E}[Y_i Y_j] = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e} + \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{e}.

Consider the first term. Let \mathbf{h}_j = \mathbf{U} \triangle(\mathbf{R}_{\cdot j}) \mathbf{e}, computed by \text{ReplayCommands}(C, \mathbf{R}_{\cdot j}). Then the first term is:

\boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i}) \mathbf{h}_j = \boldsymbol{\alpha} \mathbf{U} \triangle(\mathbf{R}_{\cdot i} \circ \mathbf{h}_j) \mathbf{e} = \text{ReplayCommands}(C, \mathbf{R}_{\cdot i} \circ \mathbf{h}_j)[v_0],

where the first equality uses \triangle(\mathbf{R}_{\cdot i}) \mathbf{h}_j = \triangle(\mathbf{R}_{\cdot i} \circ \mathbf{h}_j) \mathbf{e} (a diagonal matrix applied to a vector equals the diagonal of their element-wise product applied to the all-ones vector) and the second is the correctness of Algorithm 7.3. The second term is computed symmetrically. \square
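
A hedged C sketch of Equation 17.6, reusing the illustrative command encoding from the earlier sketches (the real entry points are ptd_expected_waiting_time and its callers; the chain model matches the worked MPH example after Definition 17.3, with v_1, v_2 indexed as 0, 1):

#include <stdio.h>

struct cmd { int from, to; double multiplier; };

static void replay(const struct cmd *c, int n_cmds, double *result) {
    for (int j = 0; j < n_cmds; j++)
        result[c[j].from] += result[c[j].to] * c[j].multiplier;
}

/* E[Y_i Y_j] per Equation 17.6: two chained replays per term. */
static double cross_moment(const struct cmd *c, int n_cmds, int n,
                           const double *ri, const double *rj) {
    double hi[16], hj[16], ti[16], tj[16];     /* assumes n <= 16 */
    for (int v = 0; v < n; v++) { hi[v] = ri[v]; hj[v] = rj[v]; }
    replay(c, n_cmds, hi);                     /* hi = U diag(ri) e */
    replay(c, n_cmds, hj);                     /* hj = U diag(rj) e */
    for (int v = 0; v < n; v++) {
        ti[v] = ri[v] * hj[v];                 /* ri element-wise hj */
        tj[v] = rj[v] * hi[v];                 /* rj element-wise hi */
    }
    replay(c, n_cmds, ti);
    replay(c, n_cmds, tj);
    return ti[0] + tj[0];                      /* v0 is index 0 */
}

int main(void) {
    /* Chain v0 -> v1 -> absorbing, mean holding times 0.5 and 0.3. */
    struct cmd C[] = {
        {1, 1, -0.7},                          /* result[v1] *= 0.3 */
        {0, 0, -0.5}, {0, 1, 1.0},
    };
    double R1[2] = {1.0, 0.0}, R2[2] = {0.0, 1.0};
    printf("E[Y1 Y2] = %g\n",
           cross_moment(C, 3, 2, R1, R2));     /* prints 0.15 */
    return 0;
}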

Algorithms

17.0.0.1 k-th Moment Computation

Description. This algorithm computes the k-th moment of a (possibly reward-transformed) phase-type distribution by iterated replay of the elimination commands. For k = 1 it reduces to Algorithm 7.3. The full sojourn time vector is computed for each intermediate replay, as it serves as the reward vector for the next iteration.

k-th Moment Computation
1: Let C be the elimination command sequence (Definition 7.6)
2: Let r be the base reward vector (r = e for plain moments)
3: Let k be the desired moment order
4:
5: function KthMoment(C, r, k)
6:   h ← e                                        ▷ Initialize: all-ones vector
7:   for i ← 1 to k do
8:     reward ← r ∘ h                              ▷ Element-wise product with base reward
9:     h ← ReplayCommands(C, reward)               ▷ Replay (Algorithm 7.3), keep full result vector
10:  end for
11:  return k! × h[v_0]
12: end function

Correspondence table:

Pseudocode variable Math symbol Code variable (file:function)
C Elimination commands graph->reward_compute_graph
\mathbf{r} Base reward vector rewards parameter to ptd_expected_waiting_time
k Moment order Caller loop counter
\mathbf{h} Intermediate result vector result in ptd_expected_waiting_time
v_0 Starting vertex Index 0 in result array

Complexity. Time: O(k \cdot N) where N \leq |V|^2 is the number of elimination commands. Each replay costs O(N), and k replays are needed. Space: O(|V|) for the result vector (reused across iterations).

Algorithm 17.1: Correctness. Follows from Theorem 17.1. Each replay computes \mathbf{U} \triangle(\mathbf{r} \circ \mathbf{h}) \mathbf{e}, and after k iterations the starting vertex entry equals \boldsymbol{\alpha} (\mathbf{U} \triangle(\mathbf{r}))^k \mathbf{e}. Multiplying by k! gives the k-th moment by Definition 2.4.

17.0.0.2 Expected Sojourn Times and Cross-Moments for MPH

Description. This algorithm computes the expected sojourn time in each state by replaying the elimination commands with |V| indicator reward vectors simultaneously (one per state). For a subset of k target states, only k reward vectors are used. Cross-moments are computed by two rounds of replay following Theorem 17.2.

Expected Sojourn Times
1: Let C be the elimination command sequence (Definition 7.6)
2: Let n = |V| be the number of vertices
3:
4: function SojournTimes(C, n)
5:   results ← n × n matrix, initialized to I_n          ▷ Identity: one-hot reward vectors
6:   for j ← 0 to N - 1 do
7:     cmd ← C[j]
8:     if cmd.multiplier = 0 then continue end if
9:     for r ← 0 to n - 1 do                              ▷ Replay for all reward vectors
10:      if cmd.multiplier = ∞ and results[cmd.to][r] = 0 then
11:        continue
12:      end if
13:      results[cmd.from][r] ← results[cmd.from][r] + results[cmd.to][r] × cmd.multiplier
14:    end for
15:  end for
16:  return results[v_0]                                   ▷ Row at starting vertex
17: end function
18:
19: function SojournTimesSubset(C, indices, k)
20:   results ← n × k matrix, initialized to 0
21:   for r ← 0 to k - 1 do
22:     results[indices[r]][r] ← 1                         ▷ One-hot for each target
23:   end for
24:   for j ← 0 to N - 1 do
25:     cmd ← C[j]
26:     if cmd.multiplier = 0 then continue end if
27:     for r ← 0 to k - 1 do
28:       if cmd.multiplier = ∞ and results[cmd.to][r] = 0 then
29:         continue
30:       end if
31:       results[cmd.from][r] ← results[cmd.from][r] + results[cmd.to][r] × cmd.multiplier
32:     end for
33:   end for
34:   return results[v_0]                                  ▷ k sojourn times at starting vertex
35: end function
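
The subset variant translates almost line-for-line into C. A self-contained sketch (again using the illustrative command encoding from the earlier sketches, with the n × k results matrix flattened into a row-major array):

#include <math.h>
#include <stdio.h>

struct cmd { int from, to; double multiplier; };

/* Subset sojourn times: replay C with k one-hot reward vectors at
 * once.  results is an n x k matrix stored row-major; column r tracks
 * the basis vector for vertex indices[r].  Mirrors SojournTimesSubset. */
static void sojourn_subset(const struct cmd *c, int n_cmds, int n,
                           const int *indices, int k,
                           double *results /* n*k entries, zeroed */) {
    for (int r = 0; r < k; r++)
        results[indices[r] * k + r] = 1.0;     /* one-hot init */
    for (int j = 0; j < n_cmds; j++) {
        if (c[j].multiplier == 0.0) continue;  /* skip: 0 * x = 0 */
        for (int r = 0; r < k; r++) {
            double to = results[c[j].to * k + r];
            if (isinf(c[j].multiplier) && to == 0.0)
                continue;                      /* convention: inf * 0 = 0 */
            results[c[j].from * k + r] += to * c[j].multiplier;
        }
    }
}

int main(void) {
    /* Erlang chain v0 -> v1 -> v2 -> absorbing, all rates 2. */
    struct cmd C[] = {
        {2, 2, -0.5},
        {1, 1, -0.5}, {1, 2, 1.0},
        {0, 0, -0.5}, {0, 1, 1.0},
    };
    int idx[2] = {1, 2};                       /* sojourn in v1 and v2 */
    double results[3 * 2] = {0};
    sojourn_subset(C, 5, 3, idx, 2, results);
    /* The row of the starting vertex v0 holds the sojourn times. */
    printf("z1 = %g, z2 = %g\n", results[0], results[1]); /* 0.5 0.5 */
    return 0;
}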

Correspondence table:

Pseudocode variable Math symbol Code variable (file:function)
C Elimination commands graph->reward_compute_graph (phasic.c:ptd_expected_sojourn_time)
n Number of vertices graph->vertices_length
\text{results} Reward result matrix results (2D array)
v_0 Starting vertex Index 0
\text{indices} Target vertex indices indices (phasic.c:ptd_expected_sojourn_time_subset)
k Number of target states k parameter

Complexity. For the full sojourn time computation: Time O(N \cdot |V|) where N \leq |V|^2, giving O(|V|^3) total. Space O(|V|^2) for the results matrix. For the subset variant: Time O(N \cdot k), Space O(|V| \cdot k). The subset variant is preferred when k \ll |V|.

Algorithm 17.2: Correctness. The full sojourn time computation replays the commands with |V| basis vectors \mathbf{e}_1, \ldots, \mathbf{e}_{|V|} simultaneously. By linearity of the commands and the correctness of Algorithm 7.3, each column r of the result gives \text{ReplayCommands}(C, \mathbf{e}_r), and the starting vertex row gives \boldsymbol{\alpha} \mathbf{U} \mathbf{e}_r = z_r (the expected sojourn time in state r). The subset variant restricts to k selected basis vectors, giving the same result for those indices only.

Numerical Considerations

Kahan compensated summation. The full sojourn time computation (ptd_expected_sojourn_time) uses Kahan compensated summation (Algorithm 4.1) for the inner accumulation. Each row of the results matrix maintains a separate Kahan compensation array. This is critical because the sojourn time computation involves O(|V|^2) additions per reward vector, and standard summation can lose several digits of precision for large graphs. The subset variant (ptd_expected_sojourn_time_subset) does not use Kahan summation, trading precision for speed in the common case where k is small.
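
A sketch of the compensated accumulation step (Algorithm 4.1 applied per matrix entry; the field layout of kahan_sum shown here is an assumption based on the text above, not the definition in phasic.c):

/* Running sum with Kahan compensation for one results entry. */
struct kahan_sum { double sum, comp; };

static void kahan_add(struct kahan_sum *k, double term) {
    double y = term - k->comp;   /* correct for previously lost bits */
    double t = k->sum + y;       /* low-order bits of y may be lost */
    k->comp = (t - k->sum) - y;  /* capture what was lost this time */
    k->sum = t;
}

In the full sojourn computation each results[vertex][reward] entry carries its own compensation term, so a command's update becomes a kahan_add on the from entry with term results[to][r].sum × multiplier.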

Condition number monitoring. The ptd_expected_waiting_time function (Algorithm 7.3) monitors the condition number of the elimination commands by tracking the ratio of maximum to minimum multiplier magnitudes. When this exceeds a configurable threshold (default 10^{12}), MPFR arbitrary-precision arithmetic is activated if available (see Numerical Considerations in [06]).
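
A sketch of the ratio check (the scan shown here, the treatment of zero and infinite multipliers, and the threshold handling are assumptions based on the text; see [06] for the actual mechanism):

#include <math.h>
#include <stdbool.h>

struct cmd { int from, to; double multiplier; };

/* Heuristic condition estimate: ratio of the largest to the smallest
 * nonzero, finite multiplier magnitude in the command sequence. */
static bool needs_high_precision(const struct cmd *c, int n_cmds,
                                 double threshold /* e.g. 1e12 */) {
    double lo = INFINITY, hi = 0.0;
    for (int j = 0; j < n_cmds; j++) {
        double m = fabs(c[j].multiplier);
        if (m == 0.0 || isinf(m)) continue;  /* covered by the 0/inf rules */
        if (m < lo) lo = m;
        if (m > hi) hi = m;
    }
    return hi > 0.0 && hi / lo > threshold;  /* escalate to MPFR if true */
}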

Zero-times-infinity handling. Both the full and subset sojourn time algorithms inherit the 0 \times \infty = 0 convention from Algorithm 12: commands with zero multiplier are skipped, and commands with infinite multiplier acting on zero entries are skipped. This prevents NaN propagation from inescapable cycles and zero-reward vertices.

Memory layout. The implementation stores the results matrix as results[vertex][reward]: row-major, with vertex as the leading index, so the reward entries of a given vertex are contiguous. The inner loop over rewards at each command scans the from and to rows sequentially, but successive commands may jump between distant rows, which can incur cache misses on large graphs. For very large graphs, the O(|V|^2) memory usage of the full sojourn time computation may be a bottleneck, motivating the subset variant.

Implementation Notes

Source code mapping:

Algorithm File Function Lines
Algorithm 17.1 (k=1) src/c/phasic.c ptd_expected_waiting_time L6118–L6265
Algorithm 17.2 (full) src/c/phasic.c ptd_expected_sojourn_time L6378–L6521
Algorithm 17.2 (subset) src/c/phasic.c ptd_expected_sojourn_time_subset L6267–L6376

Deviations from pseudocode:

  • Algorithm 17.1 is not implemented as an explicit k-loop in C. The single-replay case (k = 1) is implemented by ptd_expected_waiting_time. Higher moments (k > 1) are computed at the Python level by calling ptd_expected_waiting_time repeatedly with updated reward vectors, or by using the reward transformation approach (constructing a new graph with ptd_graph_reward_transform and computing its expected absorption time).
  • The full sojourn time algorithm (ptd_expected_sojourn_time) initializes the results matrix to the identity matrix, matching the pseudocode’s “initialized to \mathbf{I}_n”: each “reward vector” is a standard basis (one-hot) vector. This is a point of agreement rather than a deviation, but it is worth noting when reading the code.
  • The subset variant (ptd_expected_sojourn_time_subset) validates that all requested indices are within bounds (< n), returning NULL on out-of-bounds errors.
  • The full variant uses Kahan compensated summation via a parallel array of kahan_sum structs, while the subset variant uses plain double accumulation. This is a deliberate trade-off: the subset variant is intended for fast, moderately-sized computations where the additional memory and time cost of Kahan summation is not justified.

Symbol Index

Symbol Name First appearance
\mathbf{h} Intermediate result vector (iterated replay) Algorithm 17.1
\mathbf{h}_j Intermediate vector from replay with \mathbf{R}_{\cdot j} Definition 17.3
\mathbf{r}^{(i)} Reward vector at iteration i Definition 17.2
\mathbf{z} Expected sojourn time vector Definition 17.3
z_j Expected sojourn time in state j Definition 17.3
\circ Hadamard (element-wise) product Definition 17.2