6 Graph Normalization
Introduction
This file documents the normalization algorithms that prepare phase-type graphs for elimination ([06]) and moment computation. Normalization separates a graph into two components: transition probabilities (the normalized edge weights) and holding time information (vertex scalars). This separation is what allows the elimination algorithm to manipulate transition structure independently of timing.
Phasic provides two normalization algorithms, one for continuous phase-type distributions and one for discrete phase-type distributions. The continuous normalizer divides each edge weight by the total outgoing rate of its source vertex, producing transition probabilities and storing the inverse rates as vertex scalars. The discrete normalizer handles the additional complication of self-loop probabilities: since the graph representation forbids self-loops (edges from a vertex to itself), the probability of remaining in the same state must be encoded through auxiliary vertices.
Both algorithms modify the graph in place and return an array of vertex scalars. Downstream algorithms (elimination, reward transformation, moment computation) operate on the normalized graph and use the vertex scalars to recover the original distribution.
Prerequisites: [01], [02], [04]
Source files: src/c/phasic.c (functions: ptd_normalize_graph, ptd_dph_normalize_graph)
Definitions
Continuous normalization
Definition 6.1 (Continuous graph normalization) Let G = (V, E, W) be a phase-type graph representing \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) (Definition 2.13). The continuous normalization of G produces:
A normalized graph G' = (V, E, W') where for each edge (v \to z) \in E: w'(v \to z) = \frac{w(v \to z)}{\lambda_v}, \tag{6.1} where \lambda_v = \sum_{z \in \operatorname{children}(v)} w(v \to z) is the total outgoing rate of v.
A vertex scalar array \mathbf{x} = (x_{v_0}, x_{v_1}, \ldots) where: x_v = \begin{cases} 1/\lambda_v & \text{if } \lambda_v > 0, \\ 1 & \text{if } \lambda_v = 0. \end{cases} \tag{6.2}
The normalization is performed in place: after normalization, each edge in G stores w'(v \to z) and the original rates are recoverable only from \mathbf{x}.
Intuition
After normalization, each outgoing edge weight from a transient vertex becomes a transition probability: w'(v \to z) = q_{vz}, the probability that the embedded discrete-time chain transitions from v to z upon leaving v. The vertex scalar x_v = 1/\lambda_v is the expected holding time at vertex v (the mean of the \operatorname{Exp}(\lambda_v) holding time). For absorbing vertices (\lambda_v = 0), the scalar is set to 1 as a neutral default; absorbing vertices have no outgoing edges, so the normalization is vacuous. This decomposition corresponds to the factorization of a continuous-time Markov chain into its embedded chain and its holding time structure (Definition 2.16).
Example
Consider a vertex v_1 with two outgoing edges: w(v_1 \to v_2) = 3 and w(v_1 \to v_a) = 2 (where v_a is absorbing). The total outgoing rate is \lambda_{v_1} = 5. After normalization: w'(v_1 \to v_2) = 3/5 = 0.6, w'(v_1 \to v_a) = 2/5 = 0.4, and x_{v_1} = 1/5 = 0.2. The process waits \operatorname{Exp}(5) time at v_1, then transitions to v_2 with probability 0.6 or absorbs with probability 0.4.
Definition 6.2 (Embedded discrete-time Markov chain) Let G = (V, E, W) represent \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let G' = (V, E, W') be its continuous normalization (Definition 6.1). The normalized edge weights define the transition probabilities of the embedded discrete-time Markov chain:
q_{ij} = w'(v_i \to v_j) = \frac{s_{ij}}{\lambda_{v_i}} \quad \text{for } i \neq j, \tag{6.3}
where s_{ij} is the (i,j) entry of \mathbf{S} and \lambda_{v_i} = -s_{ii} = \sum_{j \neq i} s_{ij} + s_i is the total departure rate from state i. The absorption probability from state i is q_{i,p+1} = s_i / \lambda_{v_i}.
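As an illustration of Equation (6.3), the embedded chain can be computed directly from a sub-intensity matrix. The following Python sketch is illustrative only; the function name and the nested-list matrix representation are assumptions for exposition, not phasic's API:

```python
# Hypothetical sketch (not from phasic): compute embedded-chain transition
# probabilities q_ij = s_ij / lambda_i and absorption probabilities
# q_{i,p+1} = s_i / lambda_i from a sub-intensity matrix S (Equation 6.3).

def embedded_chain(S):
    p = len(S)
    Q = [[0.0] * (p + 1) for _ in range(p)]  # column p is absorption
    for i in range(p):
        lam = -S[i][i]                  # total departure rate lambda_i = -s_ii
        s_exit = -sum(S[i])             # absorption rate s_i = -(row sum)
        for j in range(p):
            if j != i:
                Q[i][j] = S[i][j] / lam # q_ij = s_ij / lambda_i
        Q[i][p] = s_exit / lam          # absorption probability
    return Q

# Two-state example: state 1 leaves at rate 5 (3 to state 2, 2 to absorption).
Q = embedded_chain([[-5.0, 3.0], [0.0, -2.0]])
```

For the example above, Q[0] recovers the transition probabilities 0.6 and 0.4 from the worked example in the text.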
Intuition
The embedded chain records which state the process jumps to, ignoring how long it waits before jumping. The continuous phase-type distribution is fully determined by the embedded chain probabilities \{q_{ij}\} and the holding rates \{\lambda_{v_i}\}. This decomposition is standard in the theory of continuous-time Markov chains (see, e.g., Norris 1997, chap. 2).
Discrete normalization
Definition 6.3 (Discrete graph normalization) Let G = (V, E, W) be a phase-type graph representing \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}) (Definition 2.7). For each non-starting, non-absorbing vertex v_i with self-loop probability p_{ii} = 1 - \sum_{z \in \operatorname{children}(v_i)} w(v_i \to z) > \epsilon (where \epsilon = 10^{-7}), the discrete normalization augments G by:
- Creating an auxiliary vertex a_i (Definition 6.4);
- Adding an edge (v_i \to a_i) with weight p_{ii};
- Adding an edge (a_i \to v_i) with weight 1.0.
The normalization produces a vertex scalar array \mathbf{x} where x_v = 1 for all original vertices and x_{a_i} = 0 for all auxiliary vertices. The graph is modified in place.
Intuition
In a discrete-time Markov chain, a state may have a non-zero probability of remaining in the same state at the next step: T_{ii} > 0. Since phasic’s graph representation forbids self-loops (see Implementation Notes in [02]), this self-transition probability must be encoded differently. The solution is to introduce an auxiliary vertex a_i that acts as a “bounce” state: the chain transitions from v_i to a_i with probability p_{ii}, and from a_i back to v_i with probability 1. This two-step detour has the same geometric holding time distribution as the self-loop.
Example
Consider a DPH vertex v_1 with edges w(v_1 \to v_2) = 0.3 and w(v_1 \to v_a) = 0.2, so the total transition probability is 0.5 and the self-loop probability is p_{11} = 1 - 0.5 = 0.5. After normalization, an auxiliary vertex a_1 is created with edges w(v_1 \to a_1) = 0.5 and w(a_1 \to v_1) = 1.0. The vertex scalars are x_{v_1} = 1 and x_{a_1} = 0. At v_1, the chain either transitions out (to v_2 or v_a) with probability 0.5, or bounces through a_1 and returns, simulating the self-loop.
Definition 6.4 (Auxiliary vertex) An auxiliary vertex a_i created by discrete normalization (Definition 6.3) has the following properties:
- It has exactly one outgoing edge: (a_i \to v_i) with weight 1.0;
- It has exactly one incoming edge: (v_i \to a_i) with weight p_{ii};
- It is assigned vertex scalar x_{a_i} = 0 (zero reward);
- It has no state vector in the original state space (it is a structural artifact of the encoding).
Intuition
The zero reward ensures that time spent “bouncing” through the auxiliary vertex does not contribute to accumulated reward or moments. The auxiliary vertex is transparent to the absorption time distribution: the number of steps until absorption in the augmented graph has the same distribution as in the original chain, because the geometric number of bounces through a_i before leaving v_i reproduces the original self-loop holding time.
Theorems and Proofs
Theorem 6.1 (Continuous normalization preserves the PH distribution) Let G = (V, E, W) represent \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) and let (G', \mathbf{x}) be the result of continuous normalization (Definition 6.1). Then the normalized graph G' with vertex scalars \mathbf{x} encodes the same distribution \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}) (Bladt and Nielsen 2017). Specifically, the sub-intensity matrix \mathbf{S} is exactly recoverable as:
s_{ij} = \frac{w'(v_i \to v_j)}{x_{v_i}} = w'(v_i \to v_j) \cdot \lambda_{v_i} \quad \text{for } i \neq j, \tag{6.4}
s_{ii} = -\frac{1}{x_{v_i}} = -\lambda_{v_i}. \tag{6.5}
Proof. By Definition 6.1, the normalized edge weight is w'(v_i \to v_j) = w(v_i \to v_j) / \lambda_{v_i}, and the vertex scalar is x_{v_i} = 1/\lambda_{v_i}. For i \neq j:
\frac{w'(v_i \to v_j)}{x_{v_i}} = \frac{w(v_i \to v_j) / \lambda_{v_i}}{1 / \lambda_{v_i}} = w(v_i \to v_j) = s_{ij},
where the last equality follows from the graph-matrix correspondence (Definition 2.14). For the diagonal:
-\frac{1}{x_{v_i}} = -\lambda_{v_i} = s_{ii},
again by Definition 2.14 (s_{ii} = -\lambda_{v_i}). Since both the initial distribution \boldsymbol{\alpha} (encoded in the starting vertex edges) and the sub-intensity matrix \mathbf{S} are exactly recoverable, the normalized graph with vertex scalars encodes the same \operatorname{PH}_p(\boldsymbol{\alpha}, \mathbf{S}). \square
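The recoverability claim of Theorem 6.1 can be checked mechanically on a small graph. The sketch below is illustrative, using an assumed adjacency-dict representation rather than phasic's data structures:

```python
# Hypothetical check of Theorem 6.1: normalize a small weighted graph, then
# reconstruct s_ij = w'(v_i -> v_j) / x_i and s_ii = -1 / x_i (Eq. 6.4-6.5).

def normalize(weights):
    """weights[i] is a dict {child index: rate}; normalized in place.
    Returns (normalized weights, vertex scalar array x)."""
    x = []
    for out in weights:
        lam = sum(out.values())           # total outgoing rate lambda_v
        x.append(1.0 / lam if lam > 0 else 1.0)
        for j in out:
            out[j] /= lam                 # rate -> transition probability
    return weights, x

# Vertex 0: rate 3 to vertex 1, rate 2 to the absorbing vertex (index 2).
W, x = normalize([{1: 3.0, 2: 2.0}, {2: 2.0}])
s01 = W[0][1] / x[0]    # should recover the original rate 3.0
s00 = -1.0 / x[0]       # should recover the diagonal entry -5.0
```

Up to floating-point rounding, the division round trip returns exactly the original rates, matching the "exactly recoverable" claim in the theorem.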
Theorem 6.2 (DPH normalization preserves the DPH distribution) Let G = (V, E, W) represent \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}) and let (G^+, \mathbf{x}) be the result of discrete normalization (Definition 6.3), where G^+ is the augmented graph with auxiliary vertices. Then the absorption time \tau^+ in the augmented chain (counting only steps that arrive at original vertices, i.e., steps where x_v = 1) has the same distribution as \tau \sim \operatorname{DPH}_p(\boldsymbol{\alpha}, \mathbf{T}).
Proof. We show that the reward-weighted absorption time in the augmented chain G^+ (with vertex scalars x_v) has the same distribution as \tau in the original chain.
In the original DPH chain on \{v_1, \ldots, v_p, \text{absorb}\}, the transition probabilities from v_i are: T_{ij} to v_j for j \neq i, self-loop probability T_{ii}, and absorption probability t_i = 1 - \sum_j T_{ij}. These sum to 1. The absorption time \tau counts every step, including self-loops.
In the augmented chain G^+, each vertex v_i with T_{ii} > 0 has its self-loop replaced by a two-edge bounce: w(v_i \to a_i) = T_{ii} and w(a_i \to v_i) = 1. The vertex scalars are x_{v_i} = 1 for original vertices and x_{a_i} = 0 for auxiliary vertices.
Step 1: The augmented chain is a valid Markov chain. At v_i, the outgoing edge weights are T_{ij} (for j \neq i), t_i (to absorbing state), and T_{ii} (to a_i). These sum to \sum_{j \neq i} T_{ij} + t_i + T_{ii} = 1. At a_i, the single edge to v_i has weight 1. So all transition probabilities are valid.
Step 2: Weighted time equals original step count. Define the weighted absorption time \tilde{\tau} = \sum_{t=0}^{\tau^+ - 1} x_{X_t}, where \tau^+ is the first time the augmented chain reaches an absorbing state (the sum runs over the states visited before absorption, so it counts transitions, as \tau does). Since x_{a_i} = 0, auxiliary visits contribute nothing, while each visit to an original transient vertex v_i contributes x_{v_i} = 1.
In the original chain, starting at v_i, one step either: (a) self-loops with probability T_{ii}, contributing 1 to \tau, or (b) transitions to v_j or absorbs with probability 1 - T_{ii}, contributing 1 to \tau.
In the augmented chain, starting at v_i: the visit to v_i itself contributes x_{v_i} = 1 to \tilde{\tau}. Then with probability T_{ii}, the chain goes to a_i (contributing x_{a_i} = 0), then deterministically returns to v_i (contributing x_{v_i} = 1 on the next visit). With probability 1 - T_{ii}, the chain transitions to v_j or absorbs.
In both chains, the contribution per self-loop cycle is 1 (one step in the original; one original-vertex visit in the augmented). The probability of k self-loops before leaving v_i is T_{ii}^k(1-T_{ii}) in both chains (geometric distribution). The destination after leaving v_i is v_j with probability T_{ij}/(1-T_{ii}) in both chains (since the outgoing non-self-loop edges are identical).
Step 3: Distribution equivalence. By Step 2, starting from any original vertex v_i, the augmented chain’s weighted time until the next original-vertex transition and the destination vertex have the same joint distribution as in the original chain. Since the initial distribution \boldsymbol{\alpha} only assigns mass to original vertices, an inductive argument on the number of transitions gives \tilde{\tau} \stackrel{d}{=} \tau. \square
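Theorem 6.2 can be spot-checked numerically on a single transient state with self-loop probability 0.5, as in the example above. The sketch below (illustrative; not phasic code) propagates probability mass through the augmented bounce chain and recovers the geometric step-count distribution of the original chain:

```python
# Hypothetical check of Theorem 6.2: one transient state v with self-loop
# probability p_self, replaced by a bounce v -> a (prob p_self), a -> v
# (prob 1). The reward-weighted absorption time counts only visits to v
# and should be Geometric(1 - p_self), exactly as in the original chain.

def augmented_pmf(p_self, n_max):
    """pmf of the number of v-visits before absorption in the bounce chain."""
    pmf = {}
    mass_at_v = {1: 1.0}                   # first visit to v counts as 1
    for _ in range(n_max):
        nxt = {}
        for k, m in mass_at_v.items():
            pmf[k] = pmf.get(k, 0.0) + m * (1 - p_self)   # leave v now
            nxt[k + 1] = nxt.get(k + 1, 0.0) + m * p_self # bounce via a, return
        mass_at_v = nxt
    return pmf

pmf = augmented_pmf(0.5, 30)
# Original chain: P(tau = k) = 0.5 ** k for k >= 1.
```

The computed pmf matches 0.5^k term by term, confirming the geometric bounce argument in Step 2 of the proof.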
Algorithms
6.0.0.1 Continuous Graph Normalization
Description. This algorithm normalizes a continuous phase-type graph in place. For each vertex, it computes the total outgoing rate \lambda_v, divides each outgoing edge weight by \lambda_v, and stores x_v = 1/\lambda_v in the output array. Absorbing vertices (with \lambda_v = 0) receive x_v = 1.
Continuous Graph Normalization
1: Let G = (V, E, W) be a phase-type graph representing PH_p(alpha, S) (Definition 2.13)
2:
3: function NormalizeGraph(G)
4: x <- array of size |V|
5: for i <- 0 to |V| - 1 do
6: v <- V[i]
7: lambda_v <- 0
8: for (v -> z) in E(v) do > Compute total outgoing rate
9: lambda_v <- lambda_v + w(v -> z)
10: end for
11: if lambda_v = 0 then
12: x[i] <- 1 > Absorbing vertex: neutral default
13: else
14: x[i] <- 1 / lambda_v
15: end if
16: for (v -> z) in E(v) do > Normalize edge weights
17: w(v -> z) <- w(v -> z) / lambda_v
18: end for
19: end for
20: return x
21: end function
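The pseudocode above can be rendered compactly in Python over a simple adjacency-list representation. This is an illustrative sketch, not the C implementation in ptd_normalize_graph:

```python
# Illustrative Python rendering of the continuous normalization pseudocode.
# The real implementation is ptd_normalize_graph in src/c/phasic.c; the
# list-of-dicts graph representation here is an assumption for exposition.

def normalize_graph(edges):
    """edges[i] maps child index -> rate; modified in place.
    Returns the vertex scalar array x (Definition 6.1)."""
    x = [1.0] * len(edges)
    for i, out in enumerate(edges):
        lam = sum(out.values())        # total outgoing rate lambda_v
        if lam == 0:
            x[i] = 1.0                 # absorbing vertex: neutral default
            continue
        x[i] = 1.0 / lam               # expected holding time 1/lambda_v
        for z in out:
            out[z] /= lam              # edge weight -> transition probability
    return x

# Example from the text: vertex 0 has rates 3 (to vertex 1) and 2 (to vertex 2);
# vertex 2 is absorbing.
g = [{1: 3.0, 2: 2.0}, {2: 1.0}, {}]
x = normalize_graph(g)
```

After the call, g[0] holds the transition probabilities 0.6 and 0.4, and x[0] = 0.2, matching the worked example in the Definitions section.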
Correspondence table:
| Pseudocode variable | Math symbol | Code variable (file:function) |
|---|---|---|
| G | G = (V, E, W) | ptd_graph* (phasic.c:ptd_normalize_graph) |
| v | v \in V | vertex |
| \lambda_v | \lambda_v | rate |
| \mathbf{x} | \mathbf{x} (vertex scalar array) | res |
| x[i] | x_v | res[i] |
Complexity. Time: O(|V| + |E|). Each vertex is visited once, and each edge is accessed twice (once for rate summation, once for normalization). Space: O(|V|) for the output array \mathbf{x}.
6.0.0.2 Discrete Graph Normalization
Description. This algorithm normalizes a discrete phase-type graph by creating auxiliary vertices for self-loop probabilities. For each non-starting, non-absorbing vertex with self-loop probability exceeding a threshold \epsilon = 10^{-7}, it creates an auxiliary vertex and the corresponding bounce edges. The vertex scalar array stores 1 for original vertices and 0 for auxiliary vertices.
Discrete Graph Normalization
1: Let G = (V, E, W) be a phase-type graph representing DPH_p(alpha, T) (Definition 2.7)
2: Let epsilon <- 10^{-7}
3: Let n <- |V|
4:
5: function DPHNormalizeGraph(G)
6: x <- array of size 2n, initialized to 1 > Over-allocate for auxiliary vertices
7: for i <- 0 to n - 1 do
8: v <- V[i]
9: if v = v_0 then > Skip starting vertex
10: continue
11: end if
12: lambda_v <- 0
13: for (v -> z) in E(v) do
14: lambda_v <- lambda_v + w(v -> z)
15: end for
16: if lambda_v = 0 then > Absorbing vertex: skip
17: continue
18: end if
19: p_self <- 1 - lambda_v > Self-loop probability
20: if p_self > epsilon then
21: a <- CreateVertex(G) > Auxiliary vertex (Definition 6.4)
22: AddEdge(v, a, p_self) > Edge from v to auxiliary
23: AddEdge(a, v, 1.0) > Deterministic return edge
24: x[index(a)] <- 0 > Zero reward for auxiliary vertex
25: end if
26: end for
27: return x
28: end function
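As with the continuous case, the DPH pseudocode can be sketched in Python over the same assumed adjacency-list representation (illustrative only; the real implementation is ptd_dph_normalize_graph):

```python
# Illustrative Python rendering of the discrete normalization pseudocode.
# The real implementation is ptd_dph_normalize_graph in src/c/phasic.c.
EPSILON = 1e-7

def dph_normalize_graph(edges, start=0):
    """edges[i] maps child index -> probability; auxiliary vertices are
    appended in place. Returns the vertex scalar array x (Definition 6.3)."""
    n = len(edges)                          # original vertex count
    x = [1.0] * n
    for i in range(n):
        if i == start:
            continue                        # skip starting vertex
        total = sum(edges[i].values())
        if total == 0:
            continue                        # absorbing vertex: skip
        p_self = 1.0 - total                # implicit self-loop probability
        if p_self > EPSILON:
            a = len(edges)                  # index of new auxiliary vertex
            edges.append({i: 1.0})          # deterministic return edge a -> v
            edges[i][a] = p_self            # bounce edge v -> a
            x.append(0.0)                   # zero reward for auxiliary vertex
    return x

# Example from the text: vertex 1 has edges 0.3 and 0.2, so p_self = 0.5.
g = [{1: 1.0}, {2: 0.3, 3: 0.2}, {}, {}]
x = dph_normalize_graph(g)
```

Vertex 1 gains a bounce edge to the new auxiliary vertex 4, which returns with probability 1 and carries zero reward, as in Definition 6.4.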
Correspondence table:
| Pseudocode variable | Math symbol | Code variable (file:function) |
|---|---|---|
| G | G = (V, E, W) | ptd_graph* (phasic.c:ptd_dph_normalize_graph) |
| v | v_i \in V | vertex |
| \lambda_v | \sum_{z} w(v \to z) | rate |
| p_{\text{self}} | p_{ii} = 1 - \lambda_v | 1 - rate |
| a | a_i (auxiliary vertex) | auxiliary_vertex |
| \mathbf{x} | \mathbf{x} (vertex scalar array) | res |
| \epsilon | \epsilon (threshold) | 0.0000001 |
| n | |V| (original vertex count) | old_length |
Complexity. Time: O(|V| + |E|). Each original vertex is visited once, each edge is examined once for rate computation, and at most |V| auxiliary vertices are created (each creation is O(1)). Space: O(|V|) for the output array (allocated at 2|V| to accommodate auxiliary vertices) plus O(|V|) for the auxiliary vertices and edges added to the graph.
Numerical Considerations
Division by zero in continuous normalization. When \lambda_v = 0 (absorbing vertex), the algorithm sets x_v = 1 and skips the edge normalization loop (there are no edges to normalize). The check rate == 0 uses exact floating-point comparison, which is safe because absorbing vertices have exactly zero outgoing edges, so the sum is exactly 0.0.
Near-zero self-loop threshold in discrete normalization. The threshold \epsilon = 10^{-7} prevents creating auxiliary vertices for negligible self-loop probabilities. If p_{ii} \leq \epsilon, the self-loop probability is treated as zero. This avoids unnecessary graph expansion for vertices where the self-loop probability is due to floating-point rounding rather than a genuine self-transition. The threshold value is hardcoded rather than parameterized.
Division by \lambda_v when \lambda_v is small. In continuous normalization, if \lambda_v is very small but positive, the division w(v \to z) / \lambda_v can amplify floating-point errors. However, in practice, w(v \to z) \leq \lambda_v for all edges (since edge weights are non-negative and \lambda_v is their sum), so the normalized weight satisfies 0 \leq w'(v \to z) \leq 1, and the division is numerically stable.
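The stability claim above can be demonstrated with an extreme case: even when the total rate is tiny, each normalized weight is a ratio of a term to the sum containing it, so it stays in [0, 1]. A minimal illustration (not phasic code):

```python
# Illustration that normalization is well-conditioned even for a tiny
# total rate: each weight w <= lambda_v, so w / lambda_v lies in [0, 1]
# and the normalized weights still sum to 1.
rates = [1e-300, 3e-300]            # near the bottom of the double range
lam = sum(rates)                    # 4e-300, still a normal double
normalized = [w / lam for w in rates]
```

The ratios come out as 0.25 and 0.75 despite the operands being near the underflow threshold, because the division cancels the common scale.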
Implementation Notes
Source code mapping:
| Algorithm | File | Function | Lines |
|---|---|---|---|
| Algorithm 6.1 | src/c/phasic.c | ptd_normalize_graph | L2337–L2360 |
| Algorithm 6.2 | src/c/phasic.c | ptd_dph_normalize_graph | L2362–L2391 |
Deviations from pseudocode:
- Algorithm 6.1: The implementation does not check for the starting vertex or absorbing vertices explicitly. The `rate == 0` check handles absorbing vertices, and the starting vertex is normalized like any other vertex (its outgoing edge weights, which encode \boldsymbol{\alpha}, are divided by \lambda_{v_0}, and x_{v_0} = 1/\lambda_{v_0}). This means the starting vertex's edges become normalized probabilities \alpha_i / \sum_j \alpha_j, and the vertex scalar stores 1/\sum_j \alpha_j. If \sum_j \alpha_j = 1 (no defect), this is a no-op on the probabilities. If there is a defect (\sum_j \alpha_j < 1), the normalization changes the starting vertex edges; downstream algorithms must account for this.
- Algorithm 6.2: The implementation allocates `old_length * 2` entries for the vertex scalar array, anticipating that up to |V| auxiliary vertices may be created (one per original vertex). This is an upper bound; in practice, fewer auxiliary vertices are needed. The starting vertex is explicitly skipped (`graph->starting_vertex == vertex`), and absorbing vertices are implicitly skipped (`rate == 0`).
- Algorithm 6.2 (edge creation): The `ptd_graph_add_edge` function is called with a coefficient vector of length 1 containing the weight. This means auxiliary vertex edges are stored as parameterized edges with a single coefficient, even though they are structurally constant. This is consistent with the graph's edge storage model but introduces minor overhead for non-parameterized graphs.
Remark (SUSPECTED_CODE_ISSUE: Continuous normalization of starting vertex). The continuous normalization algorithm normalizes the starting vertex v_0 in the same way as transient vertices. When the initial distribution has a defect (\alpha_0 > 0, so \lambda_{v_0} = \sum_i \alpha_i < 1), the normalization produces w'(v_0 \to v_i) = \alpha_i / \lambda_{v_0} and x_{v_0} = 1/\lambda_{v_0}. Downstream algorithms that expect the starting vertex edges to encode the raw initial probabilities \alpha_i must multiply by x_{v_0}^{-1} to recover them. This is not necessarily a bug (the elimination algorithm may account for it), but it is worth noting that the normalization changes the semantics of the starting vertex edges when there is a defect.
Symbol Index
| Symbol | Name | First appearance |
|---|---|---|
| a_i | Auxiliary vertex for vertex v_i | Definition 6.4 |
| \epsilon | Self-loop probability threshold (10^{-7}) | Definition 6.3 |
| p_{ii} | Self-loop probability at vertex v_i | Definition 6.3 |
| q_{ij} | Embedded chain transition probability | Definition 6.2 |
| \mathbf{x} | Vertex scalar array | Definition 6.1 |