5 SCC and Topological Sort

Introduction

This file documents the two structural graph algorithms that precede every major computation in phasic: strongly connected component (SCC) detection and topological sorting. Together, these algorithms decompose a phase-type graph into a directed acyclic structure that determines the order in which vertices are processed during normalization ([05]), elimination ([06]), reward transformation ([07]), and moment computation ([15]).

Phase-type graphs may contain cycles. For example, a two-state continuous-time Markov chain where states 1 and 2 communicate bidirectionally produces a cycle v_1 \to v_2 \to v_1 in the graph representation. Gaussian elimination on the graph (formalized in [06]) requires a processing order in which every vertex is eliminated before its dependencies are revisited. SCC detection identifies maximal groups of mutually reachable vertices, and the topological sort of the resulting condensation DAG provides a valid elimination order.

Phasic implements Tarjan’s algorithm (Tarjan 1972) for SCC detection and Kahn’s algorithm (Kahn 1962) for topological sorting. Both are classic algorithms with well-known correctness proofs; the contribution of this file is their precise formalization in the context of phase-type graphs and their mapping to the phasic codebase.

Prerequisites: [01], [02]

Source files: - src/c/phasic.c (functions: strongconnect2, ptd_find_strongly_connected_components, ptd_find_strongly_connected_components_acyclic, ptd_directed_graph_topological_sort, ptd_scc_graph_topological_sort, ptd_graph_is_acyclic, ptd_isolate_starting_vertex_scc) - api/c/phasic.h (structs: ptd_scc_graph, ptd_scc_vertex, ptd_scc_edge, ptd_directed_graph, ptd_directed_vertex, ptd_directed_edge)

Definitions

Strongly connected components

Definition 5.1 (Strongly connected component) Let G = (V, E, W) be a phase-type graph (Definition 2.13). A strongly connected component (SCC) of G is a maximal subset C \subseteq V such that for every pair of vertices u, v \in C, there exists a directed path from u to v and a directed path from v to u in G. The set of all SCCs partitions V: every vertex belongs to exactly one SCC.

Intuition

An SCC is a maximal group of vertices that can all reach each other via directed paths. In the context of phase-type distributions, an SCC with more than one vertex corresponds to a set of communicating states – the Markov chain can cycle among these states indefinitely before eventually leaving. A singleton SCC (a single vertex not on any cycle) corresponds to a transient state that the chain passes through at most a bounded number of times per visit.

Example

Consider a phase-type graph with vertices \{v_0, v_1, v_2, v_3, v_a\} where v_0 is the starting vertex, v_a is absorbing, and edges include v_1 \to v_2, v_2 \to v_1, v_1 \to v_3, v_3 \to v_a. The SCCs are \{v_0\}, \{v_1, v_2\}, \{v_3\}, and \{v_a\}. The set \{v_1, v_2\} is strongly connected because v_1 can reach v_2 and v_2 can reach v_1. It is maximal because adding any other vertex would break the mutual reachability property.

Condensation DAG

Definition 5.2 (Condensation DAG) Let G = (V, E, W) be a phase-type graph with SCCs C_1, \ldots, C_m. The condensation of G is the directed graph G^{\text{SCC}} = (V^{\text{SCC}}, E^{\text{SCC}}) where:

V^{\text{SCC}} = \{C_1, \ldots, C_m\} (one meta-vertex per SCC);
(C_i \to C_j) \in E^{\text{SCC}} if and only if i \neq j and there exist u \in C_i, v \in C_j with (u \to v) \in E.

Each meta-vertex C_i stores:

The set of internal vertices \{v \in V : v \in C_i\};
The set of outgoing SCC edges (C_i \to C_j) \in E^{\text{SCC}};
An index for identification.

The condensation inherits a starting meta-vertex C_0^{\text{SCC}}: the SCC containing the starting vertex v_0.

Intuition

The condensation collapses each SCC into a single node. The resulting graph is always a DAG (Theorem 5.2 below). This DAG structure is what makes topological sorting possible, and the topological order of the condensation provides the elimination order for phasic’s graph algorithms.

Example

For the graph in the example above, the condensation has four meta-vertices: \{v_0\}, \{v_1, v_2\}, \{v_3\}, \{v_a\}, with edges \{v_0\} \to \{v_1, v_2\}, \{v_1, v_2\} \to \{v_3\}, \{v_3\} \to \{v_a\}.

Definition 5.3 (Topological ordering) Let G^{\text{SCC}} = (V^{\text{SCC}}, E^{\text{SCC}}) be the condensation DAG of a phase-type graph. A topological ordering of G^{\text{SCC}} is a linear ordering C_{\pi(1)}, C_{\pi(2)}, \ldots, C_{\pi(m)} of the meta-vertices such that for every edge (C_{\pi(i)} \to C_{\pi(j)}) \in E^{\text{SCC}}, we have i < j.

Intuition

A topological ordering processes each SCC before any SCC that depends on it. In phasic, the elimination algorithm ([06]) processes vertices in the order determined by the topological sort of the condensation: vertices within each SCC are eliminated together, and SCCs are processed in topological order so that when an SCC is processed, all SCCs it depends on (i.e., its successors in the condensation DAG) have already been processed.

Example

For the condensation above, one valid topological ordering is \{v_a\}, \{v_3\}, \{v_1, v_2\}, \{v_0\}. This is also the reverse of the reachability order from v_0: absorbing states are processed first, and the starting vertex last.

Definition 5.4 (DFS discovery time and low-link) In a depth-first search (DFS) of graph G = (V, E, W), define for each vertex v \in V:

The discovery index \operatorname{idx}(v): the order in which v is first visited, starting from 0;
The low-link value \operatorname{low}(v): the smallest discovery index reachable from v via a path that follows one or more tree edges followed by at most one back edge to a vertex currently on the DFS stack.

Formally, after the DFS completes, \operatorname{low}(v) = \min(\{\operatorname{idx}(v)\} \cup \{\operatorname{low}(z) : (v \to z) \in E, z \text{ is a tree child of } v\} \cup \{\operatorname{idx}(z) : (v \to z) \in E, z \text{ is on the stack}\}).

Intuition

The low-link value tracks the “earliest” vertex (by discovery time) that v can reach by going forward in the DFS tree and then taking one back edge. If \operatorname{low}(v) = \operatorname{idx}(v), then v is the root of its SCC: no vertex in the subtree rooted at v can reach back to a vertex discovered before v.

Theorems and Proofs

Theorem 5.1 (Correctness of Tarjan’s algorithm) Algorithm 5.1 correctly identifies all strongly connected components of a phase-type graph G = (V, E, W) in O(|V| + |E|) time.

Proof. We prove correctness by establishing two properties: (1) every set of vertices popped from the stack when a root is found forms an SCC, and (2) every SCC is identified in this way.

Property (1). When a vertex v satisfies \operatorname{low}(v) = \operatorname{idx}(v) (the root condition), all vertices popped from the stack until v is popped form a strongly connected component. Let S_v denote this set. We show S_v is strongly connected. Consider any vertex w \in S_v with w \neq v. Since w is on the stack when v is being processed, w was discovered after v in the DFS and has not yet been assigned to an SCC. By the DFS tree structure, there is a path from v to w along tree edges. Since w is still on the stack and \operatorname{low}(w) \leq \operatorname{idx}(v) (otherwise w would have been popped as the root of its own SCC before v’s subtree was fully explored), there is a path from w back to some vertex with discovery index \leq \operatorname{idx}(v). Since v is the root (no vertex in S_v can reach back further than v), this path leads back to v. Hence v and w are mutually reachable, so S_v is strongly connected.

Property (2): Maximality. Suppose for contradiction that S_v is not maximal: there exists a vertex u \notin S_v that is mutually reachable with some w \in S_v. If u was already assigned to another SCC, then u’s SCC root was popped before v’s root condition was checked, meaning u was not on the stack when v’s subtree was explored. But w \in S_v can reach u, and u can reach w, so u would still be on the stack (it was discovered in v’s subtree and has a path back to v’s subtree). This contradicts u having been popped earlier.

Complexity. Each vertex is visited exactly once by the DFS (O(|V|) for vertex processing). Each edge is examined exactly once during the DFS (O(|E|) for edge processing). Each vertex is pushed onto the stack exactly once and popped exactly once (O(|V|) total). The low-link update at each edge is O(1). Total: O(|V| + |E|). \square

Theorem 5.2 (The condensation is a DAG) Let G^{\text{SCC}} = (V^{\text{SCC}}, E^{\text{SCC}}) be the condensation of a phase-type graph G. Then G^{\text{SCC}} is a directed acyclic graph.

Proof. Suppose for contradiction that G^{\text{SCC}} contains a directed cycle C_{i_1} \to C_{i_2} \to \cdots \to C_{i_k} \to C_{i_1} for k \geq 2. By definition of the condensation edges, for each pair of consecutive meta-vertices C_{i_j} and C_{i_{j+1}} in the cycle, there exist vertices u_j \in C_{i_j} and v_{j+1} \in C_{i_{j+1}} with (u_j \to v_{j+1}) \in E. Since vertices within the same SCC are mutually reachable, there is a path from v_{j+1} to u_{j+1} within C_{i_{j+1}} (for the purpose of continuing to the next edge in the cycle).

Composing these paths, we obtain a directed path from any vertex in C_{i_1} to any vertex in C_{i_2} (via the inter-SCC edge and intra-SCC paths), and similarly from C_{i_2} to C_{i_3}, and so on back to C_{i_1}. This means every vertex in C_{i_1} \cup C_{i_2} \cup \cdots \cup C_{i_k} is mutually reachable with every other vertex in this union. But this contradicts the maximality of the SCCs: C_{i_1}, \ldots, C_{i_k} are all distinct SCCs, yet their union is strongly connected, violating the requirement that each SCC is a maximal strongly connected subset. \square

Theorem 5.3 (Topological sort of a DAG) Let G^{\text{SCC}} = (V^{\text{SCC}}, E^{\text{SCC}}) be a DAG with n meta-vertices and m edges. A topological ordering of G^{\text{SCC}} exists and can be computed by Kahn’s algorithm in O(n + m) time. Moreover, the algorithm detects if the input is not a DAG (i.e., contains a cycle) by checking whether all vertices were processed.

Proof. Existence. Every finite DAG has at least one vertex with in-degree 0 (a source). If no such vertex existed, we could follow incoming edges indefinitely, which by the pigeonhole principle would produce a cycle, contradicting the DAG property. Removing a source and its outgoing edges yields a smaller DAG, to which the argument applies inductively. This induction produces a topological ordering.

Kahn’s algorithm correctness. The algorithm maintains a queue of vertices with in-degree 0. Initially, all sources are enqueued. At each step, a vertex C is dequeued and appended to the output. For each outgoing edge (C \to C'), the in-degree of C' is decremented. If C'’s in-degree reaches 0, it is enqueued. The algorithm terminates when the queue is empty.

We show the output is a valid topological ordering. Consider any edge (C_i \to C_j). When C_i is dequeued, C_j’s in-degree is decremented. C_j can only be enqueued after all its incoming edges have been processed (in-degree reaches 0), which happens at or after C_i is dequeued. Therefore C_i precedes C_j in the output.

If the graph contains a cycle, the vertices in the cycle never have their in-degree reduced to 0, so they are never enqueued. The algorithm detects this: if the number of dequeued vertices is less than n, the graph is not a DAG.

Complexity. Each meta-vertex is enqueued and dequeued at most once: O(n). Each edge is examined exactly once when its source is dequeued: O(m). The initial in-degree computation requires examining all edges: O(m). Total: O(n + m). \square

Algorithms

5.0.0.1 Tarjan’s SCC Detection

Description. Given a phase-type graph G = (V, E, W), this algorithm computes the condensation G^{\text{SCC}} by identifying all strongly connected components using Tarjan’s depth-first search algorithm. For each vertex, it maintains a discovery index and a low-link value. Vertices are pushed onto a stack when discovered and popped when an SCC root is identified (i.e., when \operatorname{low}(v) = \operatorname{idx}(v)). After all SCCs are identified, the algorithm constructs the condensation edges by scanning each SCC’s internal vertices for edges to vertices in other SCCs.

The implementation includes an optimization: if the graph is already acyclic (detected by attempting a topological sort), each vertex is trivially its own SCC, and the condensation is constructed directly without DFS. A post-processing step isolates the starting vertex v_0 into its own SCC at index 0, simplifying downstream trace stitching.

Tarjan's SCC Detection
1: Let G = (V, E, W) be a phase-type graph (@def-01-ph-graph)
2: Let idx_counter <- 0, S <- empty stack
3: Let idx[v] <- undefined, low[v] <- undefined, on_stack[v] <- false, visited[v] <- false for all v in V
4: Let components <- empty list
5:
6: function StrongConnect(v)
7:   idx[v] <- idx_counter                              > Set discovery index
8:   low[v] <- idx_counter
9:   visited[v] <- true
10:  idx_counter <- idx_counter + 1
11:  Push(S, v)
12:  on_stack[v] <- true
13:
14:  for (v -> z) in E(v) do                              > Examine all outgoing edges
15:    if not visited[z] then
16:      StrongConnect(z)                                > Recurse on unvisited successor
17:      low[v] <- min(low[v], low[z])
18:    else if on_stack[z] then
19:      low[v] <- min(low[v], idx[z])                   > Back edge to vertex on stack
20:    end if
21:  end for
22:
23:  if low[v] = idx[v] then                             > v is root of an SCC
24:    C <- empty set
25:    repeat
26:      w <- Pop(S)
27:      on_stack[w] <- false
28:      C <- C U {w}
29:    until w = v
30:    Append(components, C)
31:  end if
32: end function
33:
34: function FindSCCs(G)
35:  for v in V do
36:    if not visited[v] then
37:      StrongConnect(v)
38:    end if
39:  end for
40:                                                       > Build condensation edges
41:  G^SCC <- empty condensation graph
42:  for each SCC C_i in components do
43:    for v in C_i do
44:      for (v -> z) in E(v) do
45:        C_j <- SCC containing z
46:        if C_j != C_i then
47:          Add edge (C_i -> C_j) to G^SCC if not present > Deduplicated via AVL tree
48:        end if
49:      end for
50:    end for
51:  end for
52:  IsolateStartingVertex(G^SCC)                        > Ensure v_0 is alone in SCC 0
53:  return G^SCC
54: end function

Correspondence table:

Pseudocode variable	Math symbol	Code variable (file:function)
G	G = (V, E, W)	`ptd_graph*` (phasic.c:`ptd_find_strongly_connected_components`)
v	v \in V	`vertex` (phasic.c:`strongconnect2`)
\operatorname{idx\_counter}	—	`state.scc_index`
S	—	`state.scc_stack`
\operatorname{idx}[v]	\operatorname{idx}(v)	`state.scc_indices[vertex->index]`
\operatorname{low}[v]	\operatorname{low}(v)	`state.low_links[vertex->index]`
\operatorname{on\_stack}[v]	—	`state.scc_on_stack[vertex->index]`
\operatorname{visited}[v]	—	`state.visited[vertex->index]`
\operatorname{components}	\{C_1, \ldots, C_m\}	`state.scc_components`
C	C_i	`ptd_scc_vertex*` with `internal_vertices`
G^{\text{SCC}}	G^{\text{SCC}} = (V^{\text{SCC}}, E^{\text{SCC}})	`ptd_scc_graph*`

Complexity. Time: O(|V| + |E|) for SCC detection (by Theorem 5.1), plus O(|E| \log \Delta^{\text{SCC}}) for condensation edge construction (where \Delta^{\text{SCC}} is the maximum number of distinct target SCCs from any single SCC), due to AVL tree deduplication. The edge sorting adds O(|E^{\text{SCC}}| \log |E^{\text{SCC}}|) per SCC. Total: O(|V| + |E| \log |V|) in the worst case. Space: O(|V|) for the stack, index arrays, and low-link arrays.

Algorithm 5.1: Correctness. Follows from Theorem 5.1 (SCC detection) and Theorem 5.2 (the condensation is a DAG). The deduplication of condensation edges via the AVL tree ensures that parallel edges in G between the same pair of SCCs produce exactly one condensation edge.

5.0.0.2 Topological Sort of Condensation DAG

Description. Given the condensation G^{\text{SCC}} (which is a DAG by Theorem 5.2), this algorithm produces a topological ordering of the meta-vertices using Kahn’s algorithm. It initializes an in-degree counter for each meta-vertex, enqueues all sources (in-degree 0), and repeatedly dequeues a meta-vertex, appends it to the output, and decrements the in-degree of its successors.

The implementation operates on the generic ptd_directed_graph struct. Both ptd_graph and ptd_scc_graph are layout-compatible with ptd_directed_graph (their first three fields – vertices_length, vertices, starting_vertex – match), allowing the same function to sort either the original graph (when acyclic) or the condensation.

Topological Sort of Condensation DAG
1: Let G^SCC = (V^SCC, E^SCC) be the condensation DAG (@def-04-condensation)
2: Let n <- |V^SCC|
3:
4: function TopologicalSort(G^SCC)
5:   in_deg[C] <- 0 for all C in V^SCC
6:   for C in V^SCC do                                   > Compute in-degrees
7:     for (C -> C') in E^SCC do
8:       in_deg[C'] <- in_deg[C'] + 1
9:     end for
10:  end for
11:
12:  Q <- empty queue
13:  Enqueue(Q, V^SCC[0])                                > Start with first vertex (starting SCC)
14:  result <- empty array of size n
15:  k <- 0
16:
17:  while Q is not empty do
18:    C <- Dequeue(Q)
19:    result[k] <- C
20:    k <- k + 1
21:    for (C -> C') in E^SCC do
22:      in_deg[C'] <- in_deg[C'] - 1
23:      if in_deg[C'] = 0 then
24:        Enqueue(Q, C')
25:      end if
26:    end for
27:    if Q is empty and k < n then                      > Handle disconnected sources
28:      for C in V^SCC do
29:        if in_deg[C] = 0 and C not yet visited then
30:          Enqueue(Q, C)
31:        end if
32:      end for
33:    end if
34:  end while
35:
36:  if k < n then                                       > Cycle detection
37:    return null                                        > Graph is not a DAG
38:  end if
39:  return result
40: end function

Correspondence table:

Pseudocode variable	Math symbol	Code variable (file:function)
G^{\text{SCC}}	G^{\text{SCC}} = (V^{\text{SCC}}, E^{\text{SCC}})	`ptd_directed_graph*` (phasic.c:`ptd_directed_graph_topological_sort`)
n	\|V^{\text{SCC}}\|	`graph->vertices_length`
\operatorname{in\_deg}[C]	—	`nparents[C->index]`
Q	—	`q` (queue)
\operatorname{result}	C_{\pi(1)}, \ldots, C_{\pi(m)}	`res`
k	—	`topo_index`
C	C_i \in V^{\text{SCC}}	`vertex`

Complexity. Time: O(|V^{\text{SCC}}| + |E^{\text{SCC}}|) (by Theorem 5.3). Space: O(|V^{\text{SCC}}|) for the in-degree array, visited array, and output array.

Algorithm 5.2: Correctness. Follows from Theorem 5.3. The algorithm correctly produces a topological ordering when the input is a DAG, and returns null when the input contains a cycle (detected by k < n). The disconnected-source handling (lines 27–33) ensures that vertices with no incoming edges that are not reachable from the starting vertex are still processed; this is a one-time scan triggered only when the queue empties prematurely.

Implementation Notes

Source code mapping:

Algorithm	File	Function	Lines
Algorithm 5.1 (SCC core)	`src/c/phasic.c`	`strongconnect2`	L1830–L1895
Algorithm 5.1 (main)	`src/c/phasic.c`	`ptd_find_strongly_connected_components`	L2143–L2313
Algorithm 5.1 (acyclic fast path)	`src/c/phasic.c`	`ptd_find_strongly_connected_components_acyclic`	L2113–L2141
Algorithm 5.1 (starting vertex isolation)	`src/c/phasic.c`	`ptd_isolate_starting_vertex_scc`	L1943–L2111
Algorithm 5.2	`src/c/phasic.c`	`ptd_directed_graph_topological_sort`	L3320–L3390
Algorithm 5.2 (SCC wrapper)	`src/c/phasic.c`	`ptd_scc_graph_topological_sort`	L3406–L3408

Deviations from pseudocode:

Algorithm 5.1 (acyclic fast path): The implementation first checks ptd_graph_is_acyclic(graph) (which itself attempts a topological sort of the original graph). If the graph is acyclic, it bypasses Tarjan’s DFS entirely and constructs a trivial condensation where each vertex is its own SCC. This is an optimization: acyclic graphs are common in practice (e.g., Erlang distributions), and the trivial condensation avoids the overhead of DFS state allocation.
Algorithm 5.1 (memory over-allocation): The arrays scc_indices, low_links, scc_on_stack, and visited are allocated with capacity vertices_length * 10 instead of vertices_length. This appears to be a safety margin for graphs that grow during processing (e.g., when DPH normalization adds auxiliary vertices before SCC detection). However, the SCC algorithm does not modify the graph, so the factor of 10 is unnecessarily large.
Algorithm 5.1 (starting vertex isolation): After SCC detection, ptd_isolate_starting_vertex_scc ensures the starting vertex v_0 is alone in SCC 0. If v_0 shares an SCC with other vertices, it is extracted into a new singleton SCC. This simplifies trace stitching ([13]) by guaranteeing a fixed structure for the starting SCC. The pseudocode abstracts this as a single call to IsolateStartingVertex.
Algorithm 5.1 (condensation edge deduplication): The implementation uses an AVL tree (Definition 3.3) to deduplicate target SCCs for each source SCC, then converts the tree to an array via DFS traversal and sorts the resulting edges by target SCC index. The pseudocode abstracts this as “add edge if not present.”
Algorithm 5.1 (empty SCC filtering): After Tarjan’s algorithm, SCCs with zero internal vertices are discarded (lines 2182–2213). This handles a corner case where the algorithm produces empty component entries; in a correct execution this should not occur, but the implementation is defensive.
Algorithm 5.2 (struct casting): The topological sort is implemented on the generic ptd_directed_graph struct. Both ptd_graph and ptd_scc_graph are cast to ptd_directed_graph* because their memory layouts are compatible in the first three fields (vertices_length, vertices, starting_vertex). Similarly, ptd_scc_vertex and ptd_vertex are cast to ptd_directed_vertex. This relies on C struct layout guarantees for the initial members.
Algorithm 5.2 (disconnected source handling): The implementation includes a has_pushed_all_others flag that triggers a one-time scan for unreachable vertices with in-degree 0 when the queue first empties. This handles graphs with disconnected components (which can arise when absorbing vertices have no incoming edges from the main component).

Remark (SUSPECTED_CODE_ISSUE: SCC array over-allocation). The allocation calloc(graph->vertices_length * 10, sizeof(size_t)) at line 2158 for scc_indices (and similarly for low_links, scc_on_stack, visited) allocates 10 times more memory than needed. The arrays are indexed by vertex->index, which ranges from 0 to vertices_length - 1. The factor of 10 wastes memory without correctness benefit, since the graph is not modified during SCC detection.

Symbol Index

Symbol	Name	First appearance
C	Strongly connected component	Definition 5.1
C_0^{\text{SCC}}	Starting meta-vertex in condensation DAG	Definition 5.2
E^{\text{SCC}}	Edge set of condensation DAG	Definition 5.2
G^{\text{SCC}}	Condensation DAG	Definition 5.2
\operatorname{idx}(v)	DFS discovery index of vertex v	Definition 5.4
\operatorname{low}(v)	Low-link value of vertex v	Definition 5.4
\pi	Permutation (topological ordering)	Definition 5.3
V^{\text{SCC}}	Vertex set of condensation DAG (set of SCCs)	Definition 5.2