4 Numerical Primitives
Introduction
This file documents the numerical building blocks used by phasic’s core algorithms to maintain accuracy in floating-point computation. Two primitives are formalized: Kahan compensated summation and log-space weight evaluation. These are not algorithms in the usual sense of solving a graph problem; they are low-level arithmetic routines that are embedded within larger algorithms.
Kahan summation is used in sojourn time computation and PMF gradient computation, where long summation chains in double-precision arithmetic would otherwise lose significant digits. Log-space weight evaluation is used when edge weights are products of parameter-coefficient pairs (the log weight mode from Definition 2.15), where direct multiplication could overflow or underflow.
Prerequisites: [01], [02]
Source files: src/c/phasic.c (functions: ptd_expected_sojourn_time_subset, compute_pmf_with_gradient, ptd_graph_update_weights)
Definitions
Compensated summation
When summing n floating-point numbers in the obvious way (s \leftarrow s + x_i), each addition may introduce a rounding error of magnitude up to \varepsilon |s + x_i|, where \varepsilon is the machine epsilon. Over n additions, these errors can accumulate to O(n\varepsilon) relative error. Kahan’s algorithm tracks a running compensation term that captures the rounding error from each addition and feeds it back into the next step.
Definition 4.1 (Kahan compensated summation) A Kahan accumulator (Kahan 1965) is a pair (S, C) where S is the running sum and C is the compensation term, both initialized to zero. Given a sequence of values x_1, x_2, \ldots, x_n, the compensated sum is computed by the following recurrence: for i = 1, \ldots, n,
y_i = x_i - C, \tag{4.1} t_i = S + y_i, \tag{4.2} C \leftarrow (t_i - S) - y_i, \tag{4.3} S \leftarrow t_i. \tag{4.4}
The result is S after processing all n values.
Intuition
In exact arithmetic, Equation 4.3 yields C = 0 always, because (t_i - S) - y_i = (S + y_i - S) - y_i = 0. In floating-point arithmetic, \operatorname{fl}(S + y_i) may differ from S + y_i; the discrepancy is exactly what (t_i - S) - y_i computes (to first order in \varepsilon). By subtracting C from the next input (Equation 4.1), the lost low-order bits are recovered.

Example

Let \varepsilon \approx 1.1 \times 10^{-16} (IEEE 754 double precision). Summing 10^8 values of 0.1 (exact sum 10^7): naive summation may accumulate relative error of order 10^8 \varepsilon \approx 10^{-8}, corrupting the trailing digits of the result. Kahan summation maintains relative error of order \varepsilon, recovering 10^7 up to the representation error of the inputs. (Summing 10^8 copies of 1.0 would be a poor illustration: every intermediate sum is an integer below 2^{53}, so even naive summation is exact in that case.)

Definition 4.2 (Kahan accumulator struct) In the implementation, the Kahan accumulator is a C struct with two fields: sum (= S) and compensation (= C). Three operations are defined:
- Init: S \leftarrow 0, \; C \leftarrow 0.
- Add(x): execute Equation 4.1–Equation 4.4.
- Result: return S.
Intuition
The struct encapsulates the two-variable state, making it a drop-in replacement for a plain double accumulator in existing code.

Log-space weight computation
When the weight mode is log (Definition 2.15), the edge weight is a product:
w(v \to z) = \prod_{j=1}^{k} (c_j \theta_j).
Direct computation of this product risks overflow (if the factors are large) or underflow (if the factors are small). Log-space computation avoids both issues (Higham 2002).
Definition 4.3 (Log-space weight computation) Let \mathbf{c} = (c_1, \ldots, c_k) be the coefficient vector and \boldsymbol{\theta} = (\theta_1, \ldots, \theta_k) be the parameter vector, with c_j \theta_j > 0 for all j. The log-space weight is
w(v \to z) = \exp\!\left(\sum_{j=1}^{k} \log(c_j \theta_j)\right). \tag{4.5}
The computation proceeds in two steps: 1. Accumulate: \ell_{\log} \leftarrow \sum_{j=1}^{k} \log(c_j \theta_j). 2. Exponentiate: w \leftarrow \exp(\ell_{\log}).
Intuition
Working in log-space converts the product into a sum, and the sum of logarithms stays in a moderate range even when the individual factors span many orders of magnitude. The final exponentiation maps back to the original scale. This is the standard technique for computing products in numerical work.

Example

Suppose k = 3 with c_j \theta_j = (10^{100}, 10^{-50}, 10^{-50}). The product is 10^{100} \cdot 10^{-50} \cdot 10^{-50} = 1. Direct multiplication would compute 10^{100} \cdot 10^{-50} = 10^{50} at an intermediate step, which is fine in this case, but with more extreme values or longer products, intermediate overflow is a real risk. In log-space: \log(10^{100}) + \log(10^{-50}) + \log(10^{-50}) = 100 \ln 10 - 50 \ln 10 - 50 \ln 10 = 0, and \exp(0) = 1.

Definition 4.4 (Strict positivity guard) The log-space weight computation (Definition 4.3) requires the strict positivity condition c_j \theta_j > 0 for all j = 1, \ldots, k. If any c_j \theta_j \leq 0, the logarithm is undefined (for c_j \theta_j < 0) or -\infty (for c_j \theta_j = 0), and the computation must not proceed. The implementation raises an error if this condition is violated.
Intuition
This is a fundamental constraint of the log-space approach. Models using the log weight mode must ensure all coefficient-parameter products are strictly positive. This is naturally satisfied in many applications (e.g., population genetics, where rates are positive by definition).
Theorems and Proofs
Theorem 4.1 (Kahan summation error bound) Let x_1, \ldots, x_n be floating-point numbers and let \hat{S}_n be the result of Kahan compensated summation (Definition 4.1). Let S_n = \sum_{i=1}^{n} x_i be the exact sum. Then
|\hat{S}_n - S_n| \leq (2\varepsilon + O(n\varepsilon^2)) \sum_{i=1}^{n} |x_i|, \tag{4.6}
where \varepsilon is the unit roundoff (machine epsilon) (Higham 2002). In contrast, naive left-to-right summation satisfies only |\hat{S}_n^{\text{naive}} - S_n| \leq n\varepsilon \sum_{i=1}^{n} |x_i| + O(\varepsilon^2).
Proof. We analyze one step of the recurrence. Let \operatorname{fl}(\cdot) denote the result of a floating-point operation. The IEEE 754 model guarantees \operatorname{fl}(a \oplus b) = (a + b)(1 + \delta) with |\delta| \leq \varepsilon for each elementary operation \oplus \in \{+, -\}.
At step i, the compensation C contains (approximately) the rounding error from the previous addition. Consider the computed values:
- \hat{y}_i = \operatorname{fl}(x_i - C) = (x_i - C)(1 + \delta_1) where |\delta_1| \leq \varepsilon.
- \hat{t}_i = \operatorname{fl}(S + \hat{y}_i) = (S + \hat{y}_i)(1 + \delta_2) where |\delta_2| \leq \varepsilon.
- \hat{C}' = \operatorname{fl}(\operatorname{fl}(\hat{t}_i - S) - \hat{y}_i).
The key insight is that \hat{t}_i - S recovers the high-order part of \hat{y}_i (because \hat{t}_i = \operatorname{fl}(S + \hat{y}_i), and if |S| \gg |\hat{y}_i|, then \hat{t}_i - S captures only the portion of \hat{y}_i that “fit” into the sum). Subtracting \hat{y}_i then gives the portion that was lost: \hat{C}' \approx (\hat{t}_i - S) - \hat{y}_i = -\delta_2(S + \hat{y}_i), which is precisely the rounding error from computing S + \hat{y}_i.
By feeding \hat{C}' back into the next step (subtracting it from x_{i+1}), the error from step i is corrected in step i+1, up to a second-order term in \varepsilon. Formally, one can show by induction that after n steps, the accumulated error satisfies:
|\hat{S}_n - S_n| \leq 2\varepsilon \sum_{i=1}^{n} |x_i| + O(n\varepsilon^2) \sum_{i=1}^{n} |x_i|.
For practical values of n (even n = 10^8), the O(n\varepsilon^2) \approx 10^8 \cdot 10^{-32} = 10^{-24} term is negligible compared to the 2\varepsilon \approx 2 \times 10^{-16} term. The error bound is therefore effectively O(\varepsilon), independent of n.
For naive summation, each of the n additions contributes an error of up to \varepsilon |s + x_i|, and these errors accumulate without correction, yielding the O(n\varepsilon) bound. \square
Theorem 4.2 (Log-space numerical stability for multiplicative weights) Let c_j \theta_j > 0 for all j = 1, \ldots, k, and let w = \prod_{j=1}^{k} (c_j \theta_j) be the true weight. Let \hat{w} be the result of the log-space computation (Definition 4.3). Then the relative error satisfies
\left|\frac{\hat{w} - w}{w}\right| \leq \exp(k\varepsilon) - 1 + \varepsilon \approx (k+1)\varepsilon \tag{4.7}
for small k\varepsilon. In particular, the relative error grows linearly with k (the number of factors) and is independent of the magnitudes of the factors.
Proof. Step 1 (logarithms): Each \log(c_j \theta_j) is computed with relative error at most \varepsilon, so \operatorname{fl}(\log(c_j \theta_j)) = \log(c_j \theta_j)(1 + \delta_j) with |\delta_j| \leq \varepsilon. This gives an absolute error of |\delta_j \log(c_j \theta_j)| per term.
Step 2 (summation): The sum L = \ell_{\log} = \sum_{j=1}^{k} \log(c_j \theta_j) is computed by naive left-to-right addition (the implementation does not use Kahan summation for this inner sum, which is acceptable since k is typically small). Writing \hat{L}_i for the i-th computed partial sum, the accumulated absolute error in \hat{L} satisfies:
|\hat{L} - L| \leq \sum_{j=1}^{k} |\delta_j \log(c_j \theta_j)| + (k-1)\varepsilon \max_i |\hat{L}_i| + O(k^2\varepsilon^2) \leq 2k\varepsilon \sum_{j=1}^{k} |\log(c_j \theta_j)| + O(k^2\varepsilon^2),
which is O(k\varepsilon) up to a moderate constant: any factor representable as a positive double satisfies |\log(c_j \theta_j)| \lesssim 745, so the sum of log magnitudes cannot blow up. When all the log terms have the same sign, the bound simplifies to |\hat{L} - L| \leq 2k\varepsilon |L| + O(k^2\varepsilon^2).
Step 3 (exponentiation): \hat{w} = \operatorname{fl}(\exp(\hat{L})) = \exp(\hat{L})(1 + \delta_{\exp}) with |\delta_{\exp}| \leq \varepsilon. Thus:
\hat{w} = \exp(L + (\hat{L} - L))(1 + \delta_{\exp}) = w \cdot \exp(\hat{L} - L) \cdot (1 + \delta_{\exp}).
The relative error is:
\left|\frac{\hat{w}}{w} - 1\right| = |\exp(\hat{L} - L)(1 + \delta_{\exp}) - 1| \leq \exp(|\hat{L} - L|)(1 + \varepsilon) - 1.
With |\hat{L} - L| = O(k\varepsilon) and using \exp(x) \leq 1 + 2x for small x \geq 0, the relative error is bounded by (k+1)\varepsilon to first order.
Crucially, this bound depends only on k and \varepsilon, not on the magnitudes of c_j \theta_j. Direct multiplication would suffer from intermediate overflow or underflow when factors span many orders of magnitude, a failure mode that log-space computation eliminates entirely. \square
Algorithms
Algorithm 4.1: Kahan Compensated Summation
Description. This algorithm computes the sum of a sequence of floating-point numbers with O(\varepsilon) error, independent of the number of terms. It maintains a running sum S and a compensation term C that tracks accumulated rounding errors.
Kahan Compensated Summation
1: Let x_1, ..., x_n be floating-point numbers
2:
3: function KahanSum(x_1, ..., x_n)
4: S ← 0.0
5: C ← 0.0
6: for i ← 1 to n do
7: y ← x_i - C ▷ Compensate for previous rounding error
8: t ← S + y ▷ S is large, y is small; low-order bits of y may be lost
9: C ← (t - S) - y ▷ (t - S) recovers high-order part of y; subtracting y gives the lost part
10: S ← t
11: end for
12: return S
13: end function
Correspondence table:
| Pseudocode variable | Math symbol | Code variable (file:function) |
|---|---|---|
| S | S (running sum) | kahan.sum (phasic.c: Kahan struct) |
| C | C (compensation) | kahan.compensation (phasic.c: Kahan struct) |
| y | y_i | local variable y |
| t | t_i | local variable t |
| x_i | x_i | input values |
Complexity. Time: O(n). Space: O(1) (two extra variables beyond the input).
Algorithm 4.2: Log-Space Weight Evaluation
Description. Given a coefficient vector \mathbf{c} and parameter vector \boldsymbol{\theta}, this algorithm computes the multiplicative weight w = \prod_j (c_j \theta_j) in log-space to avoid overflow and underflow.
Log-Space Weight Evaluation
1: Let c = (c_1, ..., c_k) be the coefficient vector and θ = (θ_1, ..., θ_k) be the parameter vector
2: Require: c_j θ_j > 0 for all j = 1, ..., k
3:
4: function LogSpaceWeight(c, θ)
5: L ← 0.0
6: for j ← 1 to k do
7: if c_j · θ_j ≤ 0 then
8: error("Non-positive factor in log-space weight") ▷ Strict positivity guard (Definition 4.4)
9: end if
10: L ← L + log(c_j · θ_j)
11: end for
12: return exp(L)
13: end function
Correspondence table:
| Pseudocode variable | Math symbol | Code variable (file:function) |
|---|---|---|
| \mathbf{c} | \mathbf{c} (coefficient vector) | edge->coefficients (phasic.c:ptd_graph_update_weights) |
| \boldsymbol{\theta} | \boldsymbol{\theta} (parameter vector) | params array |
| \ell_{\log} | \sum_j \log(c_j \theta_j) | log_sum local variable |
| k | k (parameter dimension) | n_params or loop bound |
| w | w(v \to z) | edge->weight (updated in place) |
Complexity. Time: O(k) per edge. Space: O(1).
Numerical Considerations
Kahan summation scope. Kahan summation is used in two specific contexts in phasic:
1. Sojourn time computation (ptd_expected_sojourn_time_subset): The expected sojourn time is a sum over many vertex contributions, where individual terms can span several orders of magnitude. Kahan summation ensures the small contributions are not lost.
2. PMF gradient computation (compute_pmf_with_gradient): The gradient of the forward algorithm involves long summation chains over time steps. Without compensation, the accumulated error would corrupt the gradient and cause optimization algorithms (SVGD, HMC) to receive incorrect search directions.
Kahan summation is not used in the standard forward algorithm for PDF/PMF computation. Instead, the forward algorithm uses long double (80-bit extended precision on x86, 128-bit on some platforms) for its intermediate accumulators. This provides \varepsilon \approx 10^{-19} rather than \varepsilon \approx 10^{-16}, which is sufficient for the forward algorithm’s summation chains without the overhead of Kahan’s compensation logic.
Log-space inner summation. The log-space weight evaluation (Algorithm 4.2) does not use Kahan summation for the accumulation of \log(c_j \theta_j) terms. This is acceptable because k (the number of parameters per edge) is typically small (1-5 in practice), so the naive summation error O(k\varepsilon) is negligible. If a future application required very large k, Kahan summation could be added to the inner loop of Algorithm 4.2 at minimal cost.
Guard condition for log mode. The strict positivity guard (Definition 4.4) is checked at the point of weight computation, not at graph construction time. This means a graph can be constructed with arbitrary coefficients and later fail at weight evaluation if the parameter vector causes some c_j \theta_j \leq 0. This is by design: the coefficients are fixed at construction, but the parameters change during inference.
Implementation Notes
Source code mapping:
| Algorithm | File | Function | Notes |
|---|---|---|---|
| Algorithm 4.1 | src/c/phasic.c | Kahan struct and macros | Used in ptd_expected_sojourn_time_subset, compute_pmf_with_gradient |
| Algorithm 4.2 | src/c/phasic.c | ptd_graph_update_weights (with use_log=true) | Log-space branch of weight update |
Deviations from pseudocode:
- Algorithm 4.1: The implementation uses a C struct `{ double sum; double compensation; }` with inline functions or macros for `init`, `add`, and `result`. The pseudocode shows the recurrence as a loop; in practice, the `add` operation is called incrementally from within larger algorithms.
- Algorithm 4.2: The implementation is embedded in `ptd_graph_update_weights`, which iterates over all edges in the graph and updates each edge's weight based on the current parameter vector and weight mode. The log-space branch is selected by a boolean flag `use_log`. The pseudocode shows a single edge computation; the actual function processes all edges in a single pass.
- Forward algorithm accumulation: The standard forward algorithm (`ptd_graph_pdf`, `ptd_dph_pmf`) uses `long double` for intermediate sums rather than Kahan summation. This is a pragmatic choice: `long double` provides sufficient precision for the forward algorithm's needs without the two-variable overhead of Kahan accumulators, and the code is simpler.
Symbol Index
| Symbol | Name | First appearance |
|---|---|---|
| C | Kahan compensation term | Definition 4.1 |
| \varepsilon | Machine epsilon (unit roundoff) | Theorem 4.1 |
| \ell_{\log} | Log-space accumulator | Definition 4.3 |
| S | Kahan running sum | Definition 4.1 |
| t_i | Kahan temporary sum | Definition 4.1 |
| y_i | Kahan compensated input | Definition 4.1 |