25  Mixed-Radix State Indexing

Introduction

Many phase-type models arising in population genetics encode structured states: a lineage may carry several discrete properties (population label, allele count, number of descendants) that together define the Markov chain state. Rather than working with arbitrary state vectors \sigma \in \mathbb{Z}^d, it is advantageous to define each property as a bounded integer with known range, and to establish a bijection between the combinatorial product of these ranges and a contiguous linear index. This bijection enables efficient graph construction (Algorithm 10.1) while allowing the user to think in terms of named properties rather than raw indices.

This file formalizes the mixed-radix numbering system (Cormen et al. 2009) used by the StateIndexer, Property, and PropertySet classes. The system assigns each property a “digit” in a positional number system where different digits may have different bases (radices). Conversion between the linear index and the property tuple is O(d) in the number of properties, with no table lookups or hash maps required.

Prerequisites: [01], [02]

Source files:

  • src/phasic/state_indexing.py (classes: Property, PropertySet, StateIndexer, Slot)

Definitions

Definition 25.1 (Property) A property is a triple P = (\mathrm{name}, a, b) where \mathrm{name} is a string identifier, a \in \mathbb{Z} is the minimum value, and b \in \mathbb{Z} with b \geq a is the maximum value. The base (radix) of the property is \beta(P) = b - a + 1, and the set of valid values is \mathcal{V}(P) = \{a, a+1, \ldots, b\}.

Intuition A property represents one dimension of a structured state. For instance, a “population” property with a = 1, b = 3 models three populations, while a “descendants” property with a = 0, b = 10 tracks lineage size up to 10. The base \beta(P) is the number of distinct values.
Example Let P_1 = (\texttt{descendants}, 0, 10) and P_2 = (\texttt{population}, 1, 3). Then \beta(P_1) = 11 and \beta(P_2) = 3, so the total number of combinations is 11 \times 3 = 33.

Definition 25.2 (PropertySet) A property set is a pair \mathcal{P} = (\mathrm{name}, (P_0, P_1, \ldots, P_{d-1})) where \mathrm{name} is a string identifier and (P_0, \ldots, P_{d-1}) is an ordered tuple of d properties with pairwise distinct names. The state length of \mathcal{P} is

N(\mathcal{P}) = \prod_{j=0}^{d-1} \beta(P_j).

The state space of \mathcal{P} is the Cartesian product \mathcal{V}(P_0) \times \mathcal{V}(P_1) \times \cdots \times \mathcal{V}(P_{d-1}), which has cardinality N(\mathcal{P}).

Intuition A PropertySet groups properties into a single combinatorial space. The ordering of properties matters: it determines which property is the “least significant digit” in the mixed-radix representation. Property P_0 varies fastest as the linear index increases.
Example With P_0 = (\texttt{descendants}, 0, 2) (\beta = 3) and P_1 = (\texttt{population}, 1, 2) (\beta = 2), the state length is N = 3 \times 2 = 6. The six states are: (0,1), (1,1), (2,1), (0,2), (1,2), (2,2) at indices 0, 1, 2, 3, 4, 5 respectively.

Definition 25.3 (Mixed-Radix Index) Let \mathcal{P} = (\mathrm{name}, (P_0, \ldots, P_{d-1})) be a property set. The mixed-radix encoding is the bijection

\phi \colon \mathcal{V}(P_0) \times \cdots \times \mathcal{V}(P_{d-1}) \to \{0, 1, \ldots, N(\mathcal{P}) - 1\}

defined by

\phi(v_0, v_1, \ldots, v_{d-1}) = \sum_{j=0}^{d-1} (v_j - a_j) \cdot \pi_j \tag{25.1}

where a_j is the minimum value of P_j and the radix powers are

\pi_0 = 1, \qquad \pi_j = \prod_{i=0}^{j-1} \beta(P_i) \quad \text{for } j \geq 1. \tag{25.2}

The inverse mapping \phi^{-1} recovers the property values from a linear index n:

v_j = \left\lfloor \frac{n}{\pi_j} \right\rfloor \bmod \beta(P_j) + a_j, \qquad j = 0, \ldots, d-1. \tag{25.3}

Intuition This is the generalization of standard positional notation to bases that vary by digit position. In decimal (base 10 everywhere), the number 542 means 2 \cdot 1 + 4 \cdot 10 + 5 \cdot 100. Here, each property acts as a digit but may have a different base. Property P_0 is the “ones digit” (varies fastest), P_1 is the next digit, and so on.
Example Continuing the example from Definition 25.2: \pi_0 = 1, \pi_1 = 3. For state (\texttt{descendants}=2, \texttt{population}=2): \phi = (2-0) \cdot 1 + (2-1) \cdot 3 = 5. Inverse: given n = 5, we get v_0 = (5 \mathbin{/} 1) \bmod 3 + 0 = 2 and v_1 = (5 \mathbin{/} 3) \bmod 2 + 1 = 2.

Definition 25.4 (StateIndexer) A StateIndexer is a tuple \mathcal{I} = ((\mathcal{P}_1, \ldots, \mathcal{P}_k), (\mathrm{slot}_1, \ldots, \mathrm{slot}_m)) consisting of k \geq 0 property sets and m \geq 0 slots (with k + m \geq 1), where each slot occupies exactly one index position. The total state length is

N(\mathcal{I}) = \sum_{i=1}^{k} N(\mathcal{P}_i) + m.

The index space is partitioned into contiguous blocks: property set \mathcal{P}_i occupies indices [o_i, o_i + N(\mathcal{P}_i)) where o_1 = 0 and o_{i+1} = o_i + N(\mathcal{P}_i), followed by slot indices o_{k+1}, o_{k+1}+1, \ldots, o_{k+1}+m-1.

Intuition A StateIndexer combines multiple independent state spaces and metadata slots into a single flat index space. This is useful when a model has logically distinct groups of properties – for example, lineage properties and timing metadata – that should be managed independently but stored in a single state vector.
Example Let \mathcal{P}_1 have state length 11 (descendants 0–10) and \mathcal{P}_2 have state length 101 (time bins 0–100), with one slot “epoch”. Then N(\mathcal{I}) = 11 + 101 + 1 = 113. The lineage indices are 0–10, metadata indices are 11–111, and the epoch slot is index 112.

Theorems and Proofs

Theorem 25.1 (Mixed-Radix Bijection Correctness) Let \mathcal{P} = (\mathrm{name}, (P_0, \ldots, P_{d-1})) be a property set. The mixed-radix encoding \phi (Definition 25.3) is a bijection from \mathcal{V}(P_0) \times \cdots \times \mathcal{V}(P_{d-1}) to \{0, 1, \ldots, N(\mathcal{P})-1\}, and both \phi and \phi^{-1} can be computed in O(d) time.

Proof. We proceed in three parts.

Well-definedness and range. For any (v_0, \ldots, v_{d-1}) with v_j \in \mathcal{V}(P_j), each encoded digit e_j = v_j - a_j satisfies 0 \leq e_j \leq \beta(P_j) - 1. The value \phi = \sum_{j=0}^{d-1} e_j \pi_j satisfies:

0 \leq \phi \leq \sum_{j=0}^{d-1} (\beta(P_j) - 1) \pi_j = \prod_{j=0}^{d-1} \beta(P_j) - 1 = N(\mathcal{P}) - 1

where the equality \sum_{j=0}^{d-1} (\beta(P_j) - 1) \pi_j = \prod_{j=0}^{d-1} \beta(P_j) - 1 follows by the standard telescoping identity for mixed-radix systems (induction on d: the base case d=1 gives (\beta(P_0)-1) \cdot 1 = \beta(P_0) - 1; for the inductive step, \sum_{j=0}^{d-1} (\beta(P_j)-1)\pi_j = (\beta(P_{d-1})-1)\pi_{d-1} + \sum_{j=0}^{d-2}(\beta(P_j)-1)\pi_j = (\beta(P_{d-1})-1)\pi_{d-1} + \pi_{d-1} - 1 = \beta(P_{d-1})\pi_{d-1} - 1 = N(\mathcal{P}) - 1).

Injectivity. Suppose \phi(v_0, \ldots, v_{d-1}) = \phi(v_0', \ldots, v_{d-1}'). Then \sum_j (v_j - a_j)\pi_j = \sum_j (v_j' - a_j)\pi_j. Since 0 \leq v_j - a_j < \beta(P_j), the representation of any integer in the range [0, N-1) in the mixed-radix system with bases (\beta(P_0), \ldots, \beta(P_{d-1})) is unique (by the division algorithm applied iteratively). Hence v_j = v_j' for all j.

Inverse correctness. Given n = \sum_j e_j \pi_j with 0 \leq e_j < \beta(P_j), we recover e_j = \lfloor n / \pi_j \rfloor \bmod \beta(P_j) by the standard mixed-radix digit extraction. Adding back a_j gives v_j.

Complexity. Computing \phi requires d multiplications and additions. Computing \phi^{-1} requires d integer divisions and modular reductions. The radix powers \pi_j are precomputed once in O(d) time and reused. \square

Algorithms

25.0.0.1 IndexToProperties

Description. Given a linear index n and a property set \mathcal{P}, this algorithm computes the inverse mixed-radix mapping \phi^{-1}(n), returning the property values (v_0, \ldots, v_{d-1}). It uses the precomputed radix powers \pi_j to extract each digit via integer division and modular reduction.

IndexToProperties
1: Let P = (P₀, ..., P_{d-1}) be properties with bases β(P_j) and min values a_j
2: Let π₀ = 1, π_j = ∏_{i=0}^{j-1} β(Pᵢ) be precomputed radix powers
3: Let n ∈ {0, ..., N(P)-1} be the linear index

4: function IndexToProperties(n, P, π)
5:   for j ← 0 to d-1 do
6:     e_j ← ⌊n / π_j⌋ mod β(P_j)             ▷ Extract encoded digit
7:     v_j ← e_j + a_j                           ▷ Decode to property value
8:   end for
9:   return (v₀, v₁, ..., v_{d-1})
10: end function

Correspondence table:

Pseudocode variable Math symbol Code variable (file:function)
n index n \in \{0, \ldots, N-1\} index (state_indexing.py:PropertySet.index_to_props)
P_j property P_j self.properties[j] (state_indexing.py:PropertySet)
\beta(P_j) base of property j prop.base (state_indexing.py:Property.base)
a_j minimum value of P_j prop.min_value (state_indexing.py:Property)
\pi_j radix power self._radix_powers[j] (state_indexing.py:PropertySet)
e_j encoded digit encoded[i] (state_indexing.py:PropertySet.index_to_props)
v_j decoded property value decoded_values[prop.name] (state_indexing.py:PropertySet.index_to_props)

Complexity. Time: O(d) where d is the number of properties. Space: O(d) for the output vector.

Algorithm 25.1: Correctness. Follows from Theorem 25.1. The extraction formula e_j = \lfloor n / \pi_j \rfloor \bmod \beta(P_j) correctly recovers the j-th digit of the mixed-radix representation, and adding a_j produces the original property value.

25.0.0.2 PropertiesToIndex

Description. Given a tuple of property values (v_0, \ldots, v_{d-1}) and a property set \mathcal{P}, this algorithm computes the forward mixed-radix mapping \phi(v_0, \ldots, v_{d-1}). Each value is validated against its property bounds, encoded by subtracting the minimum value, and combined via a dot product with the precomputed radix powers.

PropertiesToIndex
1: Let P = (P₀, ..., P_{d-1}) be properties with bases β(P_j) and min values a_j
2: Let π₀ = 1, π_j = ∏_{i=0}^{j-1} β(Pᵢ) be precomputed radix powers
3: Let (v₀, ..., v_{d-1}) be property values with v_j ∈ V(P_j)

4: function PropertiesToIndex((v₀, ..., v_{d-1}), P, π)
5:   n ← 0
6:   for j ← 0 to d-1 do
7:     if v_j < a_j or v_j > b_j then             ▷ Validate bounds
8:       error "value out of range"
9:     end if
10:    e_j ← v_j - a_j                             ▷ Encode to 0-based digit
11:    n ← n + e_j · π_j                            ▷ Accumulate weighted digit
12:  end for
13:  return n
14: end function

Correspondence table:

Pseudocode variable Math symbol Code variable (file:function)
(v_0, \ldots, v_{d-1}) property value tuple props (dict or array) (state_indexing.py:PropertySet.props_to_index)
a_j, b_j min/max value of P_j prop.min_value, prop.max_value (state_indexing.py:Property)
e_j encoded digit prop.encode_value(props[prop.name]) (state_indexing.py:Property.encode_value)
\pi_j radix power self._radix_powers (state_indexing.py:PropertySet)
n linear index return value of np.dot(encoded, self._radix_powers) (state_indexing.py:PropertySet.props_to_index)

Complexity. Time: O(d) where d is the number of properties. Space: O(d) for the encoded digit array.

Algorithm 25.2: Correctness. Follows from Theorem 25.1. The encoding e_j = v_j - a_j maps \mathcal{V}(P_j) bijectively to \{0, \ldots, \beta(P_j)-1\}, and the weighted sum \sum_j e_j \pi_j produces the unique linear index in \{0, \ldots, N-1\}.

Implementation Notes

Source code mapping:

Algorithm File Function Notes
Algorithm 25.1 src/phasic/state_indexing.py PropertySet.index_to_props L300–L422
Algorithm 25.2 src/phasic/state_indexing.py PropertySet.props_to_index L428–L577
Radix power precomputation src/phasic/state_indexing.py PropertySet.__init__ L288–L289
Property validation src/phasic/state_indexing.py Property.validate_value L113–L131

Deviations from pseudocode:

  • Vectorized array input. The implementation of index_to_props accepts NumPy arrays of indices and processes them in a vectorized loop, extracting all digits for all indices simultaneously using broadcasting. The pseudocode presents the scalar case for clarity.

  • Partial property specification. The props_to_index method supports partial specifications (only some properties given), returning an array of all matching indices. This is implemented via itertools.product over unspecified property ranges. The pseudocode presents only the complete specification case; the partial case is a straightforward enumeration over the Cartesian product of unspecified properties.

  • Multiple output formats. index_to_props supports three output formats: a dynamically-created frozen dataclass (default), a dictionary (as_dict=True), and a raw NumPy array (as_values=True). The pseudocode returns a plain tuple. These formats are presentation-level differences and do not affect the underlying algorithm.

  • StateIndexer composition. The StateIndexer class manages multiple PropertySet instances and Slot objects with offset-based addressing. This is a simple concatenation of index ranges (Definition 25.4) and does not involve algorithmic complexity beyond tracking cumulative offsets.

Symbol Index

Symbol Name First appearance
a_j Minimum value of property P_j Definition 25.1
b_j Maximum value of property P_j Definition 25.1
\beta(P) Base (radix) of a property Definition 25.1
d Number of properties in a PropertySet Definition 25.2
e_j Encoded digit (0-based offset from a_j) Definition 25.3
\mathcal{I} StateIndexer Definition 25.4
N(\mathcal{P}) State length of a PropertySet Definition 25.2
\mathcal{P} PropertySet Definition 25.2
P_j Property (named bounded integer dimension) Definition 25.1
\phi Mixed-radix encoding bijection Definition 25.3
\phi^{-1} Mixed-radix decoding (inverse bijection) Definition 25.3
\pi_j Radix power (cumulative product of bases) Definition 25.3
\mathcal{V}(P) Set of valid values for property P Definition 25.1