Registers, Lanes, and Berry Phase: Lifting Siunertaq from Batch Graphs to the Complex Plane

The previous article is here: Mainframes, Monads, and Stack Machines: Extending Siunertaq to the Batch Layer

This is the second follow-up to Dhall-to-Effect and provably safe task orchestration and Mainframes, Monads, and Stack Machines. The short version of both: we model build graphs as norm-bounded quivers, verify them with Z3/Yices before any effect fires, and orchestrate steps using JCL-inspired ADTs backed by Dhall and Spring Batch + Pekko.

The previous posts worked at the graph level: vertices, edges, the shape of dependencies. This one goes one level down — to registers.

The insight that forced this descent was noticing that Ref[IO, Array[Byte]] is a dangerous type: anyone who calls .get holds a mutable alias to the array and can change its contents without going through the Ref. We needed something with the guarantees of a GPU register file: fixed width, typed, atomically updated, never aliased. That observation turned out to connect directly to SIMD/SIMT architecture — and from there, to complex Berkovich heights and braid group theory.

Here is how it unfolded.

🗃️ The Problem with `Ref[IO, Array[Byte]]`

Consider the naive state representation for a Golay codeword stored in a Cats Effect Ref:

val codewordRef: Ref[IO, Array[Byte]] = ???

// Looks safe…
codewordRef.get.flatMap { arr =>
  arr(0) = 0xFF.toByte  // ← mutation outside the Ref boundary. Ref never sees this change.
  IO.unit
}

This is the array-aliasing problem. Array[Byte] is mutable; Ref only protects against concurrent writes through the Ref, not against mutation of the value you extracted. The fix isn't synchronized — it's to make the inner value structurally immutable.

IArray[Byte] (Scala 3's immutable array wrapper) closes that gap. But there is a second problem: Ref[IO, IArray[Byte]] carries no information about which codeword this is, or how long it should be. The length constraint (exactly 23 bytes for a binary Golay word) has to live somewhere. In a GPU, the register width is a hardware invariant; in Scala, we encode it as an opaque type.

🔒 `opaque type` as a Register Width Guarantee

opaque type GolayWord = IArray[Byte]

object GolayWord:
  def apply(bytes: IArray[Byte]): Either[String, GolayWord] =
    if bytes.length == 23 then Right(bytes)
    else Left(s"Golay(23) requires exactly 23 bytes; got ${bytes.length}")

  val zero: GolayWord = IArray.fill(23)(0.toByte)

  extension (w: GolayWord)
    def xor(other: GolayWord): GolayWord =
      IArray.tabulate(23)(i => (w(i) ^ other(i)).toByte)
    // xor is closed: result is always 23 bytes, so the invariant is preserved

Outside the GolayWord companion, callers hold an abstract token. They cannot:

Construct a GolayWord of the wrong length.
Call IArray methods directly and accidentally truncate it.
Pass it where an arbitrary IArray[Byte] is expected without an explicit .bytes call.

This is identical to what a typed register does in hardware: the width is enforced at the boundary, and the CPU never sees a "34-bit float" just because someone miscounted.

A second opaque layer wraps the Ref itself:

opaque type GolayWordRef = Ref[IO, GolayWord]

object GolayWordRef:
  def make(initial: GolayWord): IO[GolayWordRef] = Ref.of[IO, GolayWord](initial)

  extension (ref: GolayWordRef)
    def xorWith(other: GolayWord): IO[Unit] = ref.update(_.xor(other))
    def hammingWeight: IO[Int]               = ref.get.map(_.count(_ != 0))
    // The raw Ref is never exposed; only algebra-preserving updates are permitted.

The invariant is now structural: because GolayWord.xor is closed (it always returns 23 bytes), every permitted update through GolayWordRef preserves the length. You cannot break the 23-byte constraint without defeating the opaque type — which requires touching the companion object source.

🚀 SIMD Lane Counts: the Orbit Sizes

Now consider the Golay code's five weight classes: W0, W8, W12, W16, W24.

W0  — orbit size    1  (the zero codeword)
W8  — orbit size  759  (octads; Steiner S(5,8,24) blocks)
W12 — orbit size 2576  (dodecads)
W16 — orbit size  759  (complement octads)
W24 — orbit size    1  (the all-ones codeword)
Total: 4096 = 2¹²

Each weight class is not a single value — it is a lane group. When the Frobenius morphism acts on W8, it acts on all 759 octads simultaneously. This is SIMD in the mathematical sense: one instruction (FVRole.Frobenius), 759 data elements, all updated in lockstep.

extension (gw: GolayWeight)
  def orbitSize: Int = gw match
    case GolayWeight.W8  => 759   // ← 759 SIMD lanes
    case GolayWeight.W12 => 2576  // ← 2576 SIMD lanes (the densest class)
    ...

The SIMT side is the Ref[IO, State] cells — one per quiver vertex. Each cell is a thread register running its own fiber under Cats Effect:

class BSDQuiverManager(
  val leechRef:   Ref[IO, LeechState],    // Thread 0 register file
  val affineRef:  Ref[IO, AffineDualState], // Thread 1 register file
  val padicRef:   Ref[IO, PadicState],    // Thread 2 register file
  val selmerRef:  Ref[IO, SelmerState]    // Thread 3 register file
)

IO.parTraverse over the BSDArrow list is the warp: all four fibers execute the same instruction stream (executeArrow) on their own registers, concurrently. The quiver morphism is the instruction; the Ref cells are the per-thread register file.

GPU	Siunertaq
SIMD lane	`GolayWeight.orbitSize` elements per weight class
SIMT thread register	`Ref[IO, State]` per BSDVertex
Warp instruction	`BSDArrow[S, T]` morphism
Warp execution	`IO.parTraverse` over arrow list
Thread mask (predication)	`galoisHeight` semistability bound
Register width guarantee	`opaque type GolayWord` / `opaque type PhantomCarabinerRef`
IEEE 754 machine epsilon	`MachineConstants.machineEpsilonReal = 2⁻⁵²`
Mantissa bits	`MachineConstants.valuationDepth = 52`

machineEpsilonReal and valuationDepth are not decorative. The qAdicEquivalent tolerance check uses machineEpsilonReal as its floor, and the p-adic valuation tower truncates at depth 52 — the point where the floating-point representation runs out. The theory is literally grounded in the hardware.

🧵 The Braid Word Problem

BSDArrow is a phantom-typed enum:

enum BSDArrow[Src <: BSDVertex, Tgt <: BSDVertex](
  val src: Src, val tgt: Tgt, val role: FVRole, val effect: IO[Unit])

A List[BSDArrow[? <: BSDVertex, ? <: BSDVertex]] is a braid word: a sequence of typed generators where each generator knows which strands it connects. Reading the list left-to-right and applying each generator is the JVM analogue of evaluating a braid group element one letter at a time.

The complication: BSDArrow has no unapply, so you cannot destructure it with a constructor pattern:

// ILLEGAL — the compiler rejects this
case BSDArrow(src, tgt, FVRole.Frobenius, _) => ...
// Error: BSDArrow cannot be used as an extractor; lacks unapply

The fix is what we call a braid pop: extract only the field you need, and let the declared type bound discharge the existential automatically.

// Pop .role — it is a concrete FVRole; no existential type involved.
a.role match
  case FVRole.Frobenius    => ThresholdConstraint.FrobeniusGE(a.src, a.tgt)
  case FVRole.Verschiebung => ThresholdConstraint.VerschiebungLE(a.src, a.tgt)
// a.src has type ? <: BSDVertex; since BSDVertex is the upper bound,
// it is accepted wherever BSDVertex is expected — no cast, no ascription.

Alternatively, a type-test pop fully resolves the existential by naming the concrete case:

case a: BSDArrow.TensorBang =>
  ThresholdConstraint.FrobeniusGE(a.src, a.tgt)
// a.src: Leech.type   — a singleton type, provably <: BSDVertex
// a.tgt: AffineDual.type — same; the compiler checks this at compile time

The type-test form is used when exhaustiveness matters (e.g. BSDQuiverManager.executeArrow, which must cover all four cases). The .role match form is used when the semantic partition — Frobenius vs Verschiebung — is what matters (e.g. ThresholdProblem.fromArrows). Either way, no runtime cast is needed; the phantom types do the work.

📐 Lifting to the Complex Plane

The discrete GolayWeight encodes five points on the Berkovich tree. To model continuous motion along the tree — which is what the IUT Θ-link and p-adic Frobenius actually do — we need a complex-valued position.

opaque type ComplexWeight = (Double, Double)
// re: Berkovich height h ∈ [0, 8]   (where does the point sit on the tree?)
// im: obstruction residual degree    (how far off the real axis?)

This is the step from u8 (discrete weight) to c64 (complex float) in GPU register-type terminology. The width is still enforced by the opaque type; the algebra is now richer.

PhantomCarabiner holds one ComplexWeight and a ℤ/4ℤ phase, and exposes three core operations:

complement   w ↦ 6 − conj(w)    self-duality; h + h̄ = 6 on ℂ
verschiebung w ↦ w / 2          Witt-vector V for p = 2; contracts toward 0
thetaLink    w ↦ −w·i           IUT Θ-link; rotates by −π/2; period 4

thetaLink has ℤ₄ periodicity: apply it four times and you return to the start. This is exactly the Pauli group action, and it mirrors how a 90° rotation in a SIMD lane accumulates to a full 360° after four passes.

🍒 Berry Phase: the Observable That Survives Orbit-Averaging

After a long sequence of quiver morphisms, what can you actually read out of a PhantomCarabiner?

def berryPhaseAngle: Double  // arg(weight) ∈ (−π, π]
def weightNormSq:    Double  // |weight|² = re² + im²

These two quantities are gauge-invariant: they do not depend on the path taken through the braid word, only on the accumulated holonomy.

weightNormSq is preserved by thetaLink (rotation, no scaling) and halved by verschiebung².
berryPhaseAngle advances by exactly −π/2 under each thetaLink and is unchanged by verschiebung.

In physics terms, berryPhaseAngle is the holonomy of the adiabatic connection — the phase your state picks up when you move it around a closed loop in parameter space. In GPU terms, it is the value in the accumulator register after the warp finishes: the only thing the CPU can measure once the thread's intermediate states are gone.

PhantomCarabinerRef wraps the mutable state and exposes only the gauge-invariant projections as reads:

opaque type PhantomCarabinerRef = Ref[IO, PhantomCarabiner]

extension (ref: PhantomCarabinerRef)
  def berryPhaseAngle: IO[Double] = ref.get.map(_.berryPhaseAngle)
  def weightNormSq:    IO[Double] = ref.get.map(_.weightNormSq)
  def applyThetaLink:  IO[Unit]   = ref.update(_.thetaLink)
  // The raw Ref is inaccessible; only algebra-preserving transitions are exposed.

📏 Two Heights, Two Semistability Conditions

One of the more subtle points in the codebase is that there are two height functions, used for two different purposes:

octadHeight(w)  = w / 3                     linear   [0, 8]; Berkovich tree position
galoisHeight(n) = 8 · log(n) / log(24)      logarithmic; GIT semistability mask

octadHeight is the per-thread register value: where is this thread on the tree? It satisfies h(w) + h(24-w) = 8 (the fan functional equation), making the Golay route self-dual.

galoisHeight is the thread mask. Orbits whose "instruction count" grows faster than log are semistability-filtered out before the SMT solver even sees them — exactly as divergent SIMT threads are predicated off before the instruction is retired. The two functions are numerically different:

galoisHeight(8)  ≈ 5.23   — representation-theoretic weight
octadHeight(8)   = 2.67   — geometric height on the Berkovich tree

Conflating them (the mock galoisHeight(tau) = tau.toDouble in earlier versions of BSDQuiver.scala) is the equivalent of using the lane index as the accumulator value — a type error that the compiler cannot catch but the math will eventually surface.

🔭 What's Next: Wiring the Register File into the Batch Layer

With PhantomCarabinerRef in place, the remaining step is to wire it into BSDQuiverManager so that berryPhaseAngle is tracked across arrow firings:

// Planned: each vertex Ref also holds a PhantomCarabiner
class BSDQuiverManager(
  val leechRef:   Ref[IO, (LeechState, PhantomCarabiner)],
  ...
):
  def executeArrow[S, T](arrow: BSDArrow[S, T]): IO[Boolean] = ???
  // After SMT confirms the norm bound is satisfiable:
  //   fire arrow.effect
  //   update the phantom register for tgt via thetaLink or verschiebung
  //   expose berryPhaseAngle for the monitoring layer

Once that is in place, the batch layer gains a continuous diagnostic signal: the Berry phase accumulated across the job graph tells you how much the evaluation has rotated away from the real axis — in other words, how much p-adic obstruction the completed steps have introduced into the downstream norm bounds.

The mlir-bridge will then map BSDArrow decompositions to MLIR Affine Dialect norm constraints, JIT-compiling the verified bounds via LLVM IR. At that point, the stack machine in core is no longer just an evaluator — it is an IR generator, and the Dhall REPL becomes a live assembler for a JIT-compiled Berkovich-tree traversal.

The codebase is at github.com/Yoshyhyrro/Siunertaq. The Lean 4 specs for PhantomCarabiner, YangBaxterBanach, Carabiner, and MachineConstants are in the repository. PRs improving proof coverage — especially closing the open sorry on complement_theta_link_comm and yang_baxter_eq — are very welcome. 🙏

推荐订阅源

DEV Community

🗃️ The Problem with `Ref[IO, Array[Byte]]`

🔒 `opaque type` as a Register Width Guarantee

🚀 SIMD Lane Counts: the Orbit Sizes

🧵 The Braid Word Problem

📐 Lifting to the Complex Plane

🍒 Berry Phase: the Observable That Survives Orbit-Averaging

📏 Two Heights, Two Semistability Conditions

🔭 What's Next: Wiring the Register File into the Batch Layer

推荐订阅源

DEV Community

🗃️ The Problem with Ref[IO, Array[Byte]]

🔒 opaque type as a Register Width Guarantee

🚀 SIMD Lane Counts: the Orbit Sizes

🧵 The Braid Word Problem

📐 Lifting to the Complex Plane

🍒 Berry Phase: the Observable That Survives Orbit-Averaging

📏 Two Heights, Two Semistability Conditions

🔭 What's Next: Wiring the Register File into the Batch Layer

🗃️ The Problem with `Ref[IO, Array[Byte]]`

🔒 `opaque type` as a Register Width Guarantee