From one 78-line protocol spec, five independent verifier backends close the same closed-form sandwich bound on BBRv3's starvation-onset time. A patched filter certifies the complementary positive theorem. One command reproduces every verdict in under 20 minutes.
A network-systems engineer evaluating a congestion-control algorithm (CCA) draft, whether a production kernel patch, a cloud-gaming provider's bespoke sender, or an IETF working-group proposal, has a problem the testbench cannot solve. A simulator sweep that finds no starved flows does not prove starvation cannot happen. A handful of hand-drawn adversarial traces does not cover the parameter regime the sender will actually meet in production. The gap between "we ran it" and "it is correct" is the gap that every CCA deployment carries into the network.
Formal CCA work to date fills that gap one tool at a time. CCmatic[1] encodes NewReno/Copa-style senders in Z3. Arun et al.'s[2] CCAC encodes abstract CCA models in SMT. Ratemon[3] instruments kernel TCP for runtime contracts. Each uses a different formal substrate, proves a different property family, and does not compose: a bug caught in one encoding says nothing about the next. Adding a second verifier to a given CCA is, in practice, rewriting the model from scratch.
This report summarises Kairos's headline CCA result: a machine-checked refutation of BBRv3[4] starvation. Kairos takes a single declarative protocol spec and compiles it into five parallel verifier artifacts. A 78-line BBRv3 spec declares the four-mode state machine, five pure sub-step transitions, and a closed-form onset theorem. From that spec Kairos emits a Lean 4 proof obligation, a Dafny/Z3 program, an EBMC k-induction target in SystemVerilog, a Hypothesis property-based sweep, and a packet-level CPU simulator. Kairos reconciles the five verdicts into a sandwich bound: a theorem is accepted only when two or more independent backends close it and none refutes it. Disagreement is a proof-obligation flag, not a publication blocker.
The closed-form onset bound is T(B, D) = (⌊B/D⌋ − 2)·W + c, where B is the ACK-aggregation burst magnitude, D the equilibrium path delay, W the min-RTT filter window, and c ≤ W a pacing-cycle constant. All five backends close the theorem at the tuple (B, D, W) satisfying B > 2D; zero backends refute. A 1,200-cell residual sweep confirms the bound empirically with median residual 4 RTT and p95 ≤ 7 RTT over 99% of cells. A four-iteration meta-evidence chain, driven by verifier disagreement rather than author intuition, refined the theorem statement through four counterexample catches before it stabilised. A complementary patched filter F* is certified by the same pipeline against the positive theorem; the patch closes the starvation pathway without a change to the pacing-gain control loop.
| Artifact | Value | |||||
|---|---|---|---|---|---|---|
| Kairos spec size | 78 lines | |||||
| Independent verifier backends | 5 (Lean 4 / Dafny+Z3 / EBMC / Hypothesis / CPU sim.) | |||||
| Lean 4 lemmas closed (BBRv3 neg. + trace + patched filter) | 9 + 5 + 3 of 17 hand-authored | |||||
| Lean 4 core axioms used | 3 (propext, Classical.choice, Quot.sound) | |||||
EBMC k-induction bound | k = 100, 4/4 SVA invariants PROVED | |||||
| Dafny / Z3 assertions (BBRv3 + patched filter + AIMD) | 25 + 9 + 10 verified | |||||
| Hypothesis examples per theorem | 100,000 · zero falsifying | |||||
| CPU-sim residual sweep cells | 1,200 (BBRv3) + 600 (CUBIC) + 600 (Reno) | |||||
Residual median (p95), RTT | 4 (7) | |||||
Linux kernel tcp_bbr replay cells / onsets | 25 / 0 (abstraction sound on every cell) | |||||
| Meta-evidence iterations to theorem stabilisation | 4 (each driven by a backend-caught counterexample) | |||||
| Replication wall-time (16 CPUs) | < 20 minutes | |||||
| Backend | BBRv3 (neg.) | Patched F* (pos.) |
CUBIC (pos.) | Reno (pos.) |
|---|---|---|---|---|
| Lean 4 | closed (9 arith + 5 trace) | closed (4 lemmas) | closed (3 lemmas) | closed (3 lemmas) |
| Dafny / Z3 | 25 / 25 verified | 9 / 9 verified | 5 / 5 verified | 5 / 5 verified |
EBMC k-induction |
PROVED at k = 100 |
PROVED at k = 100 |
PROVED (4/4 obl.) | PROVED (4/4 obl.) |
| Hypothesis | 100,000 ex., onset ≤ bound | 100,000 ex., no starvation | 100,000 ex., no starvation | 100,000 ex., no starvation |
| CPU simulator | 40 / 40 starve (as expected) | 0 / 40 starve | 0 / 600 starve | 0 / 600 starve |
Nothing on this page depends on model-to-model comparison or an LLM-as-judge. The sole oracles are (a) the Lean 4 typechecker, (b) the Z3 SMT solver via Dafny, (c) EBMC's k-induction decision, (d) Hypothesis's deterministic shrinker over pinned seeds, and (e) a packet-level CPU simulator. Agreement across all five is the evidentiary stance; any single backend's verdict in isolation is not a proof. A spec is accepted only when at least two backends close it and none refutes.
starves_withinLet t be a BBRv3 trace with parameters (W, D, B) and initial mode ProbeBW. If the schedule satisfies the reachable-window zero-delivery property hSched (every reachable W-window of the schedule contains a zero-delivered tick) and B > 2D, then there exists n with n ≤ (⌊B/D⌋ − 2)·W + c such that the pacing rate at tick n is zero.
The theorem is proved by nine closed Lean 4 statements that compose as follows.
min_rtt_filt is monotone non-increasing over any window of length ≤ W when pacing_rate ≥ 0. The non-negativity hypothesis is load-bearing: iteration 1 of the meta-evidence chain (below) refuted the variant without it.B inflate the bandwidth estimate by a factor of at least B/D within one window.B > 2D the cwnd_gain = 2 cap admits the inflated rate, so pacing exceeds the link's delivery rate for a strictly positive fraction of the ProbeBW schedule.hSched each reachable W-window contains at least one zero-delivered tick; the windowed-max therefore drains to zero within W ticks after the last positive sample.(⌊B/D⌋ − 2)·W + c.W-wide filter, giving the dual n + W < (⌊B/D⌋ − 2)·W + c.B = 2D the closed form collapses to the Arun et al. (2022)[2] lower-bound statement; the leading term vanishes.c-bound. c ≤ W, recovered from the ProbeBW pacing-gain cycle, the cwnd_gain cap, and the ProbeRTT duration.All nine lemmas of OnsetTheorem.lean (including the main starves_within) and consecutive_zeros_drain from OnsetTheoremTrace.lean are closed Lean proofs at the submission commit. Axioms used: propext, Classical.choice, Quot.sound, the Lean 4 core only.
No single backend is sufficient. Each verifies a different refinement of the one-step transition relation. Lean 4 closes the symbolic theorem. EBMC k-induction at k = 100 discharges four SystemVerilog invariants (pacing-matches-bw-max, drain-implies-zero-pacing, onset upper bound, any-nonzero-implies-nonzero) in sub-second wall time. Dafny/Z3 checks the same transition function as an SMT cross-replay, with Z3 terminating in under five seconds. Hypothesis shrinks 100,000 examples per theorem with zero falsifying at the submission commit. The CPU simulator sweeps a 1,200-cell (B, D, link_rate, seed) grid with median residual 4 RTT.
| Operator | Lean | EBMC | Hyp. | CPU | Quorum (of 4) |
|---|---|---|---|---|---|
M01: W off-by-one | ✓ | ✓ | ✓ | ✓ | 4 |
M02: drop −2 in B/D | ✓ | ✓ | ✓ | ✓ | 4 |
M03: swap B, D | ✓ | ✓ | ✓ | ✓ | 4 |
M04: remove c | ✓ | ✓ | × | ✓ | 3 |
| M05: weaken min-RTT monotonicity | ✓ | ✓ | × | × | 2 |
| M06: flip inequality in Lemma 2 | ✓ | ✓ | ✓ | × | 3 |
M07: drop B > 2D hypothesis | ✓ | × | ✓ | ✓ | 3 |
| M08: break Arun boundary | ✓ | ✓ | × | × | 2 |
| M09: 3-flow quorum to any | ✓ | ✓ | ✓ | ✓ | 4 |
| M10: pacing-rate off-by-one | × | ✓ | ✓ | ✓ | 3 |
| Coverage (of 10) | 9 | 9 | 7 | 7 | 32 / 40 |
The theorem statement itself was refined over four iterations by verifier disagreement, not author intuition:
pacing_rate. Adding the hypothesis closes Lean and Hypothesis on the same statement.k-induction failed on the onset bound at k = 30. Inspecting the induction skeleton revealed that the SystemVerilog refinement conflated the filter window and the ProbeRTT duration; the spec was corrected and EBMC closes at k = 100.p95 budget on cells with small D. The schedule class was implicitly assuming clocked bursts; adding the aggregating-vs-quiescent schedule taxonomy restored agreement.c-bound that used an arithmetic lemma Lean declined to reuse from Mathlib. The Lean proof was tightened to match Dafny's dependency; both close at submission.Each iteration turned a backend's refutation into a spec amendment. The published theorem is the fixed point of that loop.
F* that does not starveThe negative theorem establishes that BBRv3 starves under adversarial ACK aggregation. The complementary positive theorem replaces the windowed-max filter with a patched form F* that explicitly retains at least one positive delivery sample within each W-window, modulo a small symbolic constant. Against F* Kairos closes a positive theorem: pacing rate stays strictly positive on every reachable tick of the ProbeBW schedule, for every (B, D) satisfying the same hypotheses as Theorem 1.
The patch closes three Lean lemmas (PatchedFilter.lean): F*-positivity, F*-decomposition, and F*-monotonicity-under-schedule. Dafny verifies the corresponding 9 assertions; EBMC k-induction closes at k = 100; Hypothesis draws 100,000 parameter tuples without a falsifying counterexample; and the CPU simulator reports zero starvation cells across 40 quiescent and 40 aggregating cases at the same (B, D) grid. Every backend agrees: the patched filter closes the pathway the windowed-max filter left open, without a change to the cwnd_gain or pacing-gain control loops that operate on its output.
F* is not a production patch proposal. It is a constructive witness that the negative theorem is specific to the filter, not structural to BBRv3's control architecture: the same four-mode state machine that starves under the windowed-max filter does not starve under F*. That constrains the space of remediation work to a clearly scoped component.
(W, D, B) satisfying B > 2D and hSched, not just the values in the sweep. The residual sweep confirms the bound's empirical tightness, not its truth. A deployment-timing decision on the draft can reference a machine-checked starvation envelope instead of a testbench screenshot.For CCA designers: one spec, five verdicts, a single reconciled bound. The cost of adding a second verifier to a given CCA is zero additional spec work. Every verdict in this report, together with the full proof artifacts, mutation logs, residual histograms, and kernel-replay cells, is reproducible inside Kairos on 16 CPUs in under 20 minutes.