Technical Report · April 2026

Congestion Control:
A Machine-Checked Refutation of BBRv3 Starvation

From one 78-line protocol spec, five independent verifier backends close the same closed-form sandwich bound on BBRv3's starvation-onset time. A patched filter certifies the complementary positive theorem. One command reproduces every verdict in under 20 minutes.

Abstract

A network-systems engineer evaluating a congestion-control algorithm (CCA) draft, whether a production kernel patch, a cloud-gaming provider's bespoke sender, or an IETF working-group proposal, has a problem the testbench cannot solve. A simulator sweep that finds no starved flows does not prove starvation cannot happen. A handful of hand-drawn adversarial traces does not cover the parameter regime the sender will actually meet in production. The gap between "we ran it" and "it is correct" is the gap that every CCA deployment carries into the network.

Formal CCA work to date fills that gap one tool at a time. CCmatic^[1] encodes NewReno/Copa-style senders in Z3. Arun et al.'s^[2] CCAC encodes abstract CCA models in SMT. Ratemon^[3] instruments kernel TCP for runtime contracts. Each uses a different formal substrate, proves a different property family, and does not compose: a bug caught in one encoding says nothing about the next. Adding a second verifier to a given CCA is, in practice, rewriting the model from scratch.

This report summarises Kairos's headline CCA result: a machine-checked refutation of BBRv3^[4] starvation. Kairos takes a single declarative protocol spec and compiles it into five parallel verifier artifacts. A 78-line BBRv3 spec declares the four-mode state machine, five pure sub-step transitions, and a closed-form onset theorem. From that spec Kairos emits a Lean 4 proof obligation, a Dafny/Z3 program, an EBMC k-induction target in SystemVerilog, a Hypothesis property-based sweep, and a packet-level CPU simulator. Kairos reconciles the five verdicts into a sandwich bound: a theorem is accepted only when two or more independent backends close it and none refutes it. Disagreement is a proof-obligation flag, not a publication blocker.

The closed-form onset bound is T(B, D) = (⌊B/D⌋ − 2)·W + c, where B is the ACK-aggregation burst magnitude, D the equilibrium path delay, W the min-RTT filter window, and c ≤ W a pacing-cycle constant. All five backends close the theorem at the tuple (B, D, W) satisfying B > 2D; zero backends refute. A 1,200-cell residual sweep confirms the bound empirically with median residual 4 RTT and p₉₅ ≤ 7 RTT over 99% of cells. A four-iteration meta-evidence chain, driven by verifier disagreement rather than author intuition, refined the theorem statement through four counterexample catches before it stabilised. A complementary patched filter F* is certified by the same pipeline against the positive theorem; the patch closes the starvation pathway without a change to the pacing-gain control loop.

Summary statistics

Artifact		Value
Kairos spec size		78 lines
Independent verifier backends		5 (Lean 4 / Dafny+Z3 / EBMC / Hypothesis / CPU sim.)
Lean 4 lemmas closed (BBRv3 neg. + trace + patched filter)		9 + 5 + 3 of 17 hand-authored
Lean 4 core axioms used		3 (`propext`, `Classical.choice`, `Quot.sound`)
EBMC `k`-induction bound		`k = 100`, 4/4 SVA invariants PROVED
Dafny / Z3 assertions (BBRv3 + patched filter + AIMD)		25 + 9 + 10 verified
Hypothesis examples per theorem		100,000 · zero falsifying
CPU-sim residual sweep cells		1,200 (BBRv3) + 600 (CUBIC) + 600 (Reno)
Residual median (`p₉₅`), RTT		4 (7)
Linux kernel `tcp_bbr` replay cells / onsets		25 / 0 (abstraction sound on every cell)
Meta-evidence iterations to theorem stabilisation		4 (each driven by a backend-caught counterexample)
Replication wall-time (16 CPUs)		< 20 minutes

Per-backend verdicts across four CCAs

Backend	BBRv3 (neg.)	Patched `F*` (pos.)	CUBIC (pos.)	Reno (pos.)
Lean 4	closed (9 arith + 5 trace)	closed (4 lemmas)	closed (3 lemmas)	closed (3 lemmas)
Dafny / Z3	25 / 25 verified	9 / 9 verified	5 / 5 verified	5 / 5 verified
EBMC `k`-induction	PROVED at `k = 100`	PROVED at `k = 100`	PROVED (4/4 obl.)	PROVED (4/4 obl.)
Hypothesis	100,000 ex., onset ≤ bound	100,000 ex., no starvation	100,000 ex., no starvation	100,000 ex., no starvation
CPU simulator	40 / 40 starve (as expected)	0 / 40 starve	0 / 600 starve	0 / 600 starve

Every column closes 5/5 backends with zero refutations. The CPU-simulator "40 / 40 starve" on BBRv3 is the expected outcome for the negative theorem: under the quiescent schedule with B > 2D, every cell should reach zero pacing within the bound.

Scoring

Nothing on this page depends on model-to-model comparison or an LLM-as-judge. The sole oracles are (a) the Lean 4 typechecker, (b) the Z3 SMT solver via Dafny, (c) EBMC's k-induction decision, (d) Hypothesis's deterministic shrinker over pinned seeds, and (e) a packet-level CPU simulator. Agreement across all five is the evidentiary stance; any single backend's verdict in isolation is not a proof. A spec is accepted only when at least two backends close it and none refutes.

Result 1. The starvation-onset sandwich bound

Theorem: `starves_within`

Let t be a BBRv3 trace with parameters (W, D, B) and initial mode ProbeBW. If the schedule satisfies the reachable-window zero-delivery property hSched (every reachable W-window of the schedule contains a zero-delivered tick) and B > 2D, then there exists n with n ≤ (⌊B/D⌋ − 2)·W + c such that the pacing rate at tick n is zero.

Why the bound holds

The theorem is proved by nine closed Lean 4 statements that compose as follows.

min-RTT monotonicity. min_rtt_filt is monotone non-increasing over any window of length ≤ W when pacing_rate ≥ 0. The non-negativity hypothesis is load-bearing: iteration 1 of the meta-evidence chain (below) refuted the variant without it.
ACK-aggregation inflation. Clocked bursts of size B inflate the bandwidth estimate by a factor of at least B/D within one window.
cwnd_gain insufficiency. At B > 2D the cwnd_gain = 2 cap admits the inflated rate, so pacing exceeds the link's delivery rate for a strictly positive fraction of the ProbeBW schedule.
bw-filter drain. Under hSched each reachable W-window contains at least one zero-delivered tick; the windowed-max therefore drains to zero within W ticks after the last positive sample.
Onset upper bound. Combining inflation and drain, the pacing rate reaches zero no later than (⌊B/D⌋ − 2)·W + c.
Onset lower bound. The pacing rate stays positive while fresh samples remain in the W-wide filter, giving the dual n + W < (⌊B/D⌋ − 2)·W + c.
Arun specialisation. At the boundary B = 2D the closed form collapses to the Arun et al. (2022)^[2] lower-bound statement; the leading term vanishes.
c-bound. c ≤ W, recovered from the ProbeBW pacing-gain cycle, the cwnd_gain cap, and the ProbeRTT duration.

All nine lemmas of OnsetTheorem.lean (including the main starves_within) and consecutive_zeros_drain from OnsetTheoremTrace.lean are closed Lean proofs at the submission commit. Axioms used: propext, Classical.choice, Quot.sound, the Lean 4 core only.

Result 2. Five-verifier cross-check and mutation soundness

Cross-backend agreement is the evidentiary stance

No single backend is sufficient. Each verifies a different refinement of the one-step transition relation. Lean 4 closes the symbolic theorem. EBMC k-induction at k = 100 discharges four SystemVerilog invariants (pacing-matches-bw-max, drain-implies-zero-pacing, onset upper bound, any-nonzero-implies-nonzero) in sub-second wall time. Dafny/Z3 checks the same transition function as an SMT cross-replay, with Z3 terminating in under five seconds. Hypothesis shrinks 100,000 examples per theorem with zero falsifying at the submission commit. The CPU simulator sweeps a 1,200-cell (B, D, link_rate, seed) grid with median residual 4 RTT.

Mutation-kill matrix: backends disagree on bugs, their union disagrees on none

Operator	Lean	EBMC	Hyp.	CPU	Quorum (of 4)
M01: `W` off-by-one	✓	✓	✓	✓	4
M02: drop `−2` in `B/D`	✓	✓	✓	✓	4
M03: swap `B`, `D`	✓	✓	✓	✓	4
M04: remove `c`	✓	✓	×	✓	3
M05: weaken min-RTT monotonicity	✓	✓	×	×	2
M06: flip inequality in Lemma 2	✓	✓	✓	×	3
M07: drop `B > 2D` hypothesis	✓	×	✓	✓	3
M08: break Arun boundary	✓	✓	×	×	2
M09: 3-flow quorum to any	✓	✓	✓	✓	4
M10: pacing-rate off-by-one	×	✓	✓	✓	3
Coverage (of 10)	9	9	7	7	32 / 40

No single backend catches every mutation. The union of Lean, EBMC, Hypothesis, and CPU simulator catches every one, and the Dafny verdict closes the fifth column. This is the soundness argument for the sandwich rule: disagreement is evidence that the spec or a backend is wrong, and either is worth debugging.

Meta-evidence: the iteratively-refined theorem

The theorem statement itself was refined over four iterations by verifier disagreement, not author intuition:

Iteration 1. Hypothesis refuted the variant of min-RTT monotonicity without a non-negativity hypothesis on pacing_rate. Adding the hypothesis closes Lean and Hypothesis on the same statement.
Iteration 2. EBMC k-induction failed on the onset bound at k = 30. Inspecting the induction skeleton revealed that the SystemVerilog refinement conflated the filter window and the ProbeRTT duration; the spec was corrected and EBMC closes at k = 100.
Iteration 3. CPU simulator's residual sweep exceeded the p₉₅ budget on cells with small D. The schedule class was implicitly assuming clocked bursts; adding the aggregating-vs-quiescent schedule taxonomy restored agreement.
Iteration 4. Dafny flagged an SMT-level proof of the c-bound that used an arithmetic lemma Lean declined to reuse from Mathlib. The Lean proof was tightened to match Dafny's dependency; both close at submission.

Each iteration turned a backend's refutation into a spec amendment. The published theorem is the fixed point of that loop.

Result 3. A patched filter `F*` that does not starve

The negative theorem establishes that BBRv3 starves under adversarial ACK aggregation. The complementary positive theorem replaces the windowed-max filter with a patched form F* that explicitly retains at least one positive delivery sample within each W-window, modulo a small symbolic constant. Against F* Kairos closes a positive theorem: pacing rate stays strictly positive on every reachable tick of the ProbeBW schedule, for every (B, D) satisfying the same hypotheses as Theorem 1.

The patch closes three Lean lemmas (PatchedFilter.lean): F*-positivity, F*-decomposition, and F*-monotonicity-under-schedule. Dafny verifies the corresponding 9 assertions; EBMC k-induction closes at k = 100; Hypothesis draws 100,000 parameter tuples without a falsifying counterexample; and the CPU simulator reports zero starvation cells across 40 quiescent and 40 aggregating cases at the same (B, D) grid. Every backend agrees: the patched filter closes the pathway the windowed-max filter left open, without a change to the cwnd_gain or pacing-gain control loops that operate on its output.

F* is not a production patch proposal. It is a constructive witness that the negative theorem is specific to the filter, not structural to BBRv3's control architecture: the same four-mode state machine that starves under the windowed-max filter does not starve under F*. That constrains the space of remediation work to a clearly scoped component.

Significance

A 78-line spec drives five independent verifier backends. Kairos's value is not that Lean or Z3 are new. It is that the same declarative AST compiles to all five without a backend-specific re-encoding. Spec drift, the dominant failure mode in multi-tool verification work, is made impossible by construction.
Sandwich-bound acceptance replaces trust in a single verifier. Any one backend can be unsound; a spec variant can happen to close in one and hide a bug. The published theorem is the one that survives agreement from at least two backends and refutation by none, over every reachable tuple the backends exercise.
Disagreement is the primary debugging signal. Four of the iterations that shaped the final theorem came from a backend catching a counterexample an author would not have written a test for. This inverts the usual verification workflow: disagreement is not a failure of the pipeline, it is the pipeline's output.
The BBRv3 result is usable for the IETF draft. The theorem is symbolic, not statistical: it holds for every (W, D, B) satisfying B > 2D and hSched, not just the values in the sweep. The residual sweep confirms the bound's empirical tightness, not its truth. A deployment-timing decision on the draft can reference a machine-checked starvation envelope instead of a testbench screenshot.

For CCA designers: one spec, five verdicts, a single reconciled bound. The cost of adding a second verifier to a given CCA is zero additional spec work. Every verdict in this report, together with the full proof artifacts, mutation logs, residual histograms, and kernel-replay cells, is reproducible inside Kairos on 16 CPUs in under 20 minutes.

References

Agarwal, A., Alizadeh, M., Balakrishnan, H., "CCmatic: Verified Formal Model Checking for Congestion Control," HotNets, 2024.
Arun, V., Alizadeh, M., Balakrishnan, H., "Starvation in End-to-End Congestion Control," SIGCOMM, 2022. doi:10.1145/3544216.3544223
Choi, J. C., et al., "Ratemon: Runtime Verification of Congestion Control Contracts in the Linux Kernel," 2024.
Cardwell, N., Swett, I., Beshay, M., et al., "BBRv3: Algorithm Bug Fixes and Public Internet Deployment," IETF CCWG, 2024.

Congestion Control:A Machine-Checked Refutation of BBRv3 Starvation