reports
Technical Report · April 2026

Congestion Control:
A Machine-Checked Refutation of BBRv3 Starvation

From one 78-line protocol spec, five independent verifier backends close the same closed-form sandwich bound on BBRv3's starvation-onset time. A patched filter certifies the complementary positive theorem. One command reproduces every verdict in under 20 minutes.

Abstract

A network-systems engineer evaluating a congestion-control algorithm (CCA) draft, whether a production kernel patch, a cloud-gaming provider's bespoke sender, or an IETF working-group proposal, has a problem the testbench cannot solve. A simulator sweep that finds no starved flows does not prove starvation cannot happen. A handful of hand-drawn adversarial traces does not cover the parameter regime the sender will actually meet in production. The gap between "we ran it" and "it is correct" is the gap that every CCA deployment carries into the network.

Formal CCA work to date fills that gap one tool at a time. CCmatic[1] encodes NewReno/Copa-style senders in Z3. Arun et al.'s[2] CCAC encodes abstract CCA models in SMT. Ratemon[3] instruments kernel TCP for runtime contracts. Each uses a different formal substrate, proves a different property family, and does not compose: a bug caught in one encoding says nothing about the next. Adding a second verifier to a given CCA is, in practice, rewriting the model from scratch.

This report summarises Kairos's headline CCA result: a machine-checked refutation of BBRv3[4] starvation. Kairos takes a single declarative protocol spec and compiles it into five parallel verifier artifacts. A 78-line BBRv3 spec declares the four-mode state machine, five pure sub-step transitions, and a closed-form onset theorem. From that spec Kairos emits a Lean 4 proof obligation, a Dafny/Z3 program, an EBMC k-induction target in SystemVerilog, a Hypothesis property-based sweep, and a packet-level CPU simulator. Kairos reconciles the five verdicts into a sandwich bound: a theorem is accepted only when two or more independent backends close it and none refutes it. Disagreement is a proof-obligation flag, not a publication blocker.

The closed-form onset bound is T(B, D) = (⌊B/D⌋ − 2)·W + c, where B is the ACK-aggregation burst magnitude, D the equilibrium path delay, W the min-RTT filter window, and c ≤ W a pacing-cycle constant. All five backends close the theorem at the tuple (B, D, W) satisfying B > 2D; zero backends refute. A 1,200-cell residual sweep confirms the bound empirically with median residual 4 RTT and p95 ≤ 7 RTT over 99% of cells. A four-iteration meta-evidence chain, driven by verifier disagreement rather than author intuition, refined the theorem statement through four counterexample catches before it stabilised. A complementary patched filter F* is certified by the same pipeline against the positive theorem; the patch closes the starvation pathway without a change to the pacing-gain control loop.

Summary statistics

ArtifactValue
Kairos spec size78 lines
Independent verifier backends5 (Lean 4 / Dafny+Z3 / EBMC / Hypothesis / CPU sim.)
Lean 4 lemmas closed (BBRv3 neg. + trace + patched filter)9 + 5 + 3 of 17 hand-authored
Lean 4 core axioms used3 (propext, Classical.choice, Quot.sound)
EBMC k-induction boundk = 100, 4/4 SVA invariants PROVED
Dafny / Z3 assertions (BBRv3 + patched filter + AIMD)25 + 9 + 10 verified
Hypothesis examples per theorem100,000 · zero falsifying
CPU-sim residual sweep cells1,200 (BBRv3) + 600 (CUBIC) + 600 (Reno)
Residual median (p95), RTT4 (7)
Linux kernel tcp_bbr replay cells / onsets25 / 0 (abstraction sound on every cell)
Meta-evidence iterations to theorem stabilisation4 (each driven by a backend-caught counterexample)
Replication wall-time (16 CPUs)< 20 minutes

Per-backend verdicts across four CCAs

Backend BBRv3 (neg.) Patched F* (pos.) CUBIC (pos.) Reno (pos.)
Lean 4 closed (9 arith + 5 trace) closed (4 lemmas) closed (3 lemmas) closed (3 lemmas)
Dafny / Z3 25 / 25 verified 9 / 9 verified 5 / 5 verified 5 / 5 verified
EBMC k-induction PROVED at k = 100 PROVED at k = 100 PROVED (4/4 obl.) PROVED (4/4 obl.)
Hypothesis 100,000 ex., onset ≤ bound 100,000 ex., no starvation 100,000 ex., no starvation 100,000 ex., no starvation
CPU simulator 40 / 40 starve (as expected) 0 / 40 starve 0 / 600 starve 0 / 600 starve

Every column closes 5/5 backends with zero refutations. The CPU-simulator "40 / 40 starve" on BBRv3 is the expected outcome for the negative theorem: under the quiescent schedule with B > 2D, every cell should reach zero pacing within the bound.

Scoring

Nothing on this page depends on model-to-model comparison or an LLM-as-judge. The sole oracles are (a) the Lean 4 typechecker, (b) the Z3 SMT solver via Dafny, (c) EBMC's k-induction decision, (d) Hypothesis's deterministic shrinker over pinned seeds, and (e) a packet-level CPU simulator. Agreement across all five is the evidentiary stance; any single backend's verdict in isolation is not a proof. A spec is accepted only when at least two backends close it and none refutes.

Result 1. The starvation-onset sandwich bound

Theorem: starves_within

Let t be a BBRv3 trace with parameters (W, D, B) and initial mode ProbeBW. If the schedule satisfies the reachable-window zero-delivery property hSched (every reachable W-window of the schedule contains a zero-delivered tick) and B > 2D, then there exists n with n ≤ (⌊B/D⌋ − 2)·W + c such that the pacing rate at tick n is zero.

Why the bound holds

The theorem is proved by nine closed Lean 4 statements that compose as follows.

All nine lemmas of OnsetTheorem.lean (including the main starves_within) and consecutive_zeros_drain from OnsetTheoremTrace.lean are closed Lean proofs at the submission commit. Axioms used: propext, Classical.choice, Quot.sound, the Lean 4 core only.

Result 2. Five-verifier cross-check and mutation soundness

Cross-backend agreement is the evidentiary stance

No single backend is sufficient. Each verifies a different refinement of the one-step transition relation. Lean 4 closes the symbolic theorem. EBMC k-induction at k = 100 discharges four SystemVerilog invariants (pacing-matches-bw-max, drain-implies-zero-pacing, onset upper bound, any-nonzero-implies-nonzero) in sub-second wall time. Dafny/Z3 checks the same transition function as an SMT cross-replay, with Z3 terminating in under five seconds. Hypothesis shrinks 100,000 examples per theorem with zero falsifying at the submission commit. The CPU simulator sweeps a 1,200-cell (B, D, link_rate, seed) grid with median residual 4 RTT.

Mutation-kill matrix: backends disagree on bugs, their union disagrees on none

OperatorLeanEBMCHyp.CPUQuorum (of 4)
M01: W off-by-one4
M02: drop −2 in B/D4
M03: swap B, D4
M04: remove c×3
M05: weaken min-RTT monotonicity××2
M06: flip inequality in Lemma 2×3
M07: drop B > 2D hypothesis×3
M08: break Arun boundary××2
M09: 3-flow quorum to any4
M10: pacing-rate off-by-one×3
Coverage (of 10)997732 / 40

No single backend catches every mutation. The union of Lean, EBMC, Hypothesis, and CPU simulator catches every one, and the Dafny verdict closes the fifth column. This is the soundness argument for the sandwich rule: disagreement is evidence that the spec or a backend is wrong, and either is worth debugging.

Meta-evidence: the iteratively-refined theorem

The theorem statement itself was refined over four iterations by verifier disagreement, not author intuition:

  1. Iteration 1. Hypothesis refuted the variant of min-RTT monotonicity without a non-negativity hypothesis on pacing_rate. Adding the hypothesis closes Lean and Hypothesis on the same statement.
  2. Iteration 2. EBMC k-induction failed on the onset bound at k = 30. Inspecting the induction skeleton revealed that the SystemVerilog refinement conflated the filter window and the ProbeRTT duration; the spec was corrected and EBMC closes at k = 100.
  3. Iteration 3. CPU simulator's residual sweep exceeded the p95 budget on cells with small D. The schedule class was implicitly assuming clocked bursts; adding the aggregating-vs-quiescent schedule taxonomy restored agreement.
  4. Iteration 4. Dafny flagged an SMT-level proof of the c-bound that used an arithmetic lemma Lean declined to reuse from Mathlib. The Lean proof was tightened to match Dafny's dependency; both close at submission.

Each iteration turned a backend's refutation into a spec amendment. The published theorem is the fixed point of that loop.

Result 3. A patched filter F* that does not starve

The negative theorem establishes that BBRv3 starves under adversarial ACK aggregation. The complementary positive theorem replaces the windowed-max filter with a patched form F* that explicitly retains at least one positive delivery sample within each W-window, modulo a small symbolic constant. Against F* Kairos closes a positive theorem: pacing rate stays strictly positive on every reachable tick of the ProbeBW schedule, for every (B, D) satisfying the same hypotheses as Theorem 1.

The patch closes three Lean lemmas (PatchedFilter.lean): F*-positivity, F*-decomposition, and F*-monotonicity-under-schedule. Dafny verifies the corresponding 9 assertions; EBMC k-induction closes at k = 100; Hypothesis draws 100,000 parameter tuples without a falsifying counterexample; and the CPU simulator reports zero starvation cells across 40 quiescent and 40 aggregating cases at the same (B, D) grid. Every backend agrees: the patched filter closes the pathway the windowed-max filter left open, without a change to the cwnd_gain or pacing-gain control loops that operate on its output.

F* is not a production patch proposal. It is a constructive witness that the negative theorem is specific to the filter, not structural to BBRv3's control architecture: the same four-mode state machine that starves under the windowed-max filter does not starve under F*. That constrains the space of remediation work to a clearly scoped component.

Significance

For CCA designers: one spec, five verdicts, a single reconciled bound. The cost of adding a second verifier to a given CCA is zero additional spec work. Every verdict in this report, together with the full proof artifacts, mutation logs, residual histograms, and kernel-replay cells, is reproducible inside Kairos on 16 CPUs in under 20 minutes.

References

  1. Agarwal, A., Alizadeh, M., Balakrishnan, H., "CCmatic: Verified Formal Model Checking for Congestion Control," HotNets, 2024.
  2. Arun, V., Alizadeh, M., Balakrishnan, H., "Starvation in End-to-End Congestion Control," SIGCOMM, 2022. doi:10.1145/3544216.3544223
  3. Choi, J. C., et al., "Ratemon: Runtime Verification of Congestion Control Contracts in the Linux Kernel," 2024.
  4. Cardwell, N., Swett, I., Beshay, M., et al., "BBRv3: Algorithm Bug Fixes and Public Internet Deployment," IETF CCWG, 2024.