One hardware-verification task, followed from spec intake to shipped bundle. Phase -2 refuses three malformed specs at the door without invoking a generation model; the real task traverses the fleet, EBMC[1] proves the seven SystemVerilog assertions in 0.278 s, and the Lean layer catches a misstated fairness theorem that bounded model-checking could not catch.
An arbiter is the part of a chip that decides who goes next when two things want the same resource at once. Two parts of a processor both want to write to memory on the same clock tick; the arbiter picks one, queues the other, and tells the losing side to wait. Every chip that shares a resource has at least one. Phones, laptops, cars, medical devices all depend on arbiters being right under every input pattern they will ever see.
Arbiter bugs are hard to catch because they rarely show up in ordinary traffic. The broken behavior only surfaces under specific patterns: a request arriving exactly when the lock releases, two requesters starving each other indefinitely, or the round-robin tiebreaker picking wrong after a reset. Running every possible pattern through a simulator is infeasible; there are billions, and an engineer runs a few thousand. The standard fix is formal verification, which asks a SAT solver to check mathematically that every possible input satisfies a fixed set of assertions the engineer wrote down.
Formal verification works, provided the assertions are right. Today's pipelines do not validate the assertions themselves. A malformed spec, a pair of contradictory properties, or a vacuously-true clause will either crash the solver with an obscure error or silently pass something that cannot be manufactured correctly. A human reviewer eventually notices, but only after the verification pipeline has run, often after a generation model has been burned on patching a chip against a broken spec.
Kairos adds a spec-validation gate before any generation model is called. Three rule-based checks run in milliseconds: parseable SystemVerilog or NuSMV properties, expected property count, and contradictory-pair detection. Only specs that pass proceed to the full pipeline. This walkthrough shows the gate catching three malformed specs, then works through the fix_arb_lock task end-to-end from a well-formed spec, demonstrating what a clean verification run looks like.
Before any generation model is invoked, Kairos runs a rule-based spec validator. It checks five things: required config fields, parseable SV or NuSMV properties, property count against what the config declares, undeclared signal references, and contradictory property pairs. If any check fails, the run exits with a rejection verdict and a suggested fix. No generation model is ever called.
Three intentionally malformed specs are run through the validator as a baseline. Every rejection is rule-based and completes in milliseconds, without touching a generation model. Each case below shows the exact spec the customer submitted and the exact message Kairos returns, unedited.
The customer submits a task config JSON that lacks the category and difficulty fields required by Kairos's task-config convention.
// customer-submitted config.json
{
"task_id": "bad_missing_fields",
"task_type": "fix_broken",
"description": "customer-submitted: asserts should check arbiter mutex and progress",
"bound": 10,
"timeout_seconds": 120
}
Kairos returns this verdict:
{
"ok": false,
"reasons": [
"task config missing required fields: category, difficulty"
],
"suggestions": [
"add the missing fields to the task config JSON (required by
athanor-builder convention): category, difficulty"
],
"parsed_property_count": 7,
"checks_run": [
"required_config_fields", "property_parse",
"expected_property_count", "undeclared_signal_refs",
"contradictory_pair_scan"
]
}
The customer's config claims the source file contains two formal properties. The source file itself contains no parseable assert property clause; what is there is syntactically malformed.
// customer-submitted student.sv module arb_lock(input clk, input reset, input req0, input req1); reg grant0, grant1; always @(posedge clk) begin if (reset) begin grant0 <= 0; grant1 <= 0; end end // malformed: missing `property ... endproperty` wrapper; // invalid operator sequence; unclosed paren; gibberish ops assert p_reset: (reset -> -> grant0 == 0 == grant1); assert property (@(posedge clk) (grant0 && grant1 !&& !grant0); endmodule
Kairos returns this verdict:
{
"ok": false,
"reasons": [
"zero properties extracted, parser found no `assert property`,
`assert final`, or NuSMV LTLSPEC/CTLSPEC/INVARSPEC in the source(s)",
"property count mismatch: config expects 2, parser found 0"
],
"suggestions": [
"add at least one labelled `assert property (...)` clause, or
ensure the source file is syntactically well-formed SV/SMV",
"either set num_expected_properties=0 in the config, or adjust
the source to define exactly 2 formal properties"
],
"parsed_property_count": 0,
"checks_run": [
"required_config_fields", "property_parse",
"expected_property_count", "undeclared_signal_refs",
"contradictory_pair_scan"
]
}
The customer's source file declares four properties. Two of them assert contradictory statements about the same signals: p_mutex says the grant lines are never simultaneously high, and p_not_mutex says they are. No module can satisfy both.
// customer-submitted student.sv (excerpt) p_grant_is_always_one: assert property (@(posedge clk) disable iff (reset) (grant0 || grant1)); p_grant_is_always_zero: assert property (@(posedge clk) disable iff (reset) (!grant0 && !grant1)); p_mutex: assert property (@(posedge clk) disable iff (reset) !(grant0 && grant1)); p_not_mutex: assert property (@(posedge clk) disable iff (reset) (grant0 && grant1));
Kairos returns this verdict:
{
"ok": false,
"reasons": [
"contradictory assertions: `p_mutex` body
`@(posedge clk) disable iff (reset) !(grant0 && grant1)` vs
`p_not_mutex` body `@(posedge clk) disable iff (reset)
(grant0 && grant1)` differ only by negation"
],
"suggestions": [
"remove one of `p_mutex` / `p_not_mutex`, or split the property
into two disjoint preconditions if both are intended to hold
under different circumstances"
],
"parsed_property_count": 4,
"checks_run": [
"required_config_fields", "property_parse",
"expected_property_count", "undeclared_signal_refs",
"contradictory_pair_scan"
]
}
Combined validator wall time across the three cases: under 200 ms. The gate makes a cheap observation: this submission is not well-formed, re-submit with the missing fields or the parseable property block or the contradiction removed. No generation model is asked to do work on a spec that is not ready for it.
The remaining walkthrough follows a single well-formed task through the full ten-phase fleet. The task:
assert property clauses the repaired module must satisfy, and a Lean 4[2] companion file stating five long-horizon theorems about the arbiter's behaviour.
These are the assertions the fix has to satisfy under bounded model-checking. They are fixed by the task; the agent does not author them.
// transition-level safety block; fixed by the task p_reset: assert property (@(posedge clk) reset |=> !grant0 && !grant1 && !locked); p_mutex: assert property (@(posedge clk) !(grant0 && grant1)); p_lock_hold: assert property (@(posedge clk) !reset && locked && burst_cnt > 3'd0 && lock_owner == 1'b0 && req0 |=> grant0 && locked); p_lock_end: assert property (@(posedge clk) !reset && locked && burst_cnt == 3'd0 |=> !locked); p_burst_load: assert property (@(posedge clk) !reset && !locked && grant0 && lock_req |=> locked && burst_cnt == $past(burst_len)); p_rr_turn: assert property (@(posedge clk) !reset && !locked && req0 && req1 && !pri0 && !pri1 && turn == 1'b0 |=> grant0); p_starve_pri: assert property (@(posedge clk) !reset && !locked && req0 && req1 && pri0 && !pri1 |=> grant0);
Each is a per-transition implication guarded by a clock edge. None of them quantify over more than one step. That matters later.
The task also ships a Lean 4 file stating five theorems about the arbiter's long-horizon behaviour: a step-monotonicity bound on the starvation counter, an idle bound for non-requesting channels, a bounded_starvation claim that no channel stays starved indefinitely, a lock_sticky invariant for bursts, and a fair_alternation statement about the turn bit's long-run parity. All theorem bodies start as sorry; the fleet has to close them.
Three model builders are issued the broken SystemVerilog and the seven assertions in parallel. Each returns a repaired module. All three repaired modules pass EBMC against the seven assertions; a winning submission is selected for the downstream phases.
The winning submission's patched arbiter body (showing the three fixes inline; full file shipped in the bundle):
// Priority escalation: starved if denied 4+ cycles wire pri0 = (starve0 >= 3'd4); wire pri1 = (starve1 >= 3'd4); always @(posedge clk) begin if (reset) begin grant0 <= 0; grant1 <= 0; locked <= 0; lock_owner <= 0; turn <= 0; burst_cnt <= 0; starve0 <= 0; starve1 <= 0; end else begin grant0 <= 1'b0; grant1 <= 1'b0; if (locked) begin // Fix 1: check burst_cnt before decrement; release at 0 if (burst_cnt == 3'd0) begin locked <= 1'b0; end else begin burst_cnt <= burst_cnt - 3'd1; if (lock_owner == 1'b0) grant0 <= req0; else grant1 <= req1; end // starvation accounting during lock ... end else begin // Normal arbitration with priority escalation if (req0 && req1) begin if (pri0 && !pri1) begin grant0 <= 1'b1; turn <= 1'b1; starve0 <= 0; starve1 <= starve1 + 3'd1; end else if (pri1 && !pri0) begin grant1 <= 1'b1; turn <= 1'b0; starve1 <= 0; starve0 <= starve0 + 3'd1; end else if (turn == 1'b0) begin // Fix 2: compare turn to 0, not 1 grant0 <= 1'b1; turn <= 1'b1; starve0 <= 0; starve1 <= starve1 + 3'd1; end else begin grant1 <= 1'b1; turn <= 1'b0; starve1 <= 0; starve0 <= starve0 + 3'd1; end end // single-requester and lock-acquisition paths (Fix 3: $past-correct burst load) ... end end end
EBMC runs against the winning SystemVerilog with the seven assertions at bound 10. All seven return PROVED. Structured result:
{
"exit_code": 0,
"proved": 7,
"refuted": 0,
"timed_out": false,
"elapsed_sec": 0.278,
"property_verdicts": {
"arb_lock.p_reset": "PROVED",
"arb_lock.p_mutex": "PROVED",
"arb_lock.p_lock_hold": "PROVED",
"arb_lock.p_lock_end": "PROVED",
"arb_lock.p_burst_load": "PROVED",
"arb_lock.p_rr_turn": "PROVED",
"arb_lock.p_starve_pri": "PROVED"
},
"proved_fraction": 1.0
}
EBMC transcript excerpt, unedited:
Converting Type-checking Verilog::arb_lock Synthesis Verilog::arb_lock Generating Decision Problem Using MiniSAT 2.2.1 with simplifier Properties Solving with propositional reduction SAT checker: instance is UNSATISFIABLE UNSAT: No path found within bound ** Results: [arb_lock.p_reset] always (reset |=> !grant0 && !grant1 && !locked): PROVED up to bound 10 [arb_lock.p_mutex] always !(grant0 && grant1): PROVED up to bound 10 [arb_lock.p_lock_hold] always (!reset && locked && burst_cnt > 0 && lock_owner == 0 && req0 |=> grant0 && locked): PROVED up to bound 10 [arb_lock.p_lock_end] always (!reset && locked && burst_cnt == 0 |=> !locked): PROVED up to bound 10 [arb_lock.p_burst_load] always (!reset && !locked && grant0 && lock_req |=> locked && burst_cnt == $past(burst_len)): PROVED up to bound 10 [arb_lock.p_rr_turn] always (!reset && !locked && req0 && req1 && !pri0 && !pri1 && turn == 0 |=> grant0): PROVED up to bound 10 [arb_lock.p_starve_pri] always (!reset && !locked && req0 && req1 && pri0 && !pri1 |=> grant0): PROVED up to bound 10
The SystemVerilog half is discharged. The winning module satisfies every declared assertion at bound 10 in 0.278 seconds of solver time. In the hardware-verification benchmark report this puts two of the five single-model baselines at 1.000 on this task; the fleet reproduces that result deterministically and moves on to the Lean half.
The Lean companion file ships with five theorem statements and sorry for every body. Excerpt (the fifth theorem is the one that matters later):
theorem starve0_step_mono (s : ArbState) (i : ArbInput) : (step s i).starve0 ≤ s.starve0 + 1 := by sorry theorem starve0_idle_bound (s : ArbState) (i : ArbInput) (n : Nat) (h : i.req0 = false) : (iterStep s i n).starve0 ≤ s.starve0 := by sorry theorem bounded_starvation (s : ArbState) (i : ArbInput) : ∃ n : Nat, n ≤ 4 ∧ ((iterStep s i n).starve0 = 0 ∨ (iterStep s i n).pri0 = true) := by sorry theorem lock_sticky (s : ArbState) (i : ArbInput) (hl : s.locked = true) (hb : s.burst_cnt > 0) : let s' := step s i s'.locked = true ∧ s'.turn = s.turn := by sorry -- The bug lives here. theorem fair_alternation (k : Nat) : let s₀ : ArbState := { starve0 := 0, starve1 := 0, turn := false, locked := false, burst_cnt := 0 } let i := ({ req0 := true, req1 := true, lock_req := false, burst_len := 0 } : ArbInput) k ≥ 1 → ∃ _n : Nat, (iterStep s₀ i (2 * k)).turn = ((k % 2 = 0) : Bool) := by sorry
The Phase 4.5 gate forbids closing any of these with sorry, admit, axiom, native_decide, decide, omega, linarith, simp, simp_all, tauto, by_contra, or ring. Closure must come from the allowed tactic set (rfl, unfold, rw, exact, cases, induction, subst, obtain, refine, show, have, match, constructor, intro, and grind).
The inline Phase 4.5 attempt does not close every theorem on the first pass, so the obligations are handed to a separate external proof-search layer. It returns with four clean closures and one refusal: fair_alternation as originally stated cannot be proved, and the returned reason is a counterexample at k = 2.
Summarised from the external proof-search report, lightly edited for prose flow:
LeanGateScratchProofs module, because the proof tactics behave differently when definitions are compiled versus inline.
starve0_step_mono. The starvation counter increases by at most 1 per step. Proved by unfolding step and using grind.
starve0_idle_bound. When channel 0 is not requesting, its starvation counter never increases. Proved by induction on n, using a helper step_starve0_le.
bounded_starvation. Within at most two steps (counter value ≤ 4), either the priority flag triggers or starvation resets to 0. Proved by case analysis on the starvation counter value, request lines, priority flags, and turn bit, with step-computation helpers.
lock_sticky. While locked with positive burst count, the turn and lock state are preserved. Proved by unfolding step and using grind.
fair_alternation. The original statement was false: counterexample at k = 2, turn is always false after even steps, not alternating with k's parity. The original statement is commented out with an explanation. A corrected version was proved: after 2k steps with symmetric contention and no locking, the turn is always false. The proof uses a period-2 argument: the state after two steps is a fixed point s₂, and step(step(s₂)) = s₂.
The returned file retains the original fair_alternation as a commented block with the reason-for-falsity attached, and states + proves the corrected version:
-- The original statement is false for all k. The RHS ((k % 2 = 0) : Bool) alternates -- but the arbiter enters a period-2 fixed point at s2 after two steps. Counterexample: -- k = 2 gives turn = false while the RHS predicts true. private def s2 : ArbState := { starve0 := 1, starve1 := 0, turn := false, locked := false, burst_cnt := 0 } private theorem iterStep_s2_even (m : Nat) : let i : ArbInput := { req0 := true, req1 := true, lock_req := false, burst_len := 0 } iterStep s2 i (2 * m) = s2 := by induction m with | zero => rfl | succ m' ih => show iterStep s2 _ (2 * (m' + 1)) = s2 rw [show 2 * (m' + 1) = 2 + 2 * m' from by grind, iterStep_add] rw [show iterStep s2 _ 2 = s2 from rfl] exact ih theorem fair_alternation (k : Nat) : let s₀ : ArbState := { starve0 := 0, starve1 := 0, turn := false, locked := false, burst_cnt := 0 } let i := ({ req0 := true, req1 := true, lock_req := false, burst_len := 0 } : ArbInput) k ≥ 1 → ∃ _n : Nat, (iterStep s₀ i (2 * k)).turn = false := by intro s₀ i hk refine ⟨0, ?_⟩ obtain ⟨k', rfl⟩ : ∃ k', k = k' + 1 := ⟨k - 1, by grind⟩ have h2k : 2 * (k' + 1) = 2 + 2 * k' := by grind rw [h2k, iterStep_add] rw [show iterStep s₀ i 2 = s2 from rfl] rw [iterStep_s2_even] rfl
Full lake build exit code 0. Final axioms: {propext, Classical.choice, Quot.sound} only. No banned tactic appears in any closure. The Lean half of the task is discharged.
A fresh stub containing the corrected fair_alternation statement and pure-sorry bodies was resubmitted to external proof search to confirm the corrected theorem admits closure independently of the inline-replacement path. The proof-search run returned with all five theorems closed from scratch.
All five theorems proved without using any banned tactics. 1. starve0_step_mono - unfoldstep;grind2. starve0_idle_bound - induction onn; helper viagrind +locals3. bounded_starvation - destructure state+input, case-split onreq1,Nat.ble 4 s0/s1,turn; explicit witness forn(1 or 2); per-caseunfold step hasPriority; rw [...]; rfl4. lock_sticky - unfoldstep;grind +splitIndPred5. fair_alternation (corrected) - 2-step cycle:s₂ = ⟨1,0,false,false,0⟩satisfiesiterStep s₀ i 2 = s₂anditerStep s₂ i 2 = s₂(both byrfl);iterStep_addcomposition lemma enables induction,iterStep s₀ i (2*(k+1)) = s₂for allk. lake build: exit 0, 471 ms. Axioms:{propext, Classical.choice, Quot.sound};bounded_starvationandfair_alternationprove without any axioms.
The corrected specification is closable from a pure-sorry stub, not only from the inline-replacement path. The round trip is complete.
ebmc reverification recipe. A reviewer does not have to trust the pipeline; they re-run EBMC against the module and re-run lake build against the Lean source.
The full ten-phase solve on fix_arb_lock completes in about 5 minutes of wall time. For the aggregate picture across all 26 tasks of the hardware-verification benchmark, see the benchmark report.