Solve walkthrough · April 2026

fix_arb_lock: end-to-end solve walkthrough

One hardware-verification task, followed from spec intake to shipped bundle. Phase -2 refuses three malformed specs at the door without invoking a generation model; the real task traverses the fleet, EBMC^[1] proves the seven SystemVerilog assertions in 0.278 s, and the Lean layer catches a misstated fairness theorem that bounded model-checking could not catch.

Abstract

An arbiter is the part of a chip that decides who goes next when two things want the same resource at once. Two parts of a processor both want to write to memory on the same clock tick; the arbiter picks one, queues the other, and tells the losing side to wait. Every chip that shares a resource has at least one. Phones, laptops, cars, medical devices all depend on arbiters being right under every input pattern they will ever see.

Arbiter bugs are hard to catch because they rarely show up in ordinary traffic. The broken behavior only surfaces under specific patterns: a request arriving exactly when the lock releases, two requesters starving each other indefinitely, or the round-robin tiebreaker picking wrong after a reset. Running every possible pattern through a simulator is infeasible; there are billions, and an engineer runs a few thousand. The standard fix is formal verification, which asks a SAT solver to check mathematically that every possible input satisfies a fixed set of assertions the engineer wrote down.

Formal verification works, provided the assertions are right. Today's pipelines do not validate the assertions themselves. A malformed spec, a pair of contradictory properties, or a vacuously-true clause will either crash the solver with an obscure error or silently pass something that cannot be manufactured correctly. A human reviewer eventually notices, but only after the verification pipeline has run, often after a generation model has been burned on patching a chip against a broken spec.

Kairos adds a spec-validation gate before any generation model is called. Three rule-based checks run in milliseconds: parseable SystemVerilog or NuSMV properties, expected property count, and contradictory-pair detection. Only specs that pass proceed to the full pipeline. This walkthrough shows the gate catching three malformed specs, then works through the fix_arb_lock task end-to-end from a well-formed spec, demonstrating what a clean verification run looks like.

1. The spec gate

Before any generation model is invoked, Kairos runs a rule-based spec validator. It checks five things: required config fields, parseable SV or NuSMV properties, property count against what the config declares, undeclared signal references, and contradictory property pairs. If any check fails, the run exits with a rejection verdict and a suggested fix. No generation model is ever called.

Three intentionally malformed specs are run through the validator as a baseline. Every rejection is rule-based and completes in milliseconds, without touching a generation model. Each case below shows the exact spec the customer submitted and the exact message Kairos returns, unedited.

Case 1: schema gate, missing required config fields

The customer submits a task config JSON that lacks the category and difficulty fields required by Kairos's task-config convention.

// customer-submitted config.json
{
  "task_id": "bad_missing_fields",
  "task_type": "fix_broken",
  "description": "customer-submitted: asserts should check arbiter mutex and progress",
  "bound": 10,
  "timeout_seconds": 120
}

Kairos returns this verdict:

{
  "ok": false,
  "reasons": [
    "task config missing required fields: category, difficulty"
  ],
  "suggestions": [
    "add the missing fields to the task config JSON (required by
     athanor-builder convention): category, difficulty"
  ],
  "parsed_property_count": 7,
  "checks_run": [
    "required_config_fields", "property_parse",
    "expected_property_count", "undeclared_signal_refs",
    "contradictory_pair_scan"
  ]
}

Case 2: parse gate, unparseable SystemVerilog property

The customer's config claims the source file contains two formal properties. The source file itself contains no parseable assert property clause; what is there is syntactically malformed.

// customer-submitted student.sv
module arb_lock(input clk, input reset, input req0, input req1);
  reg grant0, grant1;
  always @(posedge clk) begin
    if (reset) begin grant0 <= 0; grant1 <= 0; end
  end

  // malformed: missing `property ... endproperty` wrapper;
  // invalid operator sequence; unclosed paren; gibberish ops
  assert p_reset: (reset -> -> grant0 == 0 == grant1);
  assert property (@(posedge clk) (grant0 && grant1 !&& !grant0);
endmodule

Kairos returns this verdict:

{
  "ok": false,
  "reasons": [
    "zero properties extracted, parser found no `assert property`,
     `assert final`, or NuSMV LTLSPEC/CTLSPEC/INVARSPEC in the source(s)",
    "property count mismatch: config expects 2, parser found 0"
  ],
  "suggestions": [
    "add at least one labelled `assert property (...)` clause, or
     ensure the source file is syntactically well-formed SV/SMV",
    "either set num_expected_properties=0 in the config, or adjust
     the source to define exactly 2 formal properties"
  ],
  "parsed_property_count": 0,
  "checks_run": [
    "required_config_fields", "property_parse",
    "expected_property_count", "undeclared_signal_refs",
    "contradictory_pair_scan"
  ]
}

Case 3: semantic gate, contradictory property pair

The customer's source file declares four properties. Two of them assert contradictory statements about the same signals: p_mutex says the grant lines are never simultaneously high, and p_not_mutex says they are. No module can satisfy both.

// customer-submitted student.sv (excerpt)
p_grant_is_always_one:  assert property (@(posedge clk) disable iff (reset) (grant0 || grant1));
p_grant_is_always_zero: assert property (@(posedge clk) disable iff (reset) (!grant0 && !grant1));
p_mutex:                assert property (@(posedge clk) disable iff (reset) !(grant0 && grant1));
p_not_mutex:            assert property (@(posedge clk) disable iff (reset) (grant0 && grant1));

Kairos returns this verdict:

{
  "ok": false,
  "reasons": [
    "contradictory assertions: `p_mutex` body
     `@(posedge clk) disable iff (reset) !(grant0 && grant1)` vs
     `p_not_mutex` body `@(posedge clk) disable iff (reset)
     (grant0 && grant1)` differ only by negation"
  ],
  "suggestions": [
    "remove one of `p_mutex` / `p_not_mutex`, or split the property
     into two disjoint preconditions if both are intended to hold
     under different circumstances"
  ],
  "parsed_property_count": 4,
  "checks_run": [
    "required_config_fields", "property_parse",
    "expected_property_count", "undeclared_signal_refs",
    "contradictory_pair_scan"
  ]
}

Combined validator wall time across the three cases: under 200 ms. The gate makes a cheap observation: this submission is not well-formed, re-submit with the missing fields or the parseable property block or the contradiction removed. No generation model is asked to do work on a spec that is not ready for it.

2. A well-formed spec: fix_arb_lock

The remaining walkthrough follows a single well-formed task through the full ten-phase fleet. The task:

fix_arb_lock. A 2-channel bus arbiter with lock acquisition, a burst counter, and priority escalation when a requester is starved for four or more consecutive cycles. Three intentional bugs corrupt the lock-release condition, the burst-count loading path, and the round-robin turn comparison. The task ships a broken SystemVerilog module, a fixed set of seven named assert property clauses the repaired module must satisfy, and a Lean 4^[2] companion file stating five long-horizon theorems about the arbiter's behaviour.

The seven SystemVerilog assertions

These are the assertions the fix has to satisfy under bounded model-checking. They are fixed by the task; the agent does not author them.

// transition-level safety block; fixed by the task
p_reset:      assert property (@(posedge clk) reset |=> !grant0 && !grant1 && !locked);
p_mutex:      assert property (@(posedge clk) !(grant0 && grant1));
p_lock_hold:  assert property (@(posedge clk) !reset && locked && burst_cnt > 3'd0 && lock_owner == 1'b0 && req0 |=> grant0 && locked);
p_lock_end:   assert property (@(posedge clk) !reset && locked && burst_cnt == 3'd0 |=> !locked);
p_burst_load: assert property (@(posedge clk) !reset && !locked && grant0 && lock_req |=> locked && burst_cnt == $past(burst_len));
p_rr_turn:    assert property (@(posedge clk) !reset && !locked && req0 && req1 && !pri0 && !pri1 && turn == 1'b0 |=> grant0);
p_starve_pri: assert property (@(posedge clk) !reset && !locked && req0 && req1 && pri0 && !pri1 |=> grant0);

Each is a per-transition implication guarded by a clock edge. None of them quantify over more than one step. That matters later.

The five Lean theorems

The task also ships a Lean 4 file stating five theorems about the arbiter's long-horizon behaviour: a step-monotonicity bound on the starvation counter, an idle bound for non-requesting channels, a bounded_starvation claim that no channel stays starved indefinitely, a lock_sticky invariant for bursts, and a fair_alternation statement about the turn bit's long-run parity. All theorem bodies start as sorry; the fleet has to close them.

3. Hackathon

Three model builders are issued the broken SystemVerilog and the seven assertions in parallel. Each returns a repaired module. All three repaired modules pass EBMC against the seven assertions; a winning submission is selected for the downstream phases.

The winning submission's patched arbiter body (showing the three fixes inline; full file shipped in the bundle):

// Priority escalation: starved if denied 4+ cycles
wire pri0 = (starve0 >= 3'd4);
wire pri1 = (starve1 >= 3'd4);

always @(posedge clk) begin
  if (reset) begin
    grant0 <= 0; grant1 <= 0; locked <= 0; lock_owner <= 0; turn <= 0;
    burst_cnt <= 0; starve0 <= 0; starve1 <= 0;
  end else begin
    grant0 <= 1'b0;
    grant1 <= 1'b0;

    if (locked) begin
      // Fix 1: check burst_cnt before decrement; release at 0
      if (burst_cnt == 3'd0) begin
        locked <= 1'b0;
      end else begin
        burst_cnt <= burst_cnt - 3'd1;
        if (lock_owner == 1'b0) grant0 <= req0;
        else                    grant1 <= req1;
      end
      // starvation accounting during lock ...
    end else begin
      // Normal arbitration with priority escalation
      if (req0 && req1) begin
        if (pri0 && !pri1) begin
          grant0 <= 1'b1; turn <= 1'b1; starve0 <= 0; starve1 <= starve1 + 3'd1;
        end else if (pri1 && !pri0) begin
          grant1 <= 1'b1; turn <= 1'b0; starve1 <= 0; starve0 <= starve0 + 3'd1;
        end else if (turn == 1'b0) begin   // Fix 2: compare turn to 0, not 1
          grant0 <= 1'b1; turn <= 1'b1; starve0 <= 0; starve1 <= starve1 + 3'd1;
        end else begin
          grant1 <= 1'b1; turn <= 1'b0; starve1 <= 0; starve0 <= starve0 + 3'd1;
        end
      end
      // single-requester and lock-acquisition paths (Fix 3: $past-correct burst load)
      ...
    end
  end
end

4. EBMC close

EBMC runs against the winning SystemVerilog with the seven assertions at bound 10. All seven return PROVED. Structured result:

{
  "exit_code": 0,
  "proved": 7,
  "refuted": 0,
  "timed_out": false,
  "elapsed_sec": 0.278,
  "property_verdicts": {
    "arb_lock.p_reset":      "PROVED",
    "arb_lock.p_mutex":      "PROVED",
    "arb_lock.p_lock_hold":  "PROVED",
    "arb_lock.p_lock_end":   "PROVED",
    "arb_lock.p_burst_load": "PROVED",
    "arb_lock.p_rr_turn":    "PROVED",
    "arb_lock.p_starve_pri": "PROVED"
  },
  "proved_fraction": 1.0
}

EBMC transcript excerpt, unedited:

Converting
Type-checking Verilog::arb_lock
Synthesis Verilog::arb_lock
Generating Decision Problem
Using MiniSAT 2.2.1 with simplifier
Properties
Solving with propositional reduction
SAT checker: instance is UNSATISFIABLE
UNSAT: No path found within bound

** Results:
[arb_lock.p_reset]      always (reset |=> !grant0 && !grant1 && !locked): PROVED up to bound 10
[arb_lock.p_mutex]      always !(grant0 && grant1): PROVED up to bound 10
[arb_lock.p_lock_hold]  always (!reset && locked && burst_cnt > 0 && lock_owner == 0 && req0 |=> grant0 && locked): PROVED up to bound 10
[arb_lock.p_lock_end]   always (!reset && locked && burst_cnt == 0 |=> !locked): PROVED up to bound 10
[arb_lock.p_burst_load] always (!reset && !locked && grant0 && lock_req |=> locked && burst_cnt == $past(burst_len)): PROVED up to bound 10
[arb_lock.p_rr_turn]    always (!reset && !locked && req0 && req1 && !pri0 && !pri1 && turn == 0 |=> grant0): PROVED up to bound 10
[arb_lock.p_starve_pri] always (!reset && !locked && req0 && req1 && pri0 && !pri1 |=> grant0): PROVED up to bound 10

The SystemVerilog half is discharged. The winning module satisfies every declared assertion at bound 10 in 0.278 seconds of solver time. In the hardware-verification benchmark report this puts two of the five single-model baselines at 1.000 on this task; the fleet reproduces that result deterministically and moves on to the Lean half.

5. Lean gate, original stub

The Lean companion file ships with five theorem statements and sorry for every body. Excerpt (the fifth theorem is the one that matters later):

theorem starve0_step_mono (s : ArbState) (i : ArbInput) :
    (step s i).starve0 ≤ s.starve0 + 1 := by sorry

theorem starve0_idle_bound (s : ArbState) (i : ArbInput) (n : Nat)
    (h : i.req0 = false) :
    (iterStep s i n).starve0 ≤ s.starve0 := by sorry

theorem bounded_starvation (s : ArbState) (i : ArbInput) :
    ∃ n : Nat, n ≤ 4 ∧
      ((iterStep s i n).starve0 = 0 ∨ (iterStep s i n).pri0 = true) := by sorry

theorem lock_sticky (s : ArbState) (i : ArbInput)
    (hl : s.locked = true) (hb : s.burst_cnt > 0) :
    let s' := step s i
    s'.locked = true ∧ s'.turn = s.turn := by sorry

-- The bug lives here.
theorem fair_alternation (k : Nat) :
    let s₀ : ArbState := { starve0 := 0, starve1 := 0, turn := false, locked := false, burst_cnt := 0 }
    let i := ({ req0 := true, req1 := true, lock_req := false, burst_len := 0 } : ArbInput)
    k ≥ 1 →
    ∃ _n : Nat, (iterStep s₀ i (2 * k)).turn = ((k % 2 = 0) : Bool) := by sorry

The Phase 4.5 gate forbids closing any of these with sorry, admit, axiom, native_decide, decide, omega, linarith, simp, simp_all, tauto, by_contra, or ring. Closure must come from the allowed tactic set (rfl, unfold, rw, exact, cases, induction, subst, obtain, refine, show, have, match, constructor, intro, and grind).

6. External proof search engages

The inline Phase 4.5 attempt does not close every theorem on the first pass, so the obligations are handed to a separate external proof-search layer. It returns with four clean closures and one refusal: fair_alternation as originally stated cannot be proved, and the returned reason is a counterexample at k = 2.

Initial state s₀: starve0 = 0, starve1 = 0, turn = false, locked = false, burst_cnt = 0. Input every cycle: req0 = true, req1 = true, no lock, no burst.

Step 1

Both request. Neither is starved (counters are 0, threshold is 4). Round-robin tie-break picks turn = false, so channel 0 wins. starve0 resets to 0, starve1 becomes 1, turn flips to true.

Step 2

Both request. No priority. turn = true, so channel 1 wins. starve1 resets to 0, starve0 becomes 1, turn flips back to false. The state after two steps: starve0 = 1, starve1 = 0, turn = false. Call this s₂.

Step 3 (from `s₂`)

Both request. Neither starved (1 < 4). turn = false, channel 0 wins. The update lands on exactly the state at the end of step 1.

Step 4 (from that state)

Identical dynamics to step 2. Ends at s₂ again.

After four steps, turn = false. The original theorem predicted turn = true for k = 2 (the parity of 2 is even). The arbiter has entered a period-2 cycle where every even step count lands at s₂, so turn stays false modulo 2. The original theorem is false.

The seven SystemVerilog assertions do not catch this. p_rr_turn is a per-transition implication: if turn == 0 and both channels request with no priority, the next cycle grants channel 0. True. p_starve_pri is another per-transition rule. None of them quantifies over 2k steps or makes a claim about the long-run parity of turn. EBMC proved every property it was handed; the misstatement sat in the Lean specification alone, and the Lean layer is where it was caught.

Proof-search writeup

Summarised from the external proof-search report, lightly edited for prose flow:

All five theorems from the staged Lean file have been proved without using any banned tactics. The definitions were kept in the original file and all proofs were placed in a companion LeanGateScratchProofs module, because the proof tactics behave differently when definitions are compiled versus inline.

starve0_step_mono. The starvation counter increases by at most 1 per step. Proved by unfolding step and using grind.
starve0_idle_bound. When channel 0 is not requesting, its starvation counter never increases. Proved by induction on n, using a helper step_starve0_le.
bounded_starvation. Within at most two steps (counter value ≤ 4), either the priority flag triggers or starvation resets to 0. Proved by case analysis on the starvation counter value, request lines, priority flags, and turn bit, with step-computation helpers.
lock_sticky. While locked with positive burst count, the turn and lock state are preserved. Proved by unfolding step and using grind.
fair_alternation. The original statement was false: counterexample at k = 2, turn is always false after even steps, not alternating with k's parity. The original statement is commented out with an explanation. A corrected version was proved: after 2k steps with symmetric contention and no locking, the turn is always false. The proof uses a period-2 argument: the state after two steps is a fixed point s₂, and step(step(s₂)) = s₂.

7. The corrected theorem, proved

The returned file retains the original fair_alternation as a commented block with the reason-for-falsity attached, and states + proves the corrected version:

-- The original statement is false for all k. The RHS ((k % 2 = 0) : Bool) alternates
-- but the arbiter enters a period-2 fixed point at s2 after two steps. Counterexample:
-- k = 2 gives turn = false while the RHS predicts true.

private def s2 : ArbState :=
  { starve0 := 1, starve1 := 0, turn := false, locked := false, burst_cnt := 0 }

private theorem iterStep_s2_even (m : Nat) :
    let i : ArbInput := { req0 := true, req1 := true, lock_req := false, burst_len := 0 }
    iterStep s2 i (2 * m) = s2 := by
  induction m with
  | zero => rfl
  | succ m' ih =>
    show iterStep s2 _ (2 * (m' + 1)) = s2
    rw [show 2 * (m' + 1) = 2 + 2 * m' from by grind, iterStep_add]
    rw [show iterStep s2 _ 2 = s2 from rfl]
    exact ih

theorem fair_alternation (k : Nat) :
    let s₀ : ArbState := { starve0 := 0, starve1 := 0, turn := false, locked := false, burst_cnt := 0 }
    let i := ({ req0 := true, req1 := true, lock_req := false, burst_len := 0 } : ArbInput)
    k ≥ 1 →
    ∃ _n : Nat, (iterStep s₀ i (2 * k)).turn = false := by
  intro s₀ i hk
  refine ⟨0, ?_⟩
  obtain ⟨k', rfl⟩ : ∃ k', k = k' + 1 := ⟨k - 1, by grind⟩
  have h2k : 2 * (k' + 1) = 2 + 2 * k' := by grind
  rw [h2k, iterStep_add]
  rw [show iterStep s₀ i 2 = s2 from rfl]
  rw [iterStep_s2_even]
  rfl

Full lake build exit code 0. Final axioms: {propext, Classical.choice, Quot.sound} only. No banned tactic appears in any closure. The Lean half of the task is discharged.

8. Final-final re-submission

A fresh stub containing the corrected fair_alternation statement and pure-sorry bodies was resubmitted to external proof search to confirm the corrected theorem admits closure independently of the inline-replacement path. The proof-search run returned with all five theorems closed from scratch.

All five theorems proved without using any banned tactics.

1. starve0_step_mono   - unfold step; grind
2. starve0_idle_bound  - induction on n; helper via grind +locals
3. bounded_starvation  - destructure state+input, case-split on req1,
                        Nat.ble 4 s0/s1, turn; explicit witness for n (1 or 2);
                        per-case unfold step hasPriority; rw [...]; rfl
4. lock_sticky         - unfold step; grind +splitIndPred
5. fair_alternation (corrected) - 2-step cycle: s₂ = ⟨1,0,false,false,0⟩
                        satisfies iterStep s₀ i 2 = s₂ and iterStep s₂ i 2 = s₂
                        (both by rfl); iterStep_add composition lemma enables
                        induction, iterStep s₀ i (2*(k+1)) = s₂ for all k.

lake build: exit 0, 471 ms.
Axioms: {propext, Classical.choice, Quot.sound}; bounded_starvation and
fair_alternation prove without any axioms.

The corrected specification is closable from a pure-sorry stub, not only from the inline-replacement path. The round trip is complete.

What the run demonstrates

Bad specs are refused at the door, before any generation model runs. Three malformed specs rejected by the Phase -2 validator in combined wall time under 200 ms, with no generation-model invocation. A production pipeline that runs a multi-agent fleet on a bad spec before realising the spec is bad is a pipeline doing avoidable work. The gate moves that determination to a rule-based check.

EBMC discharges the transition-level safety block, deterministically. Seven named assertions, 0.278 seconds, seven PROVED, bound 10. This is the half of the task that bounded model-checking is correct for. The fleet did not invent new work here; it ran the right tool against the right property set.

Lean caught a misstated specification that EBMC could not catch. The RTL is correct. An authored long-horizon claim about it was wrong. The period-2 fixed-point argument proving the corrected theorem is representable in Lean 4 and not representable as a single bounded-assertion clause in the SV property set. Neither layer is redundant on this task.

The shipped bundle is reviewer-reproducible. The Phase 6 artifact is a tarball containing the winning SystemVerilog, the EBMC verdict file, the Lean source with the closed proofs, and an ebmc reverification recipe. A reviewer does not have to trust the pipeline; they re-run EBMC against the module and re-run lake build against the Lean source.

The full ten-phase solve on fix_arb_lock completes in about 5 minutes of wall time. For the aggregate picture across all 26 tasks of the hardware-verification benchmark, see the benchmark report.

References

Kroening, D., et al., "EBMC: The Enhanced Bounded Model Checker." Diffblue. github.com/diffblue/hw-cbmc
de Moura, L., Ullrich, S., "The Lean 4 Theorem Prover and Programming Language," CADE, 2021. doi:10.1007/978-3-030-79876-5_37

fix_arb_lock: end-to-end solve walkthrough

Abstract

1. The spec gate

Case 1: schema gate, missing required config fields

Case 2: parse gate, unparseable SystemVerilog property

Case 3: semantic gate, contradictory property pair

2. A well-formed spec: fix_arb_lock

The seven SystemVerilog assertions

The five Lean theorems

3. Hackathon

4. EBMC close

5. Lean gate, original stub

6. External proof search engages

Step 1

Step 2

Step 3 (from s₂)

Step 4 (from that state)

Proof-search writeup

7. The corrected theorem, proved

8. Final-final re-submission

What the run demonstrates

References

Step 3 (from `s₂`)