what Kairos does

LLMs propose RTL optimizations. Formal tools prove or refute equivalence. Every accepted change is machine-checked.

kairos optimize block.sv runs a multi-agent loop: the proposer generates candidates, formal engines prove functional equivalence, synthesis tools measure the real area delta, and adversarial reviewers challenge the result. Combinational and sequential designs are supported, with flop-changing transforms requiring two independent sequential equivalence proofs.

Certificates record measurement conditions (synthesis recipe, tool versions, generic vs liberty-mapped) so results are reproducible. Lean 4 proofs via Pythia provide the theorem-proving foundation.

install
pip install athanor-sdk && kairos setup && kairos doctor

Honest about what is proved. Kairos never labels testing as proof. Per-property verdicts report PROVED, REFUTED, or INCONCLUSIVE. Accepted optimizations carry formal equivalence proof and measured area reduction. Inconclusive results stay inconclusive.

verification engine

Kairos: multi-agent verification.
Proved, not claimed.

AI-generated RTL optimizations can claim area reduction but cannot guarantee functional equivalence. A single model call that says "12% smaller" has no way to prove it did not break the design. Kairos closes that gap: formal verification tools prove equivalence, synthesis tools measure the delta, and adversarial agents challenge every proposal before it ships. What ships is what the formal tools proved correct.

You give Kairos an RTL design. Kairos returns verified optimizations with formal proof, measured area reduction, and reproducible conditions.

Input

A natural‑language claim, its formal statement, and a paradigm (paper, dataset, or instrumented simulator) it must hold against.

example
“Dopamine assigns reward to actions within a finite, calibratable temporal window τc. For TD(λ): τc = −Δt / log(γλ). Check against Tang et al. 2024[1] Fig. 3c (n = 55).”
Output

A verdict per verification layer, a sandwich bound on the key parameter, and a Lean proof that either closes the theorem or localises the counterexample.

example
supported  2.70 s ≤ τc ≤ 2.82 s (4.4% gap). Mutation kill 12/12 at the Lean layer; 3 competing candidate rules refuted.
The proof Kairos ships back, in Lean 4:
theorem credit_window_is_one_over_e_decay
    (γ λ Δt : ℝ) (hgl_pos : 0 < γ * λ)
    (hgl_lt : γ * λ < 1) (hΔt : 0 < Δt) :
    Real.rpow (γ * λ) (creditWindowTau γ λ Δt / Δt)
    = Real.exp (−1)

In plain English: if the decay rate γλ lies strictly between 0 and 1, then exactly τc seconds after a reward, the eligibility signal has dropped to 1/e of its peak. No free parameters; the credit window is forced by γ, λ, and the simulator step Δt alone. The Tang 2024[1] data lives inside that bound, so the hypothesis survives this paper. Candidate rules that predict a wider or narrower window are rejected.

Inspect the paradigm below

One concrete hypothesis, unpacked. The sliders are the paper's knobs, the formula is the relationship between them, the conditions below are the rules the hypothesis must obey, and the four candidates are the alternative theories Kairos compares against.

Paradigm under verification
A credit‑window hypothesis: dopamine assigns reward to actions within a finite, calibratable temporal interval.
Tang et al. (2024), Fig. 3c, n = 55 action types.
credit window τc ≈ − Δt ⁄ log(γλ)

Hyperparameters τc = 2.820 s

γ 0.970
λ 0.995
Δt 0.100 s

Invariants

  • i.trace bound, eligibility decay bounded by γλ
  • ii.kernel symmetry, off-state updates identically zero
  • iii.Robbins–Monro, step-size schedule αk summable
Four learning rules to compare
f₁TD + eligibilityλ
f₂APEα
f₃Markowitzμ,σ
f₄TMRLβ
The scientific truth engine what no-one else can do

A real dopaminergic-learning paradigm is a multi-month, multi-mouse instrumented-wet-lab operation: Tang et al.\ 2024[1] trained 25 mice over three days; Greenstreet et al.[3] ran approximately 80. A single-shot frontier-model agent writes code but cannot close a Lean proof. A team of agents without a formal-verification layer still cannot calibrate against the paradigm’s own public data. A pure theorem prover takes a prose claim but cannot replay the paradigm. Kairos is the only apparatus that satisfies every capability the task requires and completes in minutes.

16 min vs. 3 months and 25 mice per alternative
Capability Humans
(wet‑lab)
Claude Opus 4.6 3×Opus 4.6
(orch.)
Aristotle
(Harmonic)
Kairos
Accepts natural-language hypothesis
Shared State type across paradigm encodings
Simulates the Tang paradigm (Layer 3)
Property tests at sampled parameters (Layer 2)
Machine-checked Lean 4 proof (Layer 1)
AST mutation catalog (Python + Lean)
Calibrates against Tang and Sousa public data
Spans four paradigm families in one framework
Completes pre-publication
Wall-clock3 months
10–80 mice
4 min19 min30 min – 2 h16 min

How to read. Each row is one capability the Tang 2024[1] credit-assignment paradigm actually demands. A ✓ means the alternative satisfies it; ✗ means it does not. Aristotle is Kairos’s independent cross-check arm on the hardest theorems. Kairos does not depend on it and completes every main-text claim without it.

What Kairos does not do
Verification modalities typed artifacts per candidate rule

Each row is one way Kairos can check a candidate rule. The five run in parallel; a rule has to pass every applicable row to survive.

i
Algebraic verification
theorem proving
ii
Property-based testing
randomized invariants
iii
Paradigm-replay simulation
closed-loop replay
iv
Mutation analysis
fault-injection sensitivity
v
Empirical calibration
data-driven fit
TD(λ)
eligibility + annealing
APE
advantage-based
Markowitz
mean–variance
TMRL
temporal-mismatch
Oversight
vi · artifact review
caughtM02, M04, M06
caughtM01, M05
missedM07
n.a.:
Evidence layer Lean, tests, traces

For each candidate rule, Kairos ships a typed artifact per modality. Pick a tab to see the Lean theorem, property-test outputs, simulation trace, or mutation coverage for that rule.

Lean
Property
Replay
Mutations
Calibration
Lean 4 · td0_cannot_produce_temporal_window ◆ verified
-- off-state updates are identically zero
theorem td0_cannot_produce_temporal_window
    (α γ : ℝ) (V : ValueFn) (s s' : State) (r : Reward) :
    ∀ x : State, x  s 
      tdUpdate α γ V s r s' x = V x := by
  intro x hx
  exact td0_one_step_support α γ V s s' r x hx
closed-loop replay · Tang 2024 Fig. 3c err 4.4%
Reference in dashed grey; Kairos replay in amber. Drag γ/λ sliders to re-fit.
Mutation coverage, TD(λ), 7×3 Sim · Prop · Lean
Kill rates, Sim 9/12, Property 5/12, Lean 12/12 applicable.
Click a mutation to trace its layer kill.
γ (discount)0.970
λ (eligibility)0.995
Δt (sim step)0.100 s
τc analytic2.820 s
τc empirical (lower)2.70 s
sandwich gap4.4%
seedsn = 10 000
held-out MSE0.0134
Fit is the posterior mean over a 10k-seed sweep with Robbins–Monro αk. Parameters flow from the sliders in section a.
Sandwich bound on τc empirical ↔ analytic

Two bounds on the key parameter, plotted together. The left line is the lower bound derived from experimental data; the right line is the upper bound derived from the analytic formula. A rule is consistent with the data only if the real value lives inside that band. Kairos reports the gap between the two bounds.

credit-window (empirical vs analytic) · τc [s]
Verdict per-layer binary

The per-layer outcome, collapsed to one line. Refuted means at least one verification layer caught a contradiction. Supported means every applicable layer passed and the sandwich bound held, which is the strongest statement Kairos can make without claiming absolute certainty.

Hypothesis outcome
TD(0) formally refuted (Thm. I1c): off-state updates identically zero.
TD + eligibility + annealing reproduces Tang 2024 within 4.4%.
Refuted
TD(0)
kernel-symmetry fails
Supported
TD(λ)
γ=0.970, λ=0.995

References

  1. Tang, J., Jimenez, C. E., Chakraborty, S., et al., "Dynamic action selection by dopamine signalling," Nature, 2024. doi:10.1038/s41586-024-07528-4
  2. Markowitz, J. E., Gillis, W. F., Jay, M., et al., "Spontaneous behaviour is structured by reinforcement without explicit reward," Nature 614, 108-117, 2023. doi:10.1038/s41586-022-05611-2
  3. Greenstreet, F., Vergara, H. O., Johansson, Y., et al., "Dopamine neurons evaluate actions using value-free action-prediction errors," Nature, 2025.
  4. Sousa, M., Bujalski, P., Cruz, B. F., et al., "Time-magnitude reinforcement learning in dopamine neurons," Nature, 2025.