what Kairos just did

Kairos refuted a 25-year-old convergence theorem at the core of dopamine-learning neuroscience, and supplied the corrected version.

The two-timescale actor‑critic convergence theorem that reinforcement-learning textbooks retell as the algorithmic ground of temporal‑difference credit assignment is false as commonly stated. The missing hypothesis is bounded feature maps, which biological feature spaces are not known to satisfy. Kairos derived the counterexample from the theorem statement itself and machine-checked the corrected theorem in Lean 4, with zero sorry and only the three standard logic axioms. An independent theorem prover (Aristotle, Harmonic.AI) arrived at the same conclusion through a different mathematical witness.

Kairos then re-analysed the public datasets of four recent Nature papers on dopaminergic learning (Tang 2024^[1], Markowitz 2023^[2], Greenstreet 2025^[3], Sousa 2025^[4]) and surfaced findings those papers did not report from their own data: the slow‑versus‑fast learner binary is a continuous population, not two clusters; per‑action credit windows vary by 48% across action types; value‑free action‑prediction error holds in 21 of 21 animals including under lesion, which rules out a single‑locus temporal‑difference account.

the natural-language prompt, verbatim

Check whether the two-timescale actor‑critic convergence theorem applies to the Tang 2024 credit-assignment paradigm. Return a verdict, a counterexample or a corrected statement, a machine-checked Lean proof, and empirical residuals against each of the four candidate rule families’ public datasets.

What this means. A textbook convergence argument that every dopamine-learning review cites has been under-specified for 25 years. The corrected theorem states the precise hypothesis biological feature maps must satisfy for that argument to apply, so papers that built on the original statement can re-ground their claims without rewriting their experiments. Labs proposing a new learning rule can now run Kairos as a pre-publication check: formal consistency, paradigm-replay calibration against the paradigm’s own data, and a distinguishability witness against competing rules, end-to-end on a laptop and without any new animal cohorts.

scientific truth engine

Kairos: the scientific truth engine.
A refutation pipeline for scientific claims.

Scientific claims about neural computation proliferate faster than the field can distinguish them. Competing learning‑rule accounts each fit their own paradigm, but no pre‑publication apparatus checks a proposed rule simultaneously for biological faithfulness, algebraic consistency, and robustness under implementation mutations. Kairos closes that gap with a machine‑checked verification stack that refutes candidate rules against their paradigm’s own public data and leaves a reviewer‑reproducible bundle of Lean proofs and paradigm‑replay traces. Refutations are certain; support is provisional; the bundle is the evidence.

You give Kairos a scientific hypothesis. Kairos returns a verdict with bounds, a machine‑checked proof, and the candidate rules the hypothesis rules out.

Input

A natural‑language claim, its formal statement, and a paradigm (paper, dataset, or instrumented simulator) it must hold against.

example

“Dopamine assigns reward to actions within a finite, calibratable temporal window τ_c. For TD(λ): τ_c = −Δt / log(γλ). Check against Tang et al. 2024^[1] Fig. 3c (n = 55).”

Output

A verdict per verification layer, a sandwich bound on the key parameter, and a Lean proof that either closes the theorem or localises the counterexample.

example

supported 2.70 s ≤ τ_c ≤ 2.82 s (4.4% gap). Mutation kill 12/12 at the Lean layer; 3 competing candidate rules refuted.

The proof Kairos ships back, in Lean 4:

theorem credit_window_is_one_over_e_decay
    (γ λ Δt : ℝ) (hgl_pos : 0 < γ * λ)
    (hgl_lt : γ * λ < 1) (hΔt : 0 < Δt) :
    Real.rpow (γ * λ) (creditWindowTau γ λ Δt / Δt)
    = Real.exp (−1)

In plain English: if the decay rate γλ lies strictly between 0 and 1, then exactly τ_c seconds after a reward, the eligibility signal has dropped to 1/e of its peak. No free parameters; the credit window is forced by γ, λ, and the simulator step Δt alone. The Tang 2024^[1] data lives inside that bound, so the hypothesis survives this paper. Candidate rules that predict a wider or narrower window are rejected.

↓ Inspect the paradigm below

One concrete hypothesis, unpacked. The sliders are the paper's knobs, the formula is the relationship between them, the conditions below are the rules the hypothesis must obey, and the four candidates are the alternative theories Kairos compares against.

Paradigm under verification

A credit‑window hypothesis: dopamine assigns reward to actions within a finite, calibratable temporal interval.

Tang et al. (2024), Fig. 3c, n = 55 action types.

credit window τ_c ≈ − Δt ⁄ log(γλ)

Hyperparameters τ_c = 2.820 s

γ 0.970

λ 0.995

Δt 0.100 s

Invariants

i.trace bound, eligibility decay bounded by γλ
ii.kernel symmetry, off-state updates identically zero
iii.Robbins–Monro, step-size schedule α_k summable

Four learning rules to compare

f₁TD + eligibilityλ

f₂APEα

f₃Markowitzμ,σ

f₄TMRLβ

The scientific truth engine what no-one else can do

A real dopaminergic-learning paradigm is a multi-month, multi-mouse instrumented-wet-lab operation: Tang et al.\ 2024^[1] trained 25 mice over three days; Greenstreet et al.^[3] ran approximately 80. A single-shot frontier-model agent writes code but cannot close a Lean proof. A team of agents without a formal-verification layer still cannot calibrate against the paradigm’s own public data. A pure theorem prover takes a prose claim but cannot replay the paradigm. Kairos is the only apparatus that satisfies every capability the task requires and completes in minutes.

16 min vs. 3 months and 25 mice per alternative

Capability	Humans (wet‑lab)	Claude Opus 4.6	3×Opus 4.6 (orch.)	Aristotle (Harmonic)	Kairos
Accepts natural-language hypothesis	✓	✓	✓	✓	✓
Shared `State` type across paradigm encodings	✗	✗	✗	✗	✓
Simulates the Tang paradigm (Layer 3)	✓	✗	✓	✗	✓
Property tests at sampled parameters (Layer 2)	✗	✗	✗	✗	✓
Machine-checked Lean 4 proof (Layer 1)	✗	✗	✓	✓	✓
AST mutation catalog (Python + Lean)	✗	✗	✗	✗	✓
Calibrates against Tang and Sousa public data	✓	✗	✗	✗	✓
Spans four paradigm families in one framework	✗	✗	✗	✗	✓
Completes pre-publication	✗	✓	✓	✓	✓
Wall-clock	3 months 10–80 mice	4 min	19 min	30 min – 2 h	16 min

How to read. Each row is one capability the Tang 2024^[1] credit-assignment paradigm actually demands. A ✓ means the alternative satisfies it; ✗ means it does not. Aristotle is Kairos’s independent cross-check arm on the hardest theorems. Kairos does not depend on it and completes every main-text claim without it.

What Kairos does not do

Support is not proof. Passing every layer means no contradiction was found within the encoded paradigm. It does not prove the hypothesis is the true rule.
Requires a formal statement. Claims that cannot be reduced to a machine-checkable theorem stay at the property and simulation layers only; the algebraic layer will be empty.
Bounded to the candidate fleet you hand it. The verdict is always “X beats {Y, Z, W} on this paradigm”, never “X is optimal globally.”
Bounds are as tight as the data. Sparse or noisy paradigm data widens the sandwich. Kairos reports the gap rather than paper over it.
Mutation coverage is not proof of soundness. A bug class the operator catalogue does not generate is a bug class Kairos cannot kill.
Infrastructure errors are not refutations. Rate limits, sandbox failures, and timeouts produce null traces. The pipeline flags these and reports ambiguity, not a verdict.

Verification modalities typed artifacts per candidate rule

Each row is one way Kairos can check a candidate rule. The five run in parallel; a rule has to pass every applicable row to survive.

Algebraic verification

theorem proving

Property-based testing

randomized invariants

iii

Paradigm-replay simulation

closed-loop replay

Mutation analysis

fault-injection sensitivity

Empirical calibration

data-driven fit

TD(λ)

eligibility + annealing

APE

advantage-based

Markowitz

mean–variance

TMRL

temporal-mismatch

Oversight

vi · artifact review

caughtM02, M04, M06

caughtM01, M05

missedM07

n.a.:

Evidence layer Lean, tests, traces

For each candidate rule, Kairos ships a typed artifact per modality. Pick a tab to see the Lean theorem, property-test outputs, simulation trace, or mutation coverage for that rule.

Lean

Property

Replay

Mutations

Calibration

Lean 4 · td0_cannot_produce_temporal_window ◆ verified

-- off-state updates are identically zero
theorem td0_cannot_produce_temporal_window
    (α γ : ℝ) (V : ValueFn) (s s' : State) (r : Reward) :
    ∀ x : State, x ≠ s →
      tdUpdate α γ V s r s' x = V x := by
  intro x hx
  exact td0_one_step_support α γ V s s' r x hx

closed-loop replay · Tang 2024 Fig. 3c err 4.4%

Reference in dashed grey; Kairos replay in amber. Drag γ/λ sliders to re-fit.

Mutation coverage, TD(λ), 7×3 Sim · Prop · Lean

Kill rates, Sim 9/12, Property 5/12, Lean 12/12 applicable.
Click a mutation to trace its layer kill.

γ (discount)0.970

λ (eligibility)0.995

Δt (sim step)0.100 s

τ_c analytic2.820 s

τ_c empirical (lower)2.70 s

sandwich gap4.4%

seedsn = 10 000

held-out MSE0.0134

Fit is the posterior mean over a 10k-seed sweep with Robbins–Monro α_k. Parameters flow from the sliders in section a.

Sandwich bound on τ_c empirical ↔ analytic

Two bounds on the key parameter, plotted together. The left line is the lower bound derived from experimental data; the right line is the upper bound derived from the analytic formula. A rule is consistent with the data only if the real value lives inside that band. Kairos reports the gap between the two bounds.

credit-window (empirical vs analytic) · τ_c [s]

Verdict per-layer binary

The per-layer outcome, collapsed to one line. Refuted means at least one verification layer caught a contradiction. Supported means every applicable layer passed and the sandwich bound held, which is the strongest statement Kairos can make without claiming absolute certainty.

Hypothesis outcome

TD(0) formally refuted (Thm. I1c): off-state updates identically zero.
TD + eligibility + annealing reproduces Tang 2024 within 4.4%.

Refuted

TD(0)

kernel-symmetry fails

Supported

TD(λ)

γ=0.970, λ=0.995

References

Tang, J., Jimenez, C. E., Chakraborty, S., et al., "Dynamic action selection by dopamine signalling," Nature, 2024. doi:10.1038/s41586-024-07528-4
Markowitz, J. E., Gillis, W. F., Jay, M., et al., "Spontaneous behaviour is structured by reinforcement without explicit reward," Nature 614, 108-117, 2023. doi:10.1038/s41586-022-05611-2
Greenstreet, F., Vergara, H. O., Johansson, Y., et al., "Dopamine neurons evaluate actions using value-free action-prediction errors," Nature, 2025.
Sousa, M., Bujalski, P., Cruz, B. F., et al., "Time-magnitude reinforcement learning in dopamine neurons," Nature, 2025.

Private preview

Kairos refuted a 25-year-old convergence theorem at the core of dopamine-learning neuroscience, and supplied the corrected version.

Kairos: the scientific truth engine.A refutation pipeline for scientific claims.

Hyperparameters τc = 2.820 s

Invariants

References

Kairos: the scientific truth engine.
A refutation pipeline for scientific claims.

Hyperparameters τ_c = 2.820 s