Technical Reports

Applied case studies from the Kairos verification engine. Each report examines how frontier AI models perform on tasks where formal verification provides the ground truth.

Neuroscience: machine-checked refutation of a dopamine-learning convergence theorem

A 25-year-old two-timescale actor-critic convergence theorem at the core of dopamine-learning neuroscience, checked against the Tang 2024 credit-assignment paradigm. Kairos refutes the stated theorem, supplies the corrected version machine-checked in Lean 4, and sandwich-bounds the empirical time constant against four competing candidate rules.

Research April 2026

Hardware verification

3 reports

Hardware verification: end-to-end solve of a bus-arbiter liveness task

One task, followed from spec intake to shipped bundle. Three malformed specs are refused at the door without invoking a generation model; EBMC discharges the seven SystemVerilog assertions in 0.278 s; the Lean layer catches a misstated fairness theorem that bounded model checking could not.

Solve walkthrough April 2026

Hardware verification: 26-task RTL correctness benchmark

26 SystemVerilog and NuSMV repair tasks scored by EBMC, with Lean 4 theorem obligations on the five tasks that require universally quantified correctness. Model-checked properties either prove or they don't.

Benchmark April 2026

Accelerator kernel: universally quantified correctness for AI-generated NKI kernels

AI-generated accelerator kernels pass seeded property tests and still produce silently wrong output on nearby input shapes. A universally quantified correctness theorem rejects kernels that cannot discharge it.

Solve walkthrough April 2026

Authorization policy: three Cedar tasks where correctness beats testing

Authorization policies that pass unit tests still admit privilege-escalation paths. Deep dive on three Cedar tasks: an adversarial policy audit, a photo-sharing policy debug, and a Lean 4 proof of Cedar schema validation soundness.