Applied case studies from the Kairos verification engine. Each report examines how frontier AI models perform on tasks where formal verification provides the ground truth.
One task, followed from spec intake to shipped bundle. Three malformed specs are refused at the door without invoking a generation model; EBMC discharges the seven SystemVerilog assertions in 0.278 s; the Lean layer catches a misstated fairness theorem that bounded model-checking could not catch.
26 SystemVerilog and NuSMV repair tasks scored by EBMC, with Lean 4 theorem obligations on the five tasks that require universally-quantified correctness. Model-checked properties either prove or they don't.
AI-generated accelerator kernels pass seeded property tests and still produce silently wrong output on nearby input shapes. A universally-quantified correctness theorem rejects kernels that cannot discharge it.
Authorization policies that pass unit tests still admit privilege-escalation paths. Deep dive on three Cedar tasks: an adversarial policy audit, a photo-sharing policy debug, and a Lean 4 proof of Cedar schema validation soundness.
From one 78-line protocol spec, five independent verifier backends close the same closed-form sandwich bound on BBRv3's starvation-onset time. A patched filter certifies the complementary positive theorem. One command reproduces every verdict in under 20 minutes.