We publish the reasoning. You check the math.
White papers behind Futsu's architecture — written like science, cited from the exact claims they support, and explicit about what is published evidence versus our own argument.
Write with one agent. Verify with another. The case for generator–verifier separation in coding pipelines.
Asking the model that wrote the code to also judge it reuses the same weights, the same context, and the same blind spots. We synthesize published evidence on self-correction limits, verifier asymmetry, self-preference bias and critic models; lay out six mechanisms that make a separate reviewer-debugger effective; and ship an open protocol for testing the claim on your own repository — because every Futsu run is a folder you can grep.
Read the paper
One paper published. The evaluation program is ongoing: as opt-in early-access telemetry accrues, we publish the numbers with raw run artifacts, the same way everything else here is verifiable.