Why “it works” is no longer good enough for clinical AI

May 19, 2026

Technical performance metrics tell us whether an algorithm can predict. They tell us almost nothing about whether it changes outcomes. A different kind of science is required.

A recent Nature Medicine editorial put it plainly: claims that medical AI is improving care must be backed by appropriate evidence. The problem, as the editors note, is not a lack of models — it is a lack of evaluation that connects technical performance to clinical impact.

Discrimination, calibration, sensitivity, specificity: these metrics tell us whether an algorithm can predict. They tell us almost nothing about whether it changes outcomes.

A model can score beautifully on a held-out test set and still fail to improve care if its outputs are ignored, mistimed, or disruptive to the workflows it was designed to support.

We have spent a decade building clinical AI. We are only beginning to ask whether it works in the way that matters.

The questions that now matter

Does the tool change clinician behavior?
Does that behavioral change translate into better outcomes for patients?
Are those effects real, or artifacts of secular trends and Hawthorne effects?

Answering them requires a different kind of science: one grounded in causal inference from real-world data, explicit estimand specification, and study designs that are proportional to the claim being made.
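To make the secular-trend concern concrete, here is a minimal sketch of one common design, difference-in-differences, which compares the before/after change at sites using the tool with the change at comparison sites over the same period. It is an illustration, not a description of any particular platform or study. The function name, the four groups, and every number are hypothetical, and the estimate only has a causal reading under the parallel-trends assumption (absent the tool, both groups would have changed by the same amount).

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate on a binary outcome.

    Subtracts the change seen in the comparison group (the secular trend)
    from the change seen in the group that received the tool.
    Assumes parallel trends; this is a sketch, not a full analysis.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical 30-day readmission flags (1 = readmitted), invented for illustration.
treated_pre = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]   # sites before the tool went live
treated_post = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]  # same sites after go-live
control_pre = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]   # comparison sites, same calendar period
control_post = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1]  # comparison sites, later period

# A naive before/after comparison credits the tool with the whole change.
naive = mean(treated_post) - mean(treated_pre)

# The difference-in-differences estimate removes the change that also
# occurred at sites that never used the tool.
did = diff_in_diff(treated_pre, treated_post, control_pre, control_post)

print(f"naive before/after change: {naive:+.2f}")
print(f"difference-in-differences: {did:+.2f}")
```

In this made-up data, part of the apparent improvement also shows up at the comparison sites, so the difference-in-differences estimate is smaller than the naive before/after change. A real evaluation would also need a pre-specified estimand, uncertainty intervals, and checks on the parallel-trends assumption.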

These are not academic questions. Health systems, payers, and regulators will increasingly demand answers to them, and AI developers will need to provide those answers if they want sustained adoption rather than a pilot that quietly disappears.

The era of "trust the AUROC" is over.
The era of impact evidence has begun.

About Augura

We built our platform around exactly this challenge: helping digital health companies and health systems design, execute, and communicate the causal evidence that turns a promising AI product into a credible, defensible clinical intervention.
