What the Error Actually Was—and Why It Matters

Early research into AI-assisted medical record review included an evaluation of an automated platform that several plaintiff firms had begun using. This was before the verification protocols now applied to every case workflow had been developed. The platform was well-marketed, priced at a premium, and positioned as a time-saving solution for the record-heavy early stages of case evaluation. The output it produced looked authoritative: formatted cleanly, organized logically, and written in language that sounded like clinical analysis.

In one wrongful death matter, the platform's review summarized the clinical events, assessed the standard-of-care issues in broad strokes, and produced a damages overview that described the decedent as a young mother who had left behind two children. That detail appeared with the same confidence and formatting as every accurate finding in the same document: no flag, no qualifier, no indication that the model had inferred rather than extracted. It stated the detail as fact because the model expected it to be fact, and because nobody in the workflow had independently read the records and knew otherwise.

In a wrongful death case, children are not a biographical detail. Their existence, their ages, and their dependency on the decedent are a damages argument—they shape consortium claims, affect case valuation, and influence what a jury is asked to feel about a loss. An attorney who built an initial damages picture around two children who did not exist would eventually have discovered the error. The question is at what stage, at what cost, and after how much of the case strategy had already been built around a fact the records never supported.

The output looked like the product working correctly. The error was indistinguishable from accurate work product—until someone who had independently read the records compared the two.

Hallucination vs. Confabulation—Why the Distinction Matters

Most attorneys who have encountered AI accuracy problems have heard them described as hallucinations—the model inventing information that has no basis in reality. A fabricated case citation. A drug interaction that does not exist. A statute with the wrong number. Hallucination in that sense is dramatic and, in many contexts, relatively easy to catch because the fabricated information conflicts with something the reader can independently verify or already knows to be false.

What happened in that case was something categorically different and significantly more dangerous in a medical record review context. It was confabulation, and the distinction is worth understanding precisely because confabulation is, by its structure, nearly invisible on the page.

Hallucination

The model generates information that has no basis in the input data and no statistical grounding in its training—a fabricated citation, a nonexistent drug, a procedure that was never performed. The output is wrong in a way that is often detectable because it conflicts with verifiable external sources or strikes the reader as implausible on its face.

Analogy: A witness who invents testimony from whole cloth. The fabrication is dramatic enough that careful cross-examination has a reasonable chance of exposing it.

Confabulation

The model fills a gap in the input data with information that is statistically plausible given the context—drawn from patterns in its training data rather than from the records themselves. The output is wrong, but it is wrong in a way that sounds right, fits the surrounding narrative, and does not conflict with anything the reader already knows to check.

Analogy: A witness who genuinely believes what they are saying because their brain reconstructed a memory from expectation rather than experience. The error is sincere, coherent, and nearly impossible to detect without independent verification of the underlying facts.

Confabulation is the failure mode that should concern plaintiff attorneys most—not because hallucination is harmless, but because confabulation is the failure that travels furthest before anyone catches it. A fabricated drug name looks wrong to a physician reviewing the output. A fabricated child looks right to everyone, because it is exactly the kind of detail that belongs in a damages overview and exactly the kind of detail that AI-assisted review is supposed to surface so the attorney does not have to go looking for it manually.

Why Medical Records Are Particularly Vulnerable

AI language models confabulate more frequently under specific conditions: when the input data is long and repetitive, when it is inconsistently formatted, when sections are missing or incomplete, and when the model is asked to synthesize rather than simply extract. Medical records check every one of those boxes simultaneously.

A complex hospital record is not a clean structured dataset. It is a collection of documents produced by dozens of different providers across days or weeks, in formats ranging from typed progress notes to handwritten orders to scanned flowsheets with variable OCR quality. The same clinical event may appear in five different places in five different forms—or it may not be documented at all, leaving a gap that the model will fill with something plausible if it is asked to produce a coherent summary of the case.

The model is not reading the record the way a physician reads it. It is predicting the next element of the narrative based on patterns in its training data. When the record contains enough context to anchor a prediction, the output is often accurate. When the record is ambiguous, incomplete, or simply silent on a point, the model does not flag the gap. It fills it—with the same confidence and formatting it uses when it is right. This is not a bug that will be resolved in the next software update. It is a fundamental property of how large language models generate text, and it applies regardless of how the platform is marketed or what it costs per case.

The model does not know what it does not know. It produces confident, coherent output regardless of whether the underlying records support it—because confidence is what the architecture produces.

The Risk Categories Beyond the Obvious One

The children example illustrates the biographical confabulation risk clearly, but that is one category among several—and not all of them are as immediately legible as a nonexistent dependent appearing in a damages section.

Confabulation and Hallucination Risk Categories in Medical Record Review
Biographical confabulation. The model infers personal details—dependents, occupation, living situation, relationships—from demographic patterns rather than documented facts. In a damages context, these details are not incidental. They are the foundation of consortium claims, loss of support arguments, and the human narrative a jury responds to.
Compressed or reordered timelines. The model synthesizes a clinical narrative from records that are not chronologically organized and may produce a sequence of events that is internally coherent but factually wrong—placing a lab result before the order that generated it, or collapsing hours of documented deterioration into a single moment. Timeline errors are the most dangerous category for causation arguments because they are difficult to detect and directly undermine the theory of the case.
Misattributed findings. The right clinical finding, attributed to the wrong provider or the wrong time. In a multi-defendant case, provider attribution is not a detail—it is the liability map. An AI that accurately identifies a missed finding but assigns it to the wrong physician has not helped the case. It has potentially redirected the entire liability analysis.
Resolved findings presented as active. A ruled-out diagnosis, a transient abnormality, or a finding that was addressed and documented as resolved—presented in the summary as an ongoing clinical problem. This affects both the causation analysis and the damages picture, and it is particularly difficult to catch without independent record review because the finding itself is real. Only its status is wrong.
Medication errors. Wrong dose, wrong timing, wrong route, or wrong drug entirely. Medication administration records are among the most difficult record types for AI to process accurately—they are dense, formatted inconsistently across facilities, and require clinical knowledge to interpret correctly. Errors in this category can invert the causation argument entirely, particularly in cases where timing is the mechanism of harm.
Fabricated clinical language. The model produces language that sounds like a clinical finding but reflects neither the records nor any real clinical phenomenon—a synthesis of plausible-sounding terminology that an attorney without clinical training cannot identify as wrong. This is hallucination in its clearest form, and it is most likely to occur when the model is asked to assess the standard of care rather than simply summarize documented events.
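One of the categories above, the reordered timeline, lends itself to a simple mechanical screen. The sketch below is illustrative only: it assumes the AI-generated timeline has already been converted into structured events, and the field names (`id`, `kind`, `time`, `order_id`) and sample timestamps are hypothetical. It flags any lab result timestamped before the order that generated it. A check like this catches internal inconsistency only. It cannot tell whether the timeline matches the records, which still requires someone who has read them.

```python
from datetime import datetime


def find_order_violations(events):
    """Return ids of lab results that precede, or lack, their originating order.

    `events` is a list of dicts with hypothetical keys: 'id', 'kind'
    ('order' or 'result'), 'time' (datetime) and, for results, 'order_id'.
    """
    order_times = {e["id"]: e["time"] for e in events if e["kind"] == "order"}
    violations = []
    for e in events:
        if e["kind"] != "result":
            continue
        ordered_at = order_times.get(e["order_id"])
        # A result with no matching order, or one dated before its order,
        # is flagged for human review against the source records.
        if ordered_at is None or e["time"] < ordered_at:
            violations.append(e["id"])
    return violations


# Illustrative data only; not drawn from any real record set.
timeline = [
    {"id": "ord-1", "kind": "order", "time": datetime(2024, 1, 5, 14, 0)},
    {"id": "res-1", "kind": "result", "time": datetime(2024, 1, 5, 13, 30), "order_id": "ord-1"},
    {"id": "ord-2", "kind": "order", "time": datetime(2024, 1, 5, 15, 0)},
    {"id": "res-2", "kind": "result", "time": datetime(2024, 1, 5, 16, 10), "order_id": "ord-2"},
]

print(find_order_violations(timeline))  # ['res-1']
```

A timeline that passes this screen can still be wrong. The screen narrows where a reviewer looks first and does not replace the review.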

What the Confabulation Looked Like on the Page

To make this concrete, here is the type of divergence a confabulation produces, set against what the records actually contained.

Illustrative Confabulation—Biographical Detail
Wrongful death matter. Female decedent, early thirties, married, fatal outcome arising from an alleged failure of diagnosis during hospital admission. Records reviewed by an automated AI platform with no physician verification step.
AI Output

"The decedent was a 32-year-old married woman and mother of two young children, whose death has left her family without its primary caregiver..."

What the Records Showed

No reference to children anywhere in the record set. No obstetric history. No mention of dependents in any note, intake form, or social history. The decedent had no children.

The output was not presented as uncertain or inferred. It appeared in the damages summary the same way every other finding appeared—as a fact derived from the records. An attorney reading it had no reason to question it, because the detail is exactly what AI-assisted review is supposed to produce: a synthesized account of who the patient was and what her death cost. The error was indistinguishable from accurate output until someone who had read the records independently knew to look for the discrepancy.

The Fee Structure Problem

Several platforms currently marketing AI-powered medical record review to plaintiff firms charge per-case fees for automated output with no physician in the review loop. The fee is not the problem. The problem is what the fee does not include—a physician who has read the records, knows what they contain, and can tell the difference between what the AI produced and what the records actually support.

The sophistication of the marketing frequently exceeds the sophistication of the methodology. A clean interface, a well-formatted summary document, and clinical-sounding language are not verification—they are presentation. And presentation does not catch a fabricated dependent in a damages section, a collapsed timeline in a causation analysis, or a misattributed finding in a multi-defendant liability map.

The economic irony compounds the problem. An attorney pays for automated output that shapes their early case assessment, then potentially absorbs the downstream cost when the output turns out to be wrong. That cost takes the form of misdirected litigation strategy, damages arguments built on facts that do not exist, or the time and expense of reconstructing an analysis the records never supported. The platform has already been paid. The consequences land elsewhere.

A clean interface and clinical-sounding language are not verification. They are presentation. The two are not the same thing, and the difference is not always visible until something goes wrong.

What Verification Actually Requires

The answer to AI confabulation in medical record review is not avoiding AI. Used correctly, AI is a genuinely powerful tool—it processes volume faster than any human reviewer, surfaces patterns across thousands of pages, and finds things that linear human review misses. The answer is ensuring that AI operates inside a physician-directed workflow where independent verification is a structural requirement and not an optional add-on.

Verification in this context means something specific. It does not mean a physician reviewing the AI's output to assess whether it sounds right. It means a physician who has independently engaged with the underlying records—who knows what they actually contain—comparing the AI output against that independent knowledge and identifying every point of divergence. That is a fundamentally different standard from what most automated platforms offer, and it requires a fundamentally different kind of engagement with the source material.

In practice this means the AI surfaces findings and the physician confirms them against the source records. The AI constructs a timeline and the physician validates it against the documented sequence of events. The AI produces a damages summary and the physician verifies every biographical and clinical fact against the records that support it. Where the AI and the records diverge, the records control—without exception and without deference to how confidently the model stated the conflicting information.
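The rule that the records control can be sketched as a reconciliation step. This is a minimal illustration under stated assumptions, not a description of any platform's implementation: it assumes the AI summary has been broken into discrete claims and that a physician has independently recorded what the source records say for each field. The `Claim` class, the `reconcile` function, and the field names are all hypothetical. A claim is accepted only when the records affirmatively support it. Silence in the records counts as a divergence, not as agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One factual assertion taken from an AI-generated summary (hypothetical)."""
    field: str
    ai_value: str


def reconcile(claims, verified):
    """Compare AI claims with facts a physician confirmed from the records.

    `verified` maps a field name to the value found in the records, or to
    None when the records are silent on that field.
    Returns (accepted, divergences). On every divergence the records control.
    """
    accepted, divergences = [], []
    for claim in claims:
        record_value = verified.get(claim.field)
        if record_value is not None and record_value == claim.ai_value:
            accepted.append(claim)
        else:
            divergences.append((claim, record_value))
    return accepted, divergences


# Illustrative values modeled on the example in this article.
claims = [
    Claim("sex", "female"),
    Claim("marital_status", "married"),
    Claim("children", "2"),
]
verified = {"sex": "female", "marital_status": "married", "children": None}

accepted, divergences = reconcile(claims, verified)
for claim, record_value in divergences:
    print(f"UNSUPPORTED: {claim.field}={claim.ai_value!r}; records show {record_value!r}")
```

The design choice that matters is the default. An unverified field fails closed, so a confidently stated detail with no support in the records is surfaced for review and never passes through to the analysis.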

The question every plaintiff attorney should ask before engaging any AI-assisted record review service is straightforward: who in this workflow has independently read the underlying records, and what happens when their reading disagrees with what the AI produced? If the honest answer is that nobody has independently read the records, and that the AI output is itself the review, then nobody catches the error, and the attorney finds out later at a cost the platform's invoice will not cover.

The Bottom Line

The woman in that case had no children. The AI described her as a mother of two because the model expected her to have children, and because the workflow around it contained no one whose job it was to know the difference. The error was caught—not by the platform, not by the output itself, but by a physician who had read the records independently and knew what they contained. That is not a story about AI failing in some dramatic or obvious way. It is a story about what AI requires in order to be used safely, and what happens in cases where those requirements are not met.

AI is a powerful tool in medical record review when it is directed by someone with the clinical knowledge to know what to look for, the methodology to verify what it finds, and the independence from the output to recognize when the model got something wrong. Without that physician in the loop, what you have is not an AI-assisted review. It is an unverified AI summary—formatted to look like a review, priced like a service, and wrong in ways that may not become apparent until the case is already built around the error.

The problem is not limited to medical record review. A researcher at HEC Paris maintains a public database of AI hallucination cases in court filings worldwide. As of early 2026, it documents more than 1,200 cases globally, with five to six new cases added every day.

The sanctions are escalating with the numbers. In Oregon, a federal magistrate judge imposed $110,000 in fines and attorneys' fees (the costliest AI hallucination sanction in U.S. legal history) after two lawyers submitted 23 fabricated legal citations and the judge dismissed the case with prejudice. In Nebraska, a court ordered the first attorney license suspension tied to AI hallucinations. In April 2026, the Sixth Circuit sanctioned a Kentucky attorney after AI-generated appellate briefs contained fabricated citations, removing him from the case and referring the matter for disciplinary action. That same month, the California State Bar filed formal disciplinary charges against two attorneys and reached a disciplinary stipulation with a third, all stemming from court filings containing nonexistent AI-generated citations.

The State Bar Chief Trial Counsel stated directly: "Whether relying on traditional research methods or emerging technologies like generative AI, attorneys remain fully responsible for verifying the accuracy of their work." The Sixth Circuit said the same: "new technologies are no substitute for tried-and-true safeguards managed by practicing attorneys." These are not edge cases. They are the predictable result of AI output used without independent verification, in legal briefs and in medical record reviews.

Physician-directed review. Every case. No exceptions.

AI-assisted record review where a physician has independently engaged with the underlying records—not simply reviewed the AI's output. Every finding verified. Every biographical detail sourced to the records that support it. Every timeline validated against the documented sequence of events. If the records do not support it, it does not appear in the analysis.
