The 4,500-Page AI Report on 1,424 Pages of Records

What the Platforms Promise

The marketing language across AI medical record review platforms follows a recognizable arc. The problem is framed as volume—thousands of pages, buried paralegals, attorneys spending 40 hours on records when they should be in depositions. The solution is speed and replacement. Not augmentation of clinical judgment, but elimination of the need for it. No human reviewers involved. An AI paralegal that never sleeps, never negotiates salary, and processes records in minutes rather than days.

These are not fringe claims from unknown startups. They are the current marketing language of platforms that plaintiff firms across the country are actively subscribing to and paying per-case fees to use. The implicit promise is that uploading records to the platform is equivalent to handing them to an experienced reviewer—that what comes back is a reliable, usable analysis the attorney can act on.

The outputs reviewed in the course of this work suggest that promise is not being kept.

How a 4,500-Page Report Gets Built from 1,424 Pages of Records

Large language models, when asked to summarize a complex medical record set without careful prompt discipline, do not default to concision. They default to comprehensiveness—because comprehensiveness is what the architecture rewards. A model that captures everything cannot be accused of missing anything. A model that summarizes aggressively will inevitably leave something out and, in a liability context, that omission becomes a problem.

The result is a document that attempts to capture every note, every vital sign entry, every medication administration record, every provider interaction—organized not by clinical significance or chronological relevance, but by the model's internal processing logic. The same clinical event, documented in five different places in the original records, may appear in five different sections of the output—in the nursing notes summary, in the medication administration summary, in the provider timeline, in the causation analysis, and again in the conclusions. Nothing is synthesized. Everything is restated.

More than 4,500 pages of AI output were generated from 1,424 pages of source records by one commercial platform. This was not because the AI found more information than existed in the file, but because it processed the same information repeatedly across overlapping frameworks, with no physician to tell it what mattered and what did not. The attorney who paid for efficiency received a document that required as much time to navigate as the original records.

The page count is not the point. The point is what the page count represents—a model that had no clinical framework for deciding what was significant, no judgment about what the attorney actually needed to know, and no ability to distinguish a finding that changes the case from a nursing note entry that is identical to the five hundred that preceded it. Without that judgment, the model produces everything, because everything is equally weighted in the absence of clinical knowledge to rank it.

Two Jobs Instead of One

The attorney who receives a bulk AI output longer than the records it summarizes does not save time. They inherit a new problem layered on top of the original one. The original problem was a large, disorganized record set that required clinical expertise to navigate. The new problem is a large, disorganized AI output that requires the same clinical expertise to evaluate—plus the additional task of reconciling the output against the source records wherever something in the summary looks uncertain, inconsistent, or simply too convenient.

This is the failure mode that does not appear in platform marketing materials. The pitch is always about what the attorney no longer has to do. The reality, for many attorneys who have used these platforms on complex cases, is a document they cannot efficiently use sitting alongside the original records they still have not read—and a deadline that has not moved.

There is also the more immediate problem of sustained attention. A well-organized, clinically prioritized record summary of forty pages can be read carefully and absorbed. A repetitive, bulk output of this scale—written in clinical-sounding language that carries equal weight across findings that are significant and findings that are not—cannot. The attorney reads the same type of information restated in slightly different language across dozens of sections, the document begins to feel indistinguishable from the records themselves, and the focused analytical attention that case evaluation requires becomes increasingly difficult to sustain. The output does not support the decision the attorney needs to make. It buries it.

A document that requires the same interpretive effort as the records it was meant to replace has not solved the problem. It has repackaged it.

Three Real Outputs from Commercial Platforms

In reviewing automated AI record review platforms as part of ongoing research into AI-assisted medicolegal methodology, I evaluated outputs from two commercial services that plaintiff firms are currently using and paying for. What follows is not hypothetical. It is a description of what those platforms actually produced.

Platform A—93 pages on two record files. The output was structured as a spreadsheet with 15 columns per row, each column representing a different question the model had been asked about the same clinical encounters. The same Pseudomonas culture result appeared in the "what was done" column, the "abnormal findings" column, the "nursing notes" column, the "medications" column, and the "records gaps" column—not synthesized across those categories, but restated separately in each one. The Summary column and the Summary Focus column were identical—the same text copied verbatim into two adjacent cells on every single row. The deposition targets section listed every provider whose name appeared anywhere in the records, including radiologists, pharmacists, lab personnel, and nursing staff with peripheral roles, each with a block of boilerplate questions attached. An attorney reading that output would spend more time sorting what mattered from what did not than they would have spent reading the records themselves.
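The duplication described above is the kind of thing that can be flagged mechanically before anyone reads the output. What follows is a minimal sketch in Python, assuming the spreadsheet has been exported to a list of row dictionaries. The function names, column names, and sample rows are hypothetical, invented for illustration, and are not data from either platform.

```python
from collections import defaultdict


def normalize(value):
    """Collapse whitespace and case so trivially different cells compare equal."""
    return " ".join(str(value).split()).lower()


def find_duplicate_columns(rows):
    """Return groups of column names whose cells match on every row."""
    if not rows:
        return []
    groups = defaultdict(list)
    for col in rows[0]:
        # A column's signature is its normalized cells, top to bottom.
        signature = tuple(normalize(row.get(col, "")) for row in rows)
        groups[signature].append(col)
    return [cols for cols in groups.values() if len(cols) > 1]


def repeated_values_within_row(row):
    """Map each cell text that appears in more than one column to those columns."""
    seen = defaultdict(list)
    for col, value in row.items():
        text = normalize(value)
        if text:
            seen[text].append(col)
    return {text: cols for text, cols in seen.items() if len(cols) > 1}


# Hypothetical rows for illustration only.
rows = [
    {
        "What Was Done": "Wound culture obtained",
        "Abnormal Findings": "Culture positive for Pseudomonas",
        "Medications": "Culture positive for Pseudomonas",
        "Summary": "Positive culture, antibiotics started",
        "Summary Focus": "Positive culture, antibiotics started",
    },
    {
        "What Was Done": "Vitals recorded",
        "Abnormal Findings": "Fever 38.9 C",
        "Medications": "Cefepime administered",
        "Summary": "Fever, antibiotic given",
        "Summary Focus": "Fever, antibiotic given",
    },
]

print(find_duplicate_columns(rows))
print(repeated_values_within_row(rows[0]))
```

A check like this only catches verbatim repetition. It does nothing to decide which finding matters, which is the part that requires clinical judgment.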

Platform B—more than 4,500 pages on 1,424 pages of records. The platform produced three separate PDFs of 1,537, 1,537, and 1,449 pages from a record set spanning five years—2019 to 2024—and 675 distinct document events. The first two PDFs—labeled "chronology raw" and "chronology final"—had identical page counts. The output had been reformatted, not edited. A liability analysis was applied to every document uploaded—clinical content or not. A standing orders form from 2019 received the same "Failure to Recognize a Complication" treatment as an operative note, producing hedging language like: "The document does not provide specific details about whether complications were recognized or addressed in individual cases." That sentence is not an analysis. It is a model generating a plausible-sounding hedge because it was prompted to find liability content in a document that contains none.
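The page counts above can be checked with simple arithmetic. The short Python sketch below uses only the figures reported in this section. The variable names are illustrative, and the per-event average is a rough indicator, not a measure of how the platform actually allocated pages.

```python
# Figures as reported in this section.
source_pages = 1424
output_pdf_pages = [1537, 1537, 1449]
document_events = 675

total_output = sum(output_pdf_pages)
expansion_ratio = total_output / source_pages
pages_per_event = total_output / document_events

print(f"Total output pages: {total_output}")            # 4523
print(f"Output vs. source: {expansion_ratio:.2f}x")     # 3.18x
print(f"Output pages per document event: {pages_per_event:.2f}")
```

A summary would be expected to come in well under 1.0x. Here the output runs to more than three times the length of the material it was meant to condense.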

Platform A—session memory failure. After uploading records and submitting a detailed four-step review prompt to Platform A, the platform confirmed that all tasks had been completed. When asked for a downloadable summary—the final step in the prompt—the platform responded by asking the user to upload the documents and re-explain the instructions. It had no memory of the session. Asked again, it repeated the request. The output those 93 pages represented had been generated in a context window the platform no longer had access to. There was no persistent case file, no institutional memory, and no one on the other end reviewing what the model had produced. The attorney asking for the work product had to start over.

These are not edge cases or unusual failures. They are the predictable outputs of general-purpose AI models applied to medical records without physician direction, without independent verification, and without the clinical judgment to distinguish a finding that changes a case from a sentence that fills a column.

What the Fee Actually Bought

The per-case fees charged by automated review platforms are not trivial. They are priced to reflect the value of a service that, if executed correctly, would genuinely save significant attorney time and produce a clinically reliable foundation for case evaluation. That is what the fee implies. The question is whether the output delivers on that implication—and in the cases described here, it does not.

An attorney who pays for an automated review and receives a document that is longer than the records, more repetitive than the notes, and requires a return visit to the source records to make sense of has not purchased efficiency. They have purchased a first draft of a problem they still have to solve, at a price that reflects a finished product. The platform has been paid. The work has not been done.

The fee is not the problem—a physician-directed review that produces a usable, clinically organized analysis is worth the investment regardless of the price. The problem is paying that price for output that creates more work than it eliminates and still requires a physician to sort through it before it is useful. At that point the attorney has paid twice: once to the platform, and once to whoever they eventually hire to make sense of what the platform produced.

Paying for an automated review and then paying a physician to interpret the output is not a workflow. It is the long way around to what physician-directed review would have produced in the first place.

What a Usable Output Actually Looks Like

The contrast is not subtle. A physician-directed review of a 1,424-page record set does not produce 4,500 pages. It produces a document that averages 15 to 20 pages, with a length that reflects the clinical complexity of the case, not the volume of the records or the verbosity of the model generating the summary.

What is in those 15 to 20 pages is the difference. Not a restatement of every clinical event in the file. Not a liability analysis applied to standing orders forms. Not a deposition target list that includes the pharmacist, the lab personnel, and every radiologist whose name appeared on any report. What is in those pages is what the attorney actually needs before deciding whether to take a case and how to build it:

What happened. A clinical narrative of the relevant events in chronological order, written in plain language that a non-physician can follow and a physician would recognize as accurate.

What went wrong. The specific clinical decisions, failures to act, missed findings, and departures from standard practice that the records support—not legal conclusions, but the factual clinical picture that expert testimony will later be built around.

Who was at fault and who the defendants need to be. A clear-eyed assessment of which providers and entities the records implicate, with enough specificity that the attorney can make informed decisions about who to name and why.

Who needs to be deposed. Not every provider in the file. A case may need one targeted deposition or ten—but the average plaintiff medical malpractice case can be built efficiently around three to five critical witnesses whose records drive the liability and causation narrative. The report identifies who those witnesses are, what their documentation shows, and what needs to be addressed under oath.

What records are missing. A gap analysis identifying what should be in the file and is not—the records that need to be specifically requested before any expert is retained and before the standard-of-care analysis is finalized.

What the defense is likely to argue. An honest assessment of the strongest arguments on the other side—the clinical facts that support the defense theory, the causation problems the plaintiff will have to overcome, and where the case is genuinely complicated rather than straightforward.

What experts will be needed. A preliminary assessment of which specialties the case requires for standard-of-care and causation opinions, with enough clinical specificity that the attorney knows what kind of expert to look for—not a generic "medical expert" recommendation, but a judgment about whether this case needs a hospitalist, a pulmonologist, a vascular surgeon, or all three.

What comorbidities and client factors affect causation. An honest accounting of the pre-existing conditions, baseline health status, and clinical history that the defense may use to complicate the causation picture—not to undermine the case, but to map the terrain accurately before anyone retains an expert or files a complaint.

What the but-for damages actually are. A clear separation between the injuries and conditions that are causally attributable to the negligence and those that existed before or developed independent of it. The damages that would not exist but for the negligent events—in plain clinical language, not legal framing—and the conditions that were already present or would have occurred regardless. That distinction is what drives case value, and it is one that requires clinical judgment to draw accurately.

That is what a physician-directed review produces. Not a document the attorney has to navigate before they can use it. A document they can act on.

Courts are beginning to ask the same question about AI tools used in clinical practice. As one physician-attorney columnist recently put it in Physician's Weekly, AI mimics human analytic patterns and creates a convincing appearance of intelligence, but it is not human, and the legal implications of that reality for anyone who relies on AI output without independent verification are evolving quickly.

The consequences of unverified AI output are no longer theoretical. In 2025, attorneys from two law firms, including K&L Gates LLP, submitted a brief to a federal special master containing numerous hallucinated citations generated by AI tools including CoCounsel, Westlaw Precision, and Google Gemini. The attorneys were ordered to jointly pay $31,100 in the opposing party's legal fees. The special master wrote that he had been "persuaded by the authorities they cited, and looked up the decisions to learn more about them," only to find that "they didn't exist. That's scary." Those legal research tools are not the record review platforms described above, but the output problem is of the same class. The common thread is AI that generates confident-sounding content, formatted to look authoritative, that no one in the workflow verified.

The Bottom Line

Automated AI record review platforms are selling a version of efficiency that the output frequently does not deliver. A 93-page spreadsheet with duplicate columns on two record files. More than 4,500 pages of output on 1,424 pages of source records, delivered as three files, two of which were the same chronology reformatted rather than edited. A platform that confirmed completing a four-step review and then, one exchange later, asked the attorney to re-upload the records and start over.

These are not isolated failures from obscure products. They are outputs from commercial platforms that plaintiff firms are actively using and paying for—platforms that market themselves as time-saving solutions for exactly the kind of record-heavy early case work that a physician-directed review addresses.

The problem is not always accuracy—though accuracy is a serious and separate concern addressed in the companion piece to this article. The problem is utility. A document that is longer than the records it summarizes, that restates the same findings across a dozen overlapping columns, that applies liability analysis to a standing orders form, and that loses its memory of the case between the review and the follow-up question is not a useful work product. It is a bulk output dressed up as analysis, and the attorney who receives it is not better positioned to evaluate the case than they were before it arrived.

The value of AI in medical record review is real, but it depends entirely on the clinical judgment directing it—on a physician who knows what matters, what to look for, and how to translate what the records show into a document that an attorney can actually use. Without that physician in the loop, what the platform produces is not a record review. It is a record restatement—longer, more expensive, and no more useful than the original.

A review you can actually use.

Physician-directed AI record review that produces a clinically organized analysis—not a bulk output. Length proportionate to case complexity. Findings ranked by significance. Timeline built chronologically around the events that matter. Delivered within 48–72 hours of completed record receipt.
