Where This Started—Manual Review at Scale
Before AI-assisted review existed as a practical tool, working through a large medical record set meant developing efficiency strategies out of necessity. A 15,000-page hospital record cannot be read line by line in the time a plaintiff firm's economics allow. Something had to give—and what gave, over years of high-volume review, was the development of a clinical intuition about where in a record the important material tends to live and what language tends to accompany it when it appears.
Part of that intuition was structural—knowing that the nursing notes at 2:00 in the morning matter more than the attending's midday progress note, that the medication administration record tells a different story than the physician orders, that the rapid response documentation is almost never where you expect to find it in the production. That is the architecture of a medical record set, and knowing it cold is the foundation of efficient review.
But the other part was linguistic. Certain words appear in medical records when something has gone wrong and the author either knows it or is trying not to know it. Some of those words are clinically specific. Others are surprisingly ordinary—the kind of language a physician reaches for when they are documenting a situation that makes them uncomfortable, when they are hedging, when they are softening, or when they are quietly flagging something they cannot ignore but are not ready to address directly.
Learning to recognize those words—and more importantly, learning what they mean when they appear—is something that develops over thousands of case reviews. It is not something a search engine produces on its own, and it is not something an AI model generates without being told what to look for. It is clinical pattern recognition applied to language, and it is one of the most underappreciated skills in medicolegal consulting.
A physician who has reviewed enough records develops a feel for the language of evasion. Certain words appear when a provider may know something went wrong and is choosing language carefully around what they put in the record.
The Language of Evasion
The most revealing words in a medical record are often not clinical terms. They are ordinary English words that carry unusual weight in a documentation context—words that would not flag in a standard legal review but that an experienced physician reads as a signal.
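A plain keyword scan with context windows makes the idea concrete. The sketch below is a minimal illustration, not the tooling behind this article: it assumes the record text has already been extracted to a string, and the phrase list (drawn from the examples this section goes on to discuss) and the function name `find_signal_phrases` are hypothetical. It only locates candidate passages; what a hit means remains a clinical judgment.

```python
import re

# Illustrative phrase list taken from the examples in this section; not exhaustive.
SIGNAL_PHRASES = [
    "unfortunately",
    "interestingly",
    "discussed with",
    "family informed",
    "per patient report",
    "patient reports",
]

def find_signal_phrases(text, phrases=SIGNAL_PHRASES, window=60):
    """Return (phrase, start_offset, context) for each case-insensitive hit."""
    hits = []
    for phrase in phrases:
        # Word boundaries keep a phrase from matching inside a longer token.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            start = max(0, m.start() - window)
            end = min(len(text), m.end() + window)
            hits.append((phrase, m.start(), text[start:end].strip()))
    # Present hits in the order they occur in the record.
    return sorted(hits, key=lambda h: h[1])

# Made-up note text for demonstration only.
note = (
    "Unfortunately, the patient developed a wound infection. "
    "Interestingly, WBC was elevated on the prior day. Family informed."
)
for phrase, offset, context in find_signal_phrases(note):
    print(f"{offset:>4}  {phrase!r}: {context}")
```

The context window matters as much as the match: the same word reads differently depending on who wrote it and what follows it in the record.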
"Unfortunately" is one of the most reliable of these. It appears in medical records when a provider is documenting an outcome they did not intend and are trying to frame as an unavoidable misfortune. "Unfortunately, the patient developed a wound infection" is very different clinically from simply documenting that a wound infection was identified. The word "unfortunately" suggests the author was aware of how the outcome might be perceived, not just what the outcome was. That is worth reading carefully.
"Interestingly" is another. When a physician writes "interestingly" in a progress note, they are almost always documenting something that surprised them—a finding that did not fit the clinical picture they had been working with. In a well-managed case, surprising findings get addressed. In a case that went wrong, "interestingly" sometimes marks the point where a finding was noticed, commented upon, and then not followed up on in the way it should have been. That single word can be the pivot point of a causation argument.
"Discussed with" and "family informed" are documentation phrases that warrant close attention to what immediately follows—and what does not follow. When a provider documents that a family was informed of a complication or a change in condition, the next logical question is what the record shows happening after that conversation. If the answer is nothing, or if the next note is from a different provider who documents a different clinical picture, the communication chain is worth reconstructing in detail.
"Per patient report" and "patient reports" are phrases that shift clinical responsibility to the patient. When a provider uses this language to document a symptom, they are creating a record that says the patient told me this—not that I observed it or confirmed it. In cases where a patient's reported symptoms were subsequently minimized or not acted upon, that language is exactly the kind of documentation that a defense attorney will use to argue the provider relied on the patient's own account. It is also the language that reveals when a provider was not independently assessing what they were being told.
Case-Specific Keywords—A Different Category Entirely
Beyond the general language of evasion, sophisticated record review requires a second category of keyword knowledge: the case-specific clinical terminology that is only meaningful if you already know what you are looking for and why it matters.
This is where clinical training becomes non-negotiable. An attorney reviewing a heparin-induced thrombocytopenia and thrombosis case—HITT—needs to know that the relevant trigger term is not "HITT" and may not even be "heparin-induced thrombocytopenia." The documentation that matters is often captured under the abbreviation "HIT" or "SRA"—the serotonin release assay, the confirmatory laboratory test for the diagnosis. A record that contains "SRA" and shows either that the test was ordered and not acted upon, or that it was never ordered in a clinical context where it should have been, is a record that tells the story of the case. An attorney who does not know to search for that abbreviation may read through the same record and find nothing.
Similarly, in an obstetric brachial plexus case arising from shoulder dystocia, the word "maneuver" is one of the most important terms in the delivery record. Shoulder dystocia is managed through a defined sequence of obstetric maneuvers—McRoberts, suprapubic pressure, Woods screw, Zavanelli—and the documentation of which maneuvers were attempted, in what sequence, and over what time interval is the clinical foundation of the standard-of-care analysis. A delivery note that documents "shoulder dystocia, maneuvers performed, delivery achieved" is a very different record from one that documents the specific sequence. The presence or absence of the word "maneuver" in conjunction with specific named techniques tells you immediately whether the documentation is adequate—and whether the absence of detail is itself a problem worth pursuing.
Several features set case-specific keywords apart from the general language of evasion:
- The relevant term is often an abbreviation, acronym, or informal clinical shorthand that does not appear in any legal glossary, and that means nothing without knowing the clinical context it represents
- Finding the term is only step one—knowing whether its presence or absence is the problem requires understanding the standard of care well enough to recognize a deviation when you see one
- The same term can be favorable or unfavorable depending on what surrounds it—"SRA ordered" is different from "SRA ordered, resulted positive, no action documented"
- Case-specific keyword lists have to be built fresh for each case type—what matters in a HITT case is irrelevant in a sepsis case, and what matters in a shoulder dystocia case is irrelevant in an anesthesia case
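The points above can be sketched as per-case term lists with abbreviation-aware matching. This is a hedged illustration under stated assumptions: the dictionary `CASE_TERMS`, its keys, and the helper names are hypothetical, the term lists contain only the examples named in this article, and spelling variants (for example an apostrophe in "Woods' screw") are not handled. The code finds terms; deciding whether their presence or absence is a deviation still requires knowing the standard of care.

```python
import re

# Hypothetical term lists keyed by case type, built only from the examples above.
CASE_TERMS = {
    "hitt": ["HIT", "SRA", "serotonin release assay",
             "heparin-induced thrombocytopenia"],
    "shoulder_dystocia": ["maneuver", "McRoberts", "suprapubic pressure",
                          "Woods screw", "Zavanelli"],
}

def term_pattern(term):
    """All-caps abbreviations match case-sensitively so "HIT" skips "hit"."""
    flags = 0 if term.isupper() else re.IGNORECASE
    # Optional trailing "s" lets "maneuver" also match "maneuvers".
    return re.compile(r"\b" + re.escape(term) + r"s?\b", flags)

def search_case_terms(case_type, text):
    """Map each term for the case type to the offsets where it appears."""
    found = {}
    for term in CASE_TERMS[case_type]:
        offsets = [m.start() for m in term_pattern(term).finditer(text)]
        if offsets:
            found[term] = offsets
    return found

def maneuvers_named(text):
    """True if 'maneuver' appears alongside at least one named technique."""
    found = search_case_terms("shoulder_dystocia", text)
    named = [t for t in found if t != "maneuver"]
    return "maneuver" in found and bool(named)

# Made-up delivery note fragments for demonstration only.
vague = "Shoulder dystocia, maneuvers performed, delivery achieved."
specific = "Shoulder dystocia noted. McRoberts maneuver, then suprapubic pressure."
print(maneuvers_named(vague), maneuvers_named(specific))
```

Keeping one list per case type reflects the last bullet: the lists are not reusable across case types, so each new type starts from a new entry rather than a shared master list.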
What AI Changed—and What It Did Not
The honest account of how AI fits into this methodology is not that it replaced clinical keyword knowledge. It is that it removed the bottleneck that made applying that knowledge to large record sets so time-consuming.
In a manual review workflow, running a keyword search through 20,000 pages of medical records meant either using PDF search functions—which are unreliable across scanned records, inconsistent with OCR quality, and blind to clinical context—or reading closely enough to catch the relevant language as it appeared. Both approaches are slow, and both are prone to missing things, particularly in record sets where the relevant documentation is scattered across dozens of separate files in no consistent order.
AI-assisted review changes that calculus entirely. A well-constructed prompt can surface every instance of a target term across an entire record set in minutes, return the surrounding context for each instance, flag the likely clinical significance of each hit, and rank the results by likely relevance, all before a physician has read a single page. What used to take hours of targeted searching now takes minutes. What used to require a second pass to verify context now happens in the initial pass.
But—and this is the part that matters—the prompt has to be built by someone who knows what to ask for. A language model does not know that "SRA" in the context of a heparin administration record is a different finding than "SRA" in a different context. It does not know that "unfortunately" in a physician progress note warrants closer reading than "unfortunately" in a nursing note documenting a family's emotional response to a diagnosis. It does not know which maneuvers, in which sequence, represent the standard of care for shoulder dystocia. All of that knowledge has to come from the physician directing the review—and it has to be translated into prompts specific enough that the model surfaces what matters rather than producing a high-volume output that requires just as much sorting as the original record set.
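One way to picture how that knowledge gets encoded is a ranking step in which the reviewer, not the model, supplies the weights. The sketch below is purely illustrative: the `Hit` structure, the note-type labels, and every numeric weight are assumptions made up for this example, and in practice they would be set per case by the physician directing the review.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    term: str
    note_type: str  # hypothetical labels, e.g. "physician_progress" or "nursing"
    page: int
    context: str

# Hypothetical weights; they stand in for the reviewer's clinical judgment.
NOTE_TYPE_WEIGHT = {"physician_progress": 3, "nursing": 1}
TERM_WEIGHT = {"SRA": 5, "unfortunately": 2}

def score(hit):
    """Multiply term weight by note-type weight; unknown values default to 1."""
    return TERM_WEIGHT.get(hit.term, 1) * NOTE_TYPE_WEIGHT.get(hit.note_type, 1)

def rank_hits(hits):
    """Highest score first; ties fall back to page order."""
    return sorted(hits, key=lambda h: (-score(h), h.page))

# Made-up hits for demonstration only.
hits = [
    Hit("unfortunately", "nursing", 10, "family tearful; unfortunately..."),
    Hit("unfortunately", "physician_progress", 200, "Unfortunately, the patient..."),
    Hit("SRA", "nursing", 14847, "SRA sent per MD order"),
]
for h in rank_hits(hits):
    print(score(h), h.page, h.term, h.note_type)
```

The design choice is that nothing in the scoring is learned or inferred: if the weights are wrong, the ranking is wrong, which is the same dependency the paragraph above describes for prompts.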
Manual review:
- PDF search across 20,000 pages with unreliable OCR and no clinical context.
- Hours spent locating target terms, then more hours reading surrounding documentation to assess significance.
- High risk of missing terms buried in scanned records, handwritten notes, or inconsistently formatted flowsheets.
- Keyword list built from clinical experience, applied slowly, one document at a time.
AI-assisted review:
- Targeted prompts surface every instance of every relevant term across the full record set in minutes.
- Surrounding context returned with each hit; clinical significance assessed in the same pass.
- Scanned records, flowsheets, and non-linear documentation all processed in the same workflow.
- Keyword list still built from clinical experience, applied at a scale and speed that manual review cannot match.
The Prompt Is the Product
The practical implication of this methodology is that the quality of an AI-assisted record review is almost entirely determined by the quality of the prompts driving it. A generic prompt—"summarize this medical record" or "identify any issues with the standard of care"—produces generic output. It will not find "SRA" in a heparin case. It will not flag "interestingly" in a progress note from the night before a patient deteriorated. It will not reconstruct the maneuver sequence in a shoulder dystocia delivery. It will produce a plausible-sounding summary that tells you approximately what happened and misses almost everything that matters for litigation purposes.
A well-constructed prompt does something different. It tells the model what the case is about, what the mechanism of alleged harm is, which providers and time windows are most relevant, which specific terms to search for and why, what clinical context surrounds those terms, and what the model should flag versus what it should treat as background. It is, in effect, a clinical briefing—the same briefing a senior physician would give a junior colleague before asking them to review a record set. The quality of what comes back reflects the quality of what went in.
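The elements listed above can be laid out as a simple template. This is a minimal sketch, assuming a plain-text prompt and no particular model or API: the `CaseBriefing` fields mirror the elements named in the paragraph, while the class name, the function name, and all example values are hypothetical placeholders rather than details from any real case.

```python
from dataclasses import dataclass, field

@dataclass
class CaseBriefing:
    """Hypothetical container for the elements a briefing prompt should carry."""
    case_summary: str
    mechanism_of_harm: str
    providers: list
    time_window: str
    search_terms: dict  # term -> why it matters clinically
    flag: list = field(default_factory=list)
    background: list = field(default_factory=list)

def build_prompt(b):
    """Render the briefing as plain text for whatever model is in use."""
    lines = [
        f"Case: {b.case_summary}",
        f"Alleged mechanism of harm: {b.mechanism_of_harm}",
        f"Providers of interest: {', '.join(b.providers)}",
        f"Time window: {b.time_window}",
        "Search terms and why each matters:",
    ]
    lines += [f"- {term}: {why}" for term, why in b.search_terms.items()]
    lines.append("Flag the following:")
    lines += [f"- {item}" for item in b.flag]
    lines.append("Treat as background only:")
    lines += [f"- {item}" for item in b.background]
    lines.append("For every hit, quote the surrounding text and cite the page.")
    return "\n".join(lines)

# Placeholder values for illustration only.
briefing = CaseBriefing(
    case_summary="Suspected heparin-induced thrombocytopenia and thrombosis (HITT)",
    mechanism_of_harm="Delayed recognition while heparin exposure continued",
    providers=["hematology", "ICU nursing"],
    time_window="hospital days 4 through 9",
    search_terms={
        "SRA": "serotonin release assay, the confirmatory test",
        "HIT": "common abbreviation for the diagnosis",
    },
    flag=["SRA ordered or resulted with no action documented"],
    background=["routine vital signs"],
)
print(build_prompt(briefing))
```

Pairing each search term with its clinical rationale is the part a generic prompt lacks; the structure is trivial, and the content of the fields is where the clinical knowledge lives.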
This is why physician-directed AI review and raw AI review are not the same product. The technology is the same. The knowledge driving it is not.
A well-constructed prompt is a clinical briefing. What comes back from the model reflects what went in—and what went in reflects whether the person writing the prompt knows what they are looking for.
What This Means for Your Cases
For complex medical malpractice cases, the practical implication is this: the record review methodology matters as much as the record review itself. A reviewer who reads linearly and summarizes what they find will produce a different work product than a reviewer who approaches the same record set with a structured keyword framework, case-specific clinical terminology, and the prompt discipline to turn what the model surfaces into a usable analysis.
The difference shows up most clearly in large, complex record sets, where the key finding is buried in a nursing flowsheet at page 14,847, encoded in a laboratory abbreviation that appears twice in 20,000 pages, or captured in a single word a physician chose because it was the closest they could come to documenting what they knew without saying it directly.
Those are the cases where methodology is the margin. And methodology is not something an AI model brings to the review. It is something the physician directing the review has to bring—built from years of knowing what to look for, where to look, and what it means when you find it. That same keyword framework applies across every production the case generates, not just the initial review. As new records arrive, the same clinical lens is applied to what is new—and to what has changed.
Physician-directed AI review on your most complex record sets.
Physician-directed AI review with case-specific keyword methodology and clinical timeline construction. The approach described in this article is the one applied to every record set we review—not a marketing framework, but an actual working methodology built over thousands of cases.