Trial Magazine
Theme Article
Streamline Discovery Review With AI
Document review is a time-consuming and labor-intensive undertaking. AI tools can help.
June 2024

Modern document discovery almost always involves the exchange of electronically stored information (ESI), and in complex litigation it is not uncommon for a defendant’s production to include millions of pages of documents—including emails, scanned documents, and native files such as Microsoft Office documents.
Identifying key documents within these large collections of data presents a significant challenge for many plaintiff law firms. Yet reviewing and categorizing these documents are critical steps in preparing for fact depositions and expert review, cross-examining defense experts, responding to dispositive motions, and preparing for trial.
But there are often challenges to efficient and effective review. First, reviewing large productions requires a significant staff investment. Document review is typically conducted by attorneys, and few plaintiff law firms are sufficiently equipped to allocate attorney staff to conduct comprehensive reviews of large document productions alongside other case requirements. Under some circumstances, firms hire contract reviewers, at a high cost, to assist in managing and conducting review, either directly or through agencies.
Even when staff is available to conduct these reviews, identifying key, relevant documents can still be a challenge if reviewers do not possess a comprehensive understanding of the relevant facts and legal theories involved in the case. This is particularly true in highly technical cases, when critical documents may include poorly labeled spreadsheets or other technical data that may not be easily understood by someone who has not, for example, had the experience of taking fact depositions.
And reviewers may err on the side of caution, resulting in thousands of documents being unnecessarily tagged as “hot” or “relevant” and thus requiring further review. Well-designed coding panels can help to further organize documents, but excessively complex panels can introduce the potential for misclassification.1
Each of these challenges frustrates efforts to get the important documents into the hands of the people who need them, including those who are taking depositions or working with experts to prepare reports. In some instances, systematic document review may be a matter of “checking the box” in a way that fails to yield meaningful insights or organize discovery material in a manner that leverages this information for case preparation. In those circumstances, deposition preparation and compiling expert reports might be predominantly performed through keyword searches for deponent names and subject areas—a failure of the systematic review process.
Artificial intelligence tools available today, and others on the immediate horizon, can aid plaintiff firms in addressing these challenges. AI techniques that can be readily employed to assist in large document review projects include:
- predictive coding to identify relevant documents and potentially categorize them by subject area
- advanced searching, which goes beyond the parameters of error-prone and restrictive Boolean searches to include spelling variations and conceptually related terms specific to the dataset
- data visualization to generate graphics to identify patterns or trends that may not be readily observable from individual assessment of documents
- analytical tools to help synthesize or summarize individual documents or collections of documents.
Here are some ways to use those tools to make the e-discovery process more efficient—and a look at the role of generative AI.
Predictive Coding & TAR
Certain AI-related tools have been available for years to assist in document review, under the nomenclature of “technology-assisted review” (TAR).2
Conventional TAR systems that use an element of AI rely on attorneys to train models by categorizing or tagging relevant documents within a randomly selected subset of a collection or production of ESI, often called a “training set.” These tools then use an algorithm to identify documents from the remaining production that the model predicts would likely be categorized similarly, based on the pattern of similar terms within the document.3
While often used in the context of a defense review for relevance, responsiveness, or privilege, TAR techniques also are used by plaintiffs to conduct reviews of large document productions. These models can be used either to assess relevance (tagging the document as “hot” or “cold,” for example) or for subject-matter categorizations.
In an air pollution case, for example, a reviewer may tag documents in a training set as “permit compliance” or “community impacts”; the model can then identify unreviewed documents in the collection that are likely to fall into those categories.
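To make the mechanics concrete, the short sketch below approximates what a TAR tool does behind the scenes, using the open-source scikit-learn library to learn from a tagged training set and predict categories for unreviewed documents. The document text and tags are hypothetical, and commercial review platforms rely on their own, more sophisticated models.

```python
# A minimal sketch of TAR-style predictive coding, using scikit-learn.
# The training documents and tags below are hypothetical; commercial
# review platforms implement this idea with proprietary models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Attorney-reviewed "training set": document text paired with a subject tag.
training_docs = [
    "Title V permit renewal and stack testing schedule",
    "Annual emissions report submitted to the state agency",
    "Residents near the plant complained of odors and dust",
    "Community meeting notes on health concerns and noise",
]
training_tags = [
    "permit compliance",
    "permit compliance",
    "community impacts",
    "community impacts",
]

# Convert text to term-frequency features and fit a simple classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(training_docs, training_tags)

# Predict categories for unreviewed documents in the production.
unreviewed = [
    "Draft response to the agency's notice of permit violation",
    "Email thread about neighbors' complaints after the release",
]
for doc, tag in zip(unreviewed, model.predict(unreviewed)):
    print(f"{tag:<22} <- {doc}")
```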
The utility of an AI review or TAR in categorizing documents may be judged by recall, or sensitivity (if a document is “relevant,” does the TAR tool identify it as such?), and by precision (does the system generate many false positives by flagging documents that are not relevant?). In general, the greater the number of documents reviewed and tagged within the training set, the more accurate the predictive coding.
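As a simple illustration of those two measures, the sketch below compares an attorney’s calls on a small hypothetical validation sample against a TAR tool’s predictions and computes recall and precision with scikit-learn.

```python
# A minimal illustration of recall and precision, computed from a
# hypothetical validation sample in which an attorney's judgment
# (relevant / not relevant) is compared against the TAR tool's prediction.
from sklearn.metrics import precision_score, recall_score

attorney_calls  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = relevant
tar_predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # what the tool flagged

# Recall: of the truly relevant documents, how many did the tool find?
print("recall:   ", recall_score(attorney_calls, tar_predictions))    # 0.75
# Precision: of the documents the tool flagged, how many were relevant?
print("precision:", precision_score(attorney_calls, tar_predictions)) # 0.75
```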
However, even in a well-trained model, traditional TAR relies on the occurrence of similar language within a document, so without further context, this approach may result in lower recall and precision than desired. If, for example, the training set did not include documents using a key term or phrase (such as “acoustic software,” a seemingly incongruous term used internally by Volkswagen to describe its software that evaded emissions standards testing), the TAR system would not know to consider that term in assessing relevance, potentially resulting in lower recall.
More recent developments in AI—and natural language processing (NLP) in particular—allow certain types of categorizations without training by an attorney. These tools offer the potential to identify, categorize, and cluster relevant documents that may, for example, use alternate terms and phrases that the AI model will recognize as related.
For instance, asking a system to identify documents related to “air permitting” will call on the model’s ability to identify terms and concepts associated with air permitting, which may include the related terms “Title V” or “PSD,” to pinpoint responsive content in the repository of documents. These types of AI tools are steadily being implemented in leading e-discovery platforms.4
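As a rough illustration of how concept searching differs from keyword matching, the sketch below uses the open-source sentence-transformers library to rank a few hypothetical documents by semantic similarity to the query “air permitting.” The model name and documents are illustrative; e-discovery platforms implement this functionality in their own ways.

```python
# A minimal sketch of "concept searching" with text embeddings, using the
# open-source sentence-transformers library. The model and documents are
# illustrative only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose model

documents = [
    "Title V operating permit renewal is due next quarter",
    "PSD construction permit application for the new unit",
    "Quarterly sales figures for the northeast region",
]
query = "air permitting"

doc_vecs = model.encode(documents, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

# Rank documents by semantic similarity to the query, not keyword overlap.
scores = util.cos_sim(query_vec, doc_vecs)[0]
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.2f}  {doc}")
```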
Search Term Techniques
The use of targeted search terms, alongside systematic review of a production, is a time-tested approach to culling relevant information from large collections of data. Yet, conventional search term methodologies are limited in several ways:
- Spelling errors in the underlying documents can influence results.
- The unique use of industry- or company-specific terminology may prevent the discovery of important documents.
- Optical character recognition (OCR) may introduce errors into a document’s text, causing the document not to be returned by a search.
Conventional techniques available in most existing review platforms include search methodologies that account for “fuzziness”—minor variations in spelling or spacing. As with predictive coding, however, some existing platforms (as well as others currently in development) use NLP to permit more inclusive searches. These more inclusive searches will not only identify misspellings of the search terms but also return documents that use conceptually related or similar terms.5
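The sketch below illustrates the basic idea of fuzziness using only Python’s standard library; review platforms expose comparable tolerance through their own search syntax, and the misspelled tokens shown are hypothetical OCR errors.

```python
# A minimal sketch of "fuzzy" matching using only the Python standard
# library. The misspelled tokens are hypothetical OCR errors.
import difflib

search_term = "permitting"
ocr_tokens = ["perrnitting", "permiting", "marketing", "perm1tting", "permit"]

# get_close_matches tolerates minor spelling variations and OCR noise.
matches = difflib.get_close_matches(search_term, ocr_tokens, n=5, cutoff=0.8)
print(matches)  # e.g. ['permiting', 'perm1tting', 'perrnitting']
```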
The effectiveness of these evolving tools hinges on skillful input by legal professionals. Even when the technology empowers users with more robust search capabilities, formulating precise and contextually relevant queries is critical. Attorneys must guide the AI algorithms by providing well-crafted queries and refining search parameters based on their nuanced understanding of the case and legal context. Even with the use of concept searching and related AI tools, document searching is an iterative process—review of additional discovery material will reveal additional key witnesses, facts, and industry- or defendant-specific terminology.
Data Visualization
Existing tools—including Brainspace, Everlaw, Relativity, and others—can generate data visualizations that may provide unique insights into broad collections of data. While preparing for the deposition of a corporate officer, for example, you might want to identify the employees with whom that officer communicates most frequently or determine with whom the deponent communicates most frequently about a certain topic. These relationships can be depicted in the form of a network graph, with nodes (individuals) connected by lines of a weight proportionate to the volume of those communications.
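The sketch below shows, in simplified form, how such a network graph could be assembled from hypothetical sender and recipient metadata using the open-source networkx library; platforms such as those named above generate comparable visualizations natively.

```python
# A minimal sketch of a communications network graph built from email
# metadata. The sender/recipient pairs are hypothetical.
from collections import Counter
import networkx as nx

# (sender, recipient) pairs pulled from an email production's metadata.
emails = [
    ("officer@acme.example", "plant.manager@acme.example"),
    ("officer@acme.example", "plant.manager@acme.example"),
    ("officer@acme.example", "counsel@acme.example"),
    ("plant.manager@acme.example", "env.engineer@acme.example"),
]

G = nx.Graph()
for pair, count in Counter(emails).items():
    # Edge weight is proportionate to the volume of communications.
    G.add_edge(*pair, weight=count)

# List the deponent's most frequent correspondents.
deponent = "officer@acme.example"
neighbors = sorted(
    G[deponent].items(), key=lambda kv: kv[1]["weight"], reverse=True
)
for person, attrs in neighbors:
    print(person, attrs["weight"])
```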
Alternatively, you may want to see a graph showing how frequently a certain term or phrase is used in email correspondence over time to uncover a new perspective on how the company’s priorities may have changed.
Generative AI
Evolving AI platforms that use generative AI—such as ChatGPT, Claude, or Microsoft Azure—and the related concept of retrieval-augmented generation (learn more on p. 34) will likely bring additional powerful tools to summarize and organize documents obtained in discovery.
For instance, generative AI functions include the ability to provide plain-language instructions to the program, such as “identify communications between employees that discuss concerns about the safety of their product,” or “provide a chronology of events related to emissions of a toxic substance.”
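In simplified terms, retrieval-augmented generation pairs that kind of plain-language instruction with a retrieval step. The sketch below retrieves the documents most similar to the instruction using scikit-learn and assembles a prompt for a generative model; the documents are hypothetical, and call_llm is a hypothetical placeholder for whatever generative AI service or platform feature is actually used.

```python
# A minimal sketch of retrieval-augmented generation (RAG): retrieve the
# documents most related to a plain-language instruction, then hand them to
# a generative model along with the instruction. The documents are
# hypothetical, and call_llm is a hypothetical placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Email: engineering flagged a possible defect in the brake assembly",
    "Memo: marketing plan for the fall product launch",
    "Email: QA asked whether the brake issue poses a safety risk to drivers",
]
instruction = ("Identify communications between employees that discuss "
               "concerns about the safety of their product.")

# 1. Retrieve: rank documents by similarity to the instruction.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)
scores = cosine_similarity(vectorizer.transform([instruction]), doc_matrix)[0]
top_docs = [documents[i] for i in scores.argsort()[::-1][:2]]

# 2. Generate: ask the model to answer using only the retrieved documents.
prompt = (f"{instruction}\n\nUse only these documents and cite them:\n"
          + "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(top_docs)))

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder; substitute the generative AI API or
    # platform feature your review tool actually provides.
    raise NotImplementedError

print(prompt)  # the narrative answer would come from call_llm(prompt)
```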
Other tools tailor-made for medical records permit automatic indexing of those records by medical specialty, as well as automated development of timelines for medical diagnosis and treatment. These tools can provide a narrative response, including citations to relevant documents, and can help counsel rapidly develop an understanding of what a document production contains—with later supplementation by systematic review.
Proceed With Caution
Like any tool, AI is not a substitute for human review and analysis. For example, generative AI models, including ChatGPT and other large language models, may draw relationships or “hallucinate” facts that are not supported by the record. AI outputs must be contextualized through an attorney’s understanding of the data and the facts of the litigation. Additionally, as with any legal technology, consider whether the use of AI presents data privacy concerns, particularly as some tools “learn” from analyzed data in ways that require further scrutiny.
Explore new AI tools before you need them—it is a first step to understanding their usefulness while minimizing risk. Many tools can be integrated with traditional review techniques, allowing you to leverage AI as a quality control or additional screen for otherwise overlooked information.
The use of AI is expected to become a standard component of discovery in complex matters. It is critical to become familiar with the strengths and weaknesses of these tools, so that they are appropriately used to augment, rather than replace, attorneys’ diligent efforts.
Brent Ceryes is a partner at Baird, Mandalas, Brockstedt & Federico in Baltimore and can be reached at bceryes@bmbfclaw.com. The views expressed in this article are the author’s and do not constitute an endorsement of any product or service by Trial or AAJ.
Notes
1. A coding panel is used within document review and e-discovery software to assign metadata to individual documents—including tags, codes, and attorney comments—to organize material within a production.
2. This AI technology has at least three names: TAR, predictive coding, and computer-assisted review. See George Socha, What is Technology Assisted Review?, Reveal, Mar. 5, 2024, https://resource.revealdata.com/en/blog/technology-assisted-review.
3. See Kelly Lavelle, Technology-Assisted Review: A Superior Approach in Legal Document Review, The Legal Intelligencer, Oct. 26, 2023, https://tinyurl.com/28dasw6e.
4. Brainspace, Disco, Relativity, and Reveal are among the platforms that have incorporated these or similar clustering tools.
5. Reveal Review and Relativity are among platforms that offer AI-assisted “concept searching.”