The Epiq Angle

AI and eDiscovery: Beyond Predictive Coding

May 18, 2017


In eDiscovery, discussions about “artificial intelligence” generally focus on predictive coding: a machine learning process that reduces the time human reviewers spend reading non-relevant documents. The software analyzes periodic feedback from human reviewers, learns for itself which information they are actually interested in, and then locates similar information in the rest of the collection.
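
The feedback loop described above can be sketched in a few lines. This is a minimal illustration only, assuming a simple naive Bayes text classifier as a stand-in for the far richer models that commercial predictive coding tools use. The function names (`train`, `relevance_score`, `review_loop`) and the `reviewer` callable, which simulates a human coding decision, are hypothetical and do not correspond to any vendor's API.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption); real platforms use richer text features.
    return text.lower().split()

def train(labeled):
    """Count term frequencies per class from (text, is_relevant) pairs coded by reviewers."""
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for text, is_relevant in labeled:
        counts[is_relevant].update(tokenize(text))
        n_docs[is_relevant] += 1
    return counts, n_docs

def relevance_score(model, text):
    """Naive Bayes log-odds that `text` is relevant (positive means it leans relevant)."""
    counts, n_docs = model
    vocab_size = len(set(counts[True]) | set(counts[False]))
    totals = {c: sum(counts[c].values()) for c in (True, False)}
    # Smoothed class prior, so a class with no examples yet does not produce log(0).
    log_odds = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for tok in tokenize(text):
        # Add-one smoothing; the extra +1 in each denominator reserves mass for unseen terms.
        p_rel = (counts[True][tok] + 1) / (totals[True] + vocab_size + 1)
        p_non = (counts[False][tok] + 1) / (totals[False] + vocab_size + 1)
        log_odds += math.log(p_rel / p_non)
    return log_odds

def review_loop(seed, pool, reviewer, batch_size=2, rounds=2):
    """Retrain after each batch of human coding, always serving the top-scored unreviewed docs.

    seed: (text, is_relevant) pairs already coded by reviewers.
    pool: texts not yet reviewed.
    reviewer: hypothetical callable standing in for the human who codes each served document.
    """
    labeled = list(seed)
    pool = list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        # Rank the unreviewed documents so the likeliest-relevant ones reach a human first.
        pool.sort(key=lambda t: relevance_score(model, t), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Periodic human feedback: the reviewer's codes become new training examples.
        labeled.extend((text, reviewer(text)) for text in batch)
    return train(labeled), labeled

if __name__ == "__main__":
    seed = [
        ("merger pricing discussion with competitor", True),
        ("pricing agreement competitor call notes", True),
        ("office holiday party schedule", False),
        ("cafeteria menu for the week", False),
    ]
    pool = ["notes on competitor pricing call", "holiday party menu",
            "competitor merger pricing memo", "week schedule for the office"]
    model, labeled = review_loop(seed, pool, reviewer=lambda t: "pricing" in t, rounds=1)
    for text, is_relevant in labeled[len(seed):]:
        print(is_relevant, text)
```

The design choice worth noting is that the model is retrained after every reviewed batch, which is what lets the software “learn for itself” from periodic feedback rather than from one fixed instruction set.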


Industry studies have shown that, with proper training, predictive coding produces better and more cost-effective results than traditional Boolean keyword searching, which requires humans to write detailed, precisely structured search queries. Predictive coding has proven so effective that some courts now prefer algorithm-reviewed productions to human-reviewed ones, and others have refused to accept documents produced by manual review.

Big Data, Litigation and Artificial Intelligence

But this is only the beginning. Just as eDiscovery itself arose from the business transition from hard-copy records to electronically stored information (ESI), the current push toward more capable AI is driven by a systemic shift from retaining mostly structured data to retaining far larger, ever-growing volumes of predominantly unstructured data (the “Big Data” phenomenon).

Much of an organization’s data universe is potentially discoverable in litigation, but its size and unstructured nature make it essentially impossible for humans to review any meaningful portion of it without intelligent tools that cull out the higher-value items meriting a reviewer’s attention. Predictive coding is a good start, but it still requires too much human intervention to ensure the results are neither over-inclusive nor under-inclusive.
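
Over- and under-inclusiveness are commonly quantified as precision (the share of produced documents that are actually relevant) and recall (the share of all relevant documents that were found). The sketch below assumes document IDs are available both for the produced set and for a reference set of truly relevant documents; in practice that reference set is usually estimated by human review of a sample. `precision_recall` is a hypothetical helper, not part of any eDiscovery product.

```python
def precision_recall(produced, relevant):
    """Hypothetical helper: compare produced document IDs against truly relevant IDs.

    Low precision signals an over-inclusive result; low recall, an under-inclusive one.
    """
    produced, relevant = set(produced), set(relevant)
    true_positives = len(produced & relevant)
    # Both ratios are undefined for an empty set; 0.0 is returned here by assumption.
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# 8 documents produced, 4 of them relevant; 5 relevant documents exist in total.
print(precision_recall(range(1, 9), [1, 2, 3, 4, 10]))  # (0.5, 0.8)
```

Here half of the production is noise (over-inclusive) and one relevant document was missed (under-inclusive); tightening one measure usually loosens the other, which is why human oversight remains necessary.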

Big Data: A Boon for Machine Learning?

Fortunately, “big data” turns out to be a boon for machine learning. Unlike people, machines cannot extract meaningful information or detect macro-level patterns from small datasets, and big data supplies real-world samples at the scale needed to train AI effectively. Other areas of computing have also made recent strides toward AI. Historically, machine performance improved mainly through faster processing, while the processing itself remained linear and sequential.

Now, neural network architectures enable parallel computations that operate in dynamic ways, more like thinking in biological brains. Natural language processing (NLP) algorithms have also improved greatly, so users can ask richer, more semantically complex questions instead of worrying about syntax for the computer’s sake. A famous example is Watson, IBM’s question-answering supercomputer and Jeopardy! champion, whose cognitive computing system can adapt its learned skills to other contexts. Watson now delivers cloud-based concierge services, medical diagnostics, and even legal services. Last year, NextLaw Labs introduced the Watson-powered service ROSS as “the world’s first artificially intelligent attorney.”

Artificially Intelligent Attorneys

Giving final legal advice requires human judgment that machines don’t have (yet). But ROSS already performs sophisticated work that we would normally attribute to “judgment,” such as identifying legal issues from facts described in natural language and researching possible answers. Work is also underway to build more complete multi-sensory cognitive media platforms, which would make audio and video content searchable for “objects, faces, license plate, logos, phrases, sentiment, voice identification, translation plus additional capabilities that are constantly evolving,” according to a report in AIBusiness.org.

Smarter eDiscovery

If cognitive computing continues to evolve, a lawyer might one day simply ask a natural language question such as “Did any company officer tell outsiders about the bankruptcy before the stock price fell?” and a supercomputer would return a wealth of audio, video, text, GPS, timekeeping, and other data showing interactions that might suggest insider trading.

But for now, human lawyers still have to make judgments about the information that computers retrieve, and they still have to ask the right questions. In fact, when their tools are smart, self-correcting, and self-improving, the human team can focus on asking richer questions and developing the deeper story of the case.

The goal is not to get technology to do everything humans can do, but to minimize the time and effort humans must invest in getting technology to do what humans can’t do well, such as scanning and sifting through terabytes of data. This frees human professionals for the deeper intellectual work that is more valuable to their clients.
