
eDiscovery and ESI Processing: 4 Crucial Steps from Raw Data to Insights

Written By Samishka Maharaj

Published: Oct 06, 2025

Raw data is just the beginning of a much larger story in eDiscovery. Like a miner sifting through layers of rock to find precious gems, legal teams must process mountains of electronically stored information (ESI) to extract the key pieces that will shape their case.  

Preparing data for eDiscovery goes beyond simple data management. It ensures that evidence is accurate, complete, and usable throughout the discovery process. The true value of data may only become apparent once you unpack and refine it. Files may be missing context, metadata might be mishandled, and search capabilities could become unreliable. That’s where processing comes in.  

Without proper processing, raw data remains a tangled web of emails, documents, and metadata, offering no clear path to insights or actionable intelligence. Emails embedded within documents, attachments hidden in archives, and metadata detailing the who, what, and when of each file all need to be unraveled. Otherwise, searches may be unreliable, documents may become unusable, production options may be limited, and critical evidence could be overlooked or misinterpreted.  

In this article, we’ll walk through the four essential steps—expansion, extraction and normalization, indexing, and culling—that transform raw ESI into organized, actionable insights. Each step plays a crucial role in ensuring that the data you’re working with is not only comprehensive but also ready for fast, accurate review.  

Step 1: Expansion  

The first major step in processing is expansion: unpacking the container files that house your ESI. Think of container files as digital suitcases: they hold collections of smaller files, such as emails, documents, or images, all packaged together in formats like PST or ZIP. In legal matters, it’s essential to unpack these containers to see what’s inside and determine which items are relevant to the case.  

Beyond typical container files, many documents have embedded objects, such as email attachments or images embedded within documents. These need to be extracted and sometimes treated as individual files. For instance, an attached spreadsheet within an email could hold crucial financial data relevant to your case. This extraction ensures that every piece of content is captured and prepared for later review.  

Effective expansion ensures that nothing is missed and establishes the foundation for the remaining processing activities. Without a thorough unpacking of all container files and embedded objects, key pieces of evidence could slip through the cracks.  
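The recursive unpacking described above can be sketched in a few lines of Python with the standard library’s `zipfile` module. Real eDiscovery platforms handle many container formats (PST, mbox, tar, and so on) and track parent–child relationships; this minimal sketch handles only nested ZIP archives and is illustrative, not production code.

```python
import zipfile
from pathlib import Path

def expand_archive(archive_path: Path, dest: Path) -> list[Path]:
    """Recursively unpack a ZIP container, expanding any nested
    archives, and return the paths of all non-container files."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest)
    extracted: list[Path] = []
    # Materialize the file list before recursing, so files created
    # by nested expansion are not double-counted.
    for path in sorted(p for p in dest.rglob("*") if p.is_file()):
        # The suffix check avoids expanding formats like .docx,
        # which are ZIP archives internally but are documents.
        if path.suffix.lower() == ".zip" and zipfile.is_zipfile(path):
            extracted += expand_archive(path, path.with_suffix(""))
        else:
            extracted.append(path)
    return extracted
```

Note that the containers themselves are excluded from the returned list; only the reviewable content files survive expansion.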

Tip: Managing metadata  

Metadata is the unsung hero of ESI processing.  

It provides crucial information about files, such as who created them, when they were last modified, and how they relate to other documents. This hidden layer of data can offer valuable insights, especially in cases involving long email chains or message threads.  

When processing ESI, it’s important to capture and preserve metadata accurately. Mishandling metadata can lead to problems later in the discovery process, making it difficult to track the provenance of documents or verify their authenticity.  
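As a rough illustration of capturing metadata at processing time, the sketch below records a file’s filesystem metadata together with a content hash, so its integrity and provenance can be verified later. The field names here (`size_bytes`, `sha256`, and so on) are assumptions for illustration, not a standard load-file schema.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def capture_metadata(path: Path) -> dict:
    """Record a file's metadata alongside a content hash so the
    document can be authenticated later in discovery."""
    stat = path.stat()
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        # Normalize timestamps to UTC to avoid timezone ambiguity.
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        # A cryptographic hash proves the content is unchanged.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
```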

Step 2: Extraction and normalization  

After you have unpacked the files, the next step is extraction and normalization. At this stage, the system pulls all the text and metadata from the files and converts it into a standardized format, making it easier to work with later. ESI comes in hundreds of file types, each requiring specialized software to open and read. Without normalization, you’d need a dozen programs just to view the collected data.  

For example, the extraction process pulls out the body text from emails, Word documents, and PDFs while capturing metadata such as the author, date, and file type. If the document contains scanned images or non-searchable text, optical character recognition (OCR) can be used to extract the text.  

Normalization converts this extracted content into a consistent, readable format across all files, allowing for seamless searching and review. Once normalized, every document—whether it’s an email, a Word file, or a spreadsheet—can be viewed and interacted with in the same way, regardless of its original format.  
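One common way to implement normalization is to map every extracted file onto a shared record shape, regardless of source format. The `NormalizedDocument` schema and `normalize_email` helper below are hypothetical names chosen for illustration; real platforms use richer schemas with many more fields.

```python
from dataclasses import dataclass

@dataclass
class NormalizedDocument:
    """A common record shape so every file type can be searched
    and reviewed the same way after extraction."""
    doc_id: str
    source_type: str   # e.g. "email", "word", "pdf"
    author: str
    date: str          # ISO 8601
    text: str

def normalize_email(doc_id: str, msg: dict) -> NormalizedDocument:
    # Map email-specific field names onto the shared schema.
    return NormalizedDocument(
        doc_id=doc_id,
        source_type="email",
        author=msg.get("from", ""),
        date=msg.get("sent", ""),
        text=msg.get("body", ""),
    )
```

A parallel `normalize_word` or `normalize_pdf` function would map its own format’s fields onto the same five slots, which is what makes downstream search and review uniform.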

Tip: Unitizing long message threads  

Conversations from text messages, Slack channels, and collaboration tools can often span extended periods, making them difficult to review efficiently.  

Unitization solves this problem by breaking long threads into smaller, more manageable chunks. For example, a thread could be divided into 24-hour segments, allowing legal teams to focus on a day’s worth of messages at a time. This approach preserves the context of the conversation without requiring review of an endless stream of text.  
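A minimal sketch of time-based unitization, assuming messages arrive as `(timestamp, text)` pairs, might look like this (the 24-hour window is just the example from above; platforms let reviewers tune it):

```python
from datetime import timedelta

def unitize_thread(messages, window=timedelta(hours=24)):
    """Break a message thread into segments, starting a new
    segment whenever `window` has elapsed since the segment began."""
    segments, current = [], []
    segment_start = None
    for ts, text in sorted(messages):
        if segment_start is None or ts - segment_start >= window:
            if current:
                segments.append(current)
            current, segment_start = [], ts
        current.append((ts, text))
    if current:
        segments.append(current)
    return segments
```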

Step 3: Indexing  

After expansion and normalization, the next step is indexing. Indexing creates a structured reference that enables fast, efficient searching across large volumes of data. The system creates tables of all the words and terms found in the processed data, making it possible to run keyword searches and find specific pieces of information quickly.  

The most common form of indexing is an inverted index, which works by indexing every word in the data and linking it to the document in which it appears. This allows users to search for keywords and immediately locate relevant documents. Indexing also includes more advanced techniques, such as semantic indexing, which looks at the relationships between terms and helps cluster related documents together. Semantic indexing can be particularly useful for identifying patterns or themes within large datasets.  

Without proper indexing, searches can be slow, incomplete, or inaccurate, which can derail the entire review process. Having a well-structured index ensures that all the data is readily accessible and easily searchable, helping legal teams locate critical information faster and with greater accuracy.  
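The inverted index described above can be demonstrated with a small sketch: every term is mapped to the set of documents containing it, so a keyword lookup never has to scan the full text. Production indexes add stemming, phrase queries, and on-disk structures; this is illustration only.

```python
import re
from collections import defaultdict

def build_inverted_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of document IDs
    in which it appears."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], term: str) -> set[str]:
    # A lookup is a single dictionary access, not a scan.
    return index.get(term.lower(), set())
```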

Step 4: Culling  

The final core activity in ESI processing is culling: trimming the dataset to remove irrelevant or redundant information so that only the most relevant material proceeds to review. By separating the signal from the noise before review begins, culling saves time and reduces costs in the later stages of discovery.  

Several key culling techniques are commonly used in eDiscovery:  

  • De-NISTing: This step removes standard system files, such as executables and software components, that are essential for operating systems and applications but irrelevant to legal matters. Eliminating them from the dataset ensures only user-generated data is included for review.  
  • Deduplication: During data collection, it’s common to encounter multiple copies of the same document or email. Deduplication identifies and removes these duplicates, reducing the overall volume of data that needs to be reviewed. For instance, if an email with an attachment appears in multiple inboxes, deduplication ensures that only one copy is retained while still noting where duplicates existed.  
  • Content filtering: Culling can also include filtering based on specific criteria, such as date ranges or keywords. For example, if a case only involves emails exchanged within the past two years, filtering out emails outside that date range can reduce the amount of irrelevant data in the dataset. Similarly, keyword filtering can narrow the focus to documents containing specific case-related terms.

Culling allows legal teams to focus on the most relevant data, cutting down on unnecessary review time and minimizing costs associated with processing and storage. However, it’s important to carefully manage culling techniques to ensure that no critical information is inadvertently removed.  
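To make two of these techniques concrete, here is a simplified sketch combining hash-based deduplication with a date-range filter. The `cull` helper and its document shape are assumptions for illustration; real platforms record where each duplicate appeared rather than silently dropping copies, and De-NISTing compares hashes against the NIST reference list.

```python
import hashlib
from datetime import date

def cull(documents, start: date, end: date):
    """Deduplicate documents by content hash, then keep only
    those dated inside the relevant range."""
    seen: set[str] = set()
    kept = []
    for doc in documents:
        digest = hashlib.sha256(doc["text"].encode()).hexdigest()
        if digest in seen:
            continue  # duplicate copy: drop it from review
        seen.add(digest)
        if start <= doc["date"] <= end:
            kept.append(doc)
    return kept
```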

Ensure your ESI is processed into reliable, actionable insights  

Correctly processing ESI transforms collected data from a raw, unstructured state into something usable and actionable for litigation. By effectively expanding, extracting, normalizing, indexing, and culling the data, legal professionals can streamline workflows, reduce costs, and uncover key insights that might otherwise remain hidden.  

For legal teams looking to refine their eDiscovery processes and leverage every piece of information at their disposal, understanding and applying processing fundamentals is key. For more in-depth insights on how to optimize your eDiscovery processes, download our practice guide on ESI processing, “Time to Make the Donuts: Processing Fundamentals.”  
