Four Technologies Help Limit and Manage PII in Cross-Border eDiscovery

With the General Data Protection Regulation (GDPR) yet to go into effect and data privacy laws still evolving, organizations face a significant challenge when transferring personally identifiable information (PII) across borders.

Adding to this challenge is the divide between how the U.S. and the EU approach data security, with the EU imposing strict requirements on how private data may be accessed and from where.

Within this complex regulatory landscape, it makes sense to both limit and manage PII in eDiscovery. Four emerging technologies can help achieve these goals.

Duplicate Detection

Today, email is one of the largest sources of data in eDiscovery. But as anyone who uses email knows, not every email is material. Deduplication tools can significantly cut the volume of email under review by removing exact duplicates (every email has a sender and at least one recipient, so the same message is typically collected more than once) and near duplicates, such as marketing emails sent to a database of thousands of contacts.
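As a rough illustration of the two steps described above, the following Python sketch flags exact duplicates by hashing a normalized message body and flags near duplicates by comparing overlapping word sequences (Jaccard similarity on three-word shingles). The function names, the sample emails, and the 0.7 threshold are assumptions made for this example; commercial review platforms may use different techniques.

```python
import hashlib
import re


def normalize(body):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return re.sub(r"\s+", " ", body.lower()).strip()


def fingerprint(body):
    """SHA-256 of the normalized body; identical bodies share a fingerprint."""
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()


def shingles(body, size=3):
    """Set of overlapping word n-grams used to compare near duplicates."""
    words = normalize(body).split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(emails, threshold=0.7):
    """Keep the first email of each duplicate group; return (kept, dropped)."""
    kept, dropped = [], []
    seen_hashes = set()
    kept_shingles = []
    for email in emails:
        fp = fingerprint(email["body"])
        sh = shingles(email["body"])
        is_exact = fp in seen_hashes
        is_near = any(jaccard(sh, other) >= threshold for other in kept_shingles)
        if is_exact or is_near:
            dropped.append(email)  # set aside, not destroyed
            continue
        seen_hashes.add(fp)
        kept_shingles.append(sh)
        kept.append(email)
    return kept, dropped


# Hypothetical sample data for illustration only.
emails = [
    {"id": 1, "body": "Please review the attached contract before Friday."},
    {"id": 2, "body": "Please  review the attached contract before friday."},
    {"id": 3, "body": "Hi Anna, our spring sale starts today with great discounts "
                      "on all office supplies in every store."},
    {"id": 4, "body": "Hi Ben, our spring sale starts today with great discounts "
                      "on all office supplies in every store."},
    {"id": 5, "body": "The deposition is scheduled for Monday morning."},
]

kept, dropped = deduplicate(emails)
print([e["id"] for e in kept])     # [1, 3, 5]
print([e["id"] for e in dropped])  # [2, 4]
```

In this sketch the dropped messages are only set aside rather than destroyed, so a reviewer could still inspect them if a question about the culling arises.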

Predictive Coding

Predictive coding is a form of machine learning that learns how lawyers code a sample of documents and then applies those coding decisions across entire datasets. So instead of having junior attorneys comb through thousands of inconsequential documents, a law firm can run an algorithm that scans each document and automatically classifies it according to the probability that it is relevant, privileged or important.

This type of technology can be a major time saver. Research has shown that, on average, when parties run predictive coding after keyword filtering, 60 to 70 percent more non-responsive documents are culled from the document population. One caveat is that most predictive coding software works only with English-language documents, so lawyers may have to take extra steps to accommodate multilingual datasets.
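To make the idea of classifying documents by probability more concrete, here is a minimal Python sketch of a naive Bayes text classifier trained on a handful of lawyer-coded seed documents. Naive Bayes is only one possible technique, chosen here for brevity; the class name, labels, and seed documents are hypothetical, and real predictive coding products rely on far larger training sets and more sophisticated models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCoder:
    """Tiny multinomial naive Bayes model with add-one smoothing."""

    def fit(self, documents, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        return self

    def predict_proba(self, text):
        """Return {label: probability} for a single document."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            log_scores[label] = score
        # Convert log scores to probabilities in a numerically stable way.
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}


# Hypothetical seed set coded by reviewing attorneys.
seed_docs = [
    "merger agreement draft attached for pricing review",
    "revised pricing terms for the merger agreement",
    "lunch menu for the office party on friday",
    "reminder parking garage closed on friday",
]
seed_labels = ["responsive", "responsive", "non-responsive", "non-responsive"]

coder = NaiveBayesCoder().fit(seed_docs, seed_labels)

unreviewed = ["updated merger pricing schedule", "friday parking reminder"]
# Rank unreviewed documents so the most likely responsive ones are reviewed first.
ranked = sorted(unreviewed,
                key=lambda d: coder.predict_proba(d)["responsive"],
                reverse=True)
for doc in ranked:
    print(round(coder.predict_proba(doc)["responsive"], 2), doc)
```

Because the tokenizer in this sketch only recognizes unaccented Latin letters, it also hints at why multilingual datasets need extra handling.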


Redaction

To avoid violating data protection laws, attorneys must often redact PII before transferring documents out of the country. While this process is typically done manually, there has been a recent movement toward a more technological approach.

In particular, there are now review platforms and tools that can search for configurable text patterns, such as email addresses or numeric combinations that represent telephone, employee identification, Social Security or bank account numbers. The program then automatically redacts any instance of PII that matches.

These types of expression-based searches have limitations, however. Depending on the search terms, the program might return a large number of false positives or, if the terms are too specific, it might not catch all PII. Most tools do not permit users to redact metadata, which is another potential data security risk. Ultimately, manual review will always have a role, but redaction technology can help significantly reduce the time needed to review countless documents.
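The pattern-based search described above, along with both of its failure modes, can be sketched with regular expressions in Python. The patterns, labels, and sample text below are simplified assumptions for illustration and are not taken from any particular review platform.

```python
import re

# Simplified, U.S.-centric patterns; production rules would be broader.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace every pattern match with a labeled placeholder.

    Returns the redacted text and the number of hits per pattern.
    """
    counts = {}
    for label, pattern in PATTERNS.items():
        text, hits = pattern.subn(f"[{label} REDACTED]", text)
        counts[label] = hits
    return text, counts


# Fictional sample text.
sample = (
    "Contact Jane at jane.doe@example.com or 555-867-5309. "
    "SSN 123-45-6789. "
    "Order 415-200-1234 shipped, call +44 20 7946 0958."
)

redacted, counts = redact(sample)
print(redacted)
print(counts)
# The order number 415-200-1234 is redacted as a phone number (false positive),
# while the UK-formatted number +44 20 7946 0958 is missed (false negative).
```

Loosening the phone pattern to catch the UK number would also match more non-PII digit strings, which is the trade-off a manual reviewer still has to check.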


Anonymization

One potential complement to redaction technology is anonymization technology. Instead of redacting all PII in a dataset, anonymization services can help permanently delete all personal identifiers from a document. For example, a legal team could replace every employee phone number with a single business phone number tied to the company, preventing opposing counsel from identifying any individual employee. Another useful tool is pseudonymization, which still removes all identifying information but retains the links between multiple records pertaining to the same individual.
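The difference between the two approaches can be shown with a short Python sketch. It assumes a keyed hash (HMAC-SHA256) as the pseudonymization method, which is one common choice among several; the key, the replacement phone number, and the records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored separately from the data.
SECRET_KEY = b"example-key-for-illustration-only"

# Hypothetical company switchboard number used for every employee.
COMPANY_PHONE = "+1-555-0100"


def pseudonymize(value, prefix="PERSON"):
    """Map a name to a stable token: the same person always gets the same token."""
    cleaned = value.strip().lower().encode("utf-8")
    digest = hmac.new(SECRET_KEY, cleaned, hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:10]}"


def mask_record(record):
    """Return a copy with the name pseudonymized and the phone anonymized."""
    masked = dict(record)
    masked["employee"] = pseudonymize(record["employee"])
    masked["phone"] = COMPANY_PHONE  # every individual number collapses into one
    return masked


# Fictional records; the first two refer to the same person.
records = [
    {"employee": "Jane Doe", "phone": "555-867-5309", "note": "approved invoice"},
    {"employee": "jane doe ", "phone": "555-867-5310", "note": "forwarded invoice"},
    {"employee": "John Roe", "phone": "555-867-5311", "note": "rejected invoice"},
]

masked = [mask_record(r) for r in records]
for row in masked:
    print(row)
```

In the output, both of Jane's records carry the same token, so a reviewer can still follow one person's activity, while the phone column no longer distinguishes anyone. Anyone holding the key could recompute the tokens from a list of names, which is why the key in this sketch would need to be kept apart from the transferred data.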

Like the other technologies, anonymization has drawbacks too. For instance, if the PII does not match a specific search pattern, the tool could fail to identify it. But anonymization technology still represents another tool in an attorney’s playbook, one that can help reduce expenses and boost efficiency during eDiscovery.

Cross-border transfers of PII have become more complex since the U.S.-EU Safe Harbor Agreement was replaced by the U.S.-EU Privacy Shield. In today’s regulatory environment, technology tools have emerged as a flexible solution. By reducing the number of documents and by redacting and anonymizing PII, lawyers can find a new “safer harbor” for data transfers. Ultimately, every organization needs a comprehensive data security policy and review process that leverages all the tools at its disposal.