In the webinar The Adventures of Multilingual Investigations, Ben Rusch, Vice President of Review Solutions here at Consilio, explains why multilingual investigations are growing in importance as globalization, international e-commerce and technological advancements continue to expand. Whether the source is email, instant messaging or social media, technology has allowed companies to uncover evidence in ways that were not possible before.
One of the questions our presenter is asked most often is, “Why don’t we just use technology?” “We do use analytics,” responds Rusch, “but multilingual work is typically so complex that you need to bring human expertise to bear as well as technology in order to yield the best results.” Integrating advanced technology with smart decision making at the human level is the challenge for sophisticated eDiscovery processes, like Consilio Complete. Because investigations can be complex, it is important to highlight the best practices for success:
Language detection technology can be inaccurate
Imagine conducting a multilingual investigation and receiving a hard drive with 100,000 documents in various languages. The instant messages and emails are scanned, and the results report an overwhelming number of languages, many of them inconsistent with one another. Unfortunately, these results are not atypical, as language detection technology often produces incorrect data. One solution is to run Optical Character Recognition (OCR) to differentiate character scripts, for example Thai from Japanese. These unexpected complexities do not mean an investigation cannot continue, but rather that it requires highly experienced individuals to formulate a new strategy.
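For text that has already been extracted, a quick script check can serve as a sanity test on language detection results. The sketch below is not OCR and is not the method described in the webinar; it is a minimal illustration that reads Unicode character names from Python's standard `unicodedata` module. The function names `script_counts` and `dominant_script` are hypothetical, and using the first word of a character's name as its script label is a simplifying assumption.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count alphabetic characters by the first word of their Unicode name.

    Assumption: the first word of the name (THAI, HIRAGANA, LATIN, CJK, ...)
    is a good enough script label for a rough check.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces and combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts


def dominant_script(text):
    """Return the most frequent script label, or None if there are no letters."""
    counts = script_counts(text)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["สวัสดี", "こんにちは", "hello"]:
        print(sample, dominant_script(sample))
```

A check like this only identifies the script, not the language: Latin-script text could be English, German or Indonesian, so it would complement language detection and human review rather than replace them.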
Refine keyword searches to include abbreviations and synonyms
Any concept can be expressed in different ways in the data we search. That is why one of the things we do intuitively when searching for relevant information is to add some very common synonyms. For example, we look not only for the concept “Department of Justice” in a given dataset but also for its abbreviation “DOJ”. One might even search for the term “regulator” where the context of an email suggests that the writer means the Department of Justice. The same process of including abbreviations and synonyms is at play when transposing search terms into another language. We do not merely translate the terms 1:1 to find the closest equivalent. Rather, we analyze the concepts in the original search and transpose them into the target language, including common abbreviations and synonyms.
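As a minimal sketch of this idea, one concept can be mapped to several surface forms and searched as a single pattern. The example uses only Python's standard `re` module; the `CONCEPTS` dictionary and the function names are hypothetical, and the variant list simply reuses the terms mentioned above.

```python
import re

# Hypothetical concept map: one concept, several ways it may appear in the data.
CONCEPTS = {
    "Department of Justice": ["Department of Justice", "DOJ", "regulator"],
}


def build_pattern(variants):
    """Compile one case-insensitive, whole-word pattern covering all variants."""
    # Longest variants first, so a longer phrase is preferred over a shorter one.
    ordered = sorted(variants, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def find_hits(concept, text):
    """Return every variant of the concept found in the text, in order."""
    return build_pattern(CONCEPTS[concept]).findall(text)


if __name__ == "__main__":
    email = "The DOJ called again. Has anyone spoken to the regulator?"
    print(find_hits("Department of Justice", email))
```

A context-dependent term such as “regulator” will also match emails about other regulators, so hits on it would still need human review.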
English has an extremely simple way to inflect nouns and verbs. For example, the plural of a noun is normally formed by adding an -s, and the past tense form “bribed” is the same whether I bribed, you bribed or they bribed. We do not want to miss relevant information just because someone paid bribes in the plural rather than a single bribe. English grammar is so simple that stemming searches can capture virtually all grammatical inflections of English nouns and verbs. The grammatical inflections of most other languages, however, are significantly more complex. Making sure that all likely grammatical inflections of the terms in the target language have been included in the search is therefore an important step.
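The difference can be illustrated with a small sketch. The German verb “bestechen” (to bribe) is our own example, not one taken from the webinar: its stem vowel changes across forms (besticht, bestach, bestochen), so a single stem search that works for English misses them. The helper names `stem_pattern` and `forms_pattern` are hypothetical.

```python
import re


def stem_pattern(stem):
    """Match any word that begins with the given stem (a simple stemming search)."""
    return re.compile(rf"\b{re.escape(stem)}\w*", re.IGNORECASE)


def forms_pattern(stems):
    """Match any word that begins with one of several inflected stems."""
    ordered = sorted(stems, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    return re.compile(rf"\b(?:{alternatives})\w*", re.IGNORECASE)


# English: one stem covers bribe, bribes, bribed, bribing, bribery.
english = stem_pattern("brib")

# German: the infinitive stem alone misses forms whose vowel changes.
german_naive = stem_pattern("bestech")
german_full = forms_pattern(["bestech", "bestich", "bestach", "bestoch"])

if __name__ == "__main__":
    print(english.findall("He bribed them; bribes were paid."))
    sentence = "Er bestach den Beamten."
    print(german_naive.findall(sentence))  # the naive stem finds nothing
    print(german_full.findall(sentence))
```

Which inflected forms belong in the list is a linguistic judgment, which is one reason the term list is best prepared by someone who knows the target language.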
The challenge of tokenization
Searching for keywords in English involves identifying words separated by spaces. Searching for the term “bee” will not return the word “beer” merely because it contains the same string of letters. By contrast, scripts such as Chinese, Japanese, Korean and Thai do not need to separate words (technically, morphemes) with spaces in order to make sense. This presents challenges when running searches, because segments of a longer word might spell the target term, or segments of the target term might spell a different word.
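A well-known Japanese case makes the problem concrete: the characters for Kyoto (京都) also appear inside the word for Tokyo Metropolis (東京都). The sketch below contrasts a whole-word English search with a raw substring search and then applies a toy longest-match segmenter. The example sentence, the tiny `LEXICON` and the function names are assumptions made for illustration; production systems rely on full morphological analyzers rather than a hand-built word list.

```python
import re


def english_hit(term, text):
    """Whole-word search: word boundaries stop 'bee' from matching 'beer'."""
    return re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE) is not None


def segment(text, lexicon):
    """Greedy longest-match segmentation against a small word list.

    Characters that start no known word are emitted one at a time.
    """
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini-lexicon, just large enough for the example sentence.
LEXICON = {"東京都", "京都", "に", "住んで", "います"}
SENTENCE = "東京都に住んでいます"  # "I live in Tokyo Metropolis"

if __name__ == "__main__":
    print(english_hit("bee", "I ordered a beer"))   # False
    print("京都" in SENTENCE)                        # True: a false positive
    print(segment(SENTENCE, LEXICON))
```

After segmentation, the search term can be compared against whole tokens instead of raw character strings, so 京都 no longer matches inside 東京都. Greedy matching can itself pick the wrong boundaries on ambiguous text, so its output is a starting point for review rather than a final answer.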
Multilingual investigations are complex. To avoid costly mistakes, quality eDiscovery processes, like Consilio Complete, ensure speed, accuracy and overall efficiency through their streamlined capabilities. Ultimately, from machine learning and analytics to skilled multilingual consultants and strong project management, it comes down to finding the right balance between technology and human intervention.