The Deal With Audio Data
The tech boom has directly contributed to a massive increase in audio data. This audio comes from a wide range of devices, from smartwatches and smart homes to mobile phones. Technological developments have improved communication, bringing benefits such as greater interconnectedness and lower costs. However, the sheer amount of audio data collected from these devices creates several problems.
Because audio data is stored on various media and can be used after conversion and translation, it is considered electronically stored information (ESI). In certain circumstances, producing audio data for litigation can be compulsory.
Another facet of the audio discovery process is managing large volumes of audio files. During the investigative process, it is necessary to preserve all information that may be pertinent to potential litigation. However, everything deemed potentially relevant may amount to terabytes of data.
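To give a rough sense of how preserved audio adds up, here is a minimal sketch that estimates the storage footprint of uncompressed recordings. The recording parameters (16 kHz, 16-bit, mono) and the function names are illustrative assumptions, not figures from any particular matter; compressed formats would be considerably smaller.

```python
def pcm_bytes(hours, sample_rate_hz=16_000, bit_depth=16, channels=1):
    """Estimate the size in bytes of uncompressed PCM audio.

    All default parameters are assumptions chosen for illustration.
    """
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return int(hours * 3600 * bytes_per_second)


def to_terabytes(n_bytes):
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return n_bytes / 1e12
```

Under these assumptions, one hour of audio is about 115 MB, and 10,000 hours comes to roughly 1.15 TB, so a large preservation effort can reach the terabyte range quickly.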
Without the proper protocols, combing through hours of audio is prohibitively slow. In many cases, you are given an extremely tight time frame for producing data. Manually weeding through the audio to find the exact snippets relevant for review is not practical. You must implement measures to detect relevant data quickly; however, these often require extra effort and configuration.
Devices that can record and collect audio information are everywhere. From software that records business meetings to voicemail, audio sources always have the potential to produce discoverable information. So, what makes connecting one’s watch to a thermostat and a garage door difficult? On the consumer-facing side, microphone-enabled devices are convenient, but on the discovery side, preservation, retention, and discovery can be a headache.
For example, a smart home collects and processes audio throughout the day. Typically, users speak directly to the device only a few times a day, which leaves lengthy portions of non-talk audio. A user may communicate with their thermostat only a couple of times a day, but in litigation, the timing of those interactions may be significant.
Spending vital time on non-talk portions of recordings takes away from the overall efficiency of the process. To maximize productivity and minimize cost, systems to detect talk portions of audio must be implemented.
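One simple way to approximate talk detection is to flag stretches of audio whose energy rises above a threshold. The sketch below is a minimal illustration of that idea, not a production voice activity detector: the function name, frame length, and threshold are assumptions, and samples are assumed to be floats normalized to the range -1.0 to 1.0. Energy alone cannot distinguish speech from other loud sounds, so real systems typically add further checks.

```python
import math


def detect_talk_segments(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans where frame RMS energy meets the threshold.

    samples: sequence of floats in [-1.0, 1.0] (assumed normalized).
    frame_ms and threshold are illustrative defaults, not tuned values.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None  # sample index where the current loud span began

    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i  # a loud span begins at this frame
        elif start is not None:
            # A quiet frame closes the current loud span.
            segments.append((start / sample_rate, i / sample_rate))
            start = None

    if start is not None:
        # The recording ended while a loud span was still open.
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments
```

Reviewers could then be pointed only at the returned time spans, while the original recording is preserved in full along with its timestamps.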
On a Global Scale
As companies increasingly operate globally, several additional factors must be accounted for. Language variation can be problematic in traditional eDiscovery and document review, and audio data compounds the issue. The latest Google Home is available in 20 different countries in 12 different languages. In each location, there will be differences in dialect, meaning disparities in pronunciation, syntax, and vocabulary. Finding experts for discovery and review who can decipher talk portions, dialects, and the meaning behind audio data can be a daunting task.
Depending on the location, some platforms and media are used more often than others. For example, WeChat has more than one billion monthly users in China, but only around 2.5 million in the United States. In 2017, Chinese WeChat users were sending 8.1 billion voice messages a day. These statistics illustrate the vast amount of discoverable audio data generated by a single chat platform.
Audio data cannot be overlooked, as it falls under the umbrella of ESI. Implementing protocols to streamline audio discovery is necessary. Without proper controls, the time spent on discovery will increase dramatically, consequently driving up overall costs.