Practical advice regarding backup data in relation to electronic disclosure
By Dr. Tristan Jenkinson and Emma Young
Every year on 31 March, in order to avoid looking like an April fool and losing your data on the following day, the world is encouraged to back up its data on “World Backup Day”.
Sadly, it seems that World Backup Day came too late for the social media site MySpace, which recently admitted that problems during a server migration had resulted in a mass loss of data: 12 years’ worth of data, uploaded between 2003 and 2015, all gone.
While businesses (hopefully) create backups to avoid such data losses, that is not the only situation in which backups can be of use.
When a business becomes involved in litigation or an investigation, backups frequently become immensely significant. However, there are practical considerations to take into account before offering to recover “all the data” from backups.
Types of Backup Media
When businesses create backups, there are several ways that this can be done. For many corporates, this means using backup tapes. Historically, backup tapes were chosen because they were a very inexpensive means of storing large amounts of data. LTO-8 tapes, for example, can hold around 30 terabytes of compressed data (12 terabytes uncompressed) on a single tape, roughly the equivalent of 6,600 HD movies, while the LTO-6 format can store over 6 terabytes of compressed data at a cost of about £25 per tape.
While tapes are designed to store data for a very long time, there are risk factors that can affect whether data can be recovered, such as the age of the tape, the environment in which it has been stored and how many times it has been used in the past.
Other media formats, such as hard drives, solid state drives and even cloud storage, are becoming more common. However, in our experience it is predominantly backup tapes that are considered in disclosure situations. With that in mind, this article focuses mostly on tape backups, though many of the points apply regardless of the media.
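To give a feel for these volumes, the sketch below estimates how many cartridges a given amount of data would occupy. It is a hedged illustration only: the capacity table uses the published native LTO-6 and LTO-8 capacities and the nominal 2.5:1 compression ratio, actual compression depends heavily on the data, and `tapes_needed` is a hypothetical helper rather than part of any backup product.

```python
import math

# Capacity per cartridge in terabytes. "native" is the uncompressed LTO figure;
# "compressed" assumes the nominal 2.5:1 ratio, which real data may not achieve.
LTO_CAPACITY_TB = {
    "LTO-6": {"native": 2.5, "compressed": 6.25},
    "LTO-8": {"native": 12.0, "compressed": 30.0},
}


def tapes_needed(data_tb: float, generation: str, compressed: bool = False) -> int:
    """Minimum number of cartridges needed to hold data_tb terabytes."""
    if data_tb < 0:
        raise ValueError("data volume cannot be negative")
    key = "compressed" if compressed else "native"
    capacity = LTO_CAPACITY_TB[generation][key]
    # Round up: a partly filled cartridge is still a whole cartridge.
    return math.ceil(data_tb / capacity)
```

For example, `tapes_needed(50, "LTO-8")` returns 5, while the same 50 terabytes would need 20 LTO-6 tapes at native capacity.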
Backup Processes – Full vs. Incremental vs. Differential
In addition to different types of backup media, there are also different types of backup processes. The most common of these are full, incremental and differential.
Full backups contain all of the data being backed up, so once a full backup is restored, there is nothing further to do.
Incremental backups contain only the information that has changed since the last backup (full or incremental). This means that to recover the files from an incremental backup, you would also need the corresponding full backup and other previous incremental backups. For example, if you complete a full backup on Monday, followed by incremental backups on Tuesday and Wednesday, then to restore the data from Wednesday you would need to restore Monday’s full backup, then apply the incremental backup from Tuesday, then apply the incremental backup from Wednesday.
A differential backup contains all changes made since the last full backup. This means that to restore data from a differential backup, you only need the corresponding full backup and the differential backup itself. So if you have a full backup on Monday, then differentials on Tuesday and Wednesday, to restore the data from Wednesday’s backup, you would restore the full backup from Monday and then apply only the differential backup from Wednesday.
The type of backup process used can, therefore, be key when considering which backups you may need to restore to recover the relevant data.
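The restore logic described above can be expressed as a short routine. This is a minimal sketch, assuming each backup is recorded simply as a date and a type, and that an incremental captures changes since the previous backup of any type, as described above. The `Backup` class and `restore_chain` function are hypothetical; real backup software tracks far more (catalogues, tape sequence numbers, retention periods).

```python
from dataclasses import dataclass
from datetime import date
from typing import List

KINDS = {"full", "incremental", "differential"}


@dataclass(frozen=True)
class Backup:
    taken: date
    kind: str  # one of KINDS

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown backup type: {self.kind!r}")


def restore_chain(backups: List[Backup], target: date) -> List[Backup]:
    """Backups to apply, oldest first, to rebuild the data as at the latest
    backup taken on or before `target`."""
    history = sorted((b for b in backups if b.taken <= target),
                     key=lambda b: b.taken)
    # Every restore starts from the most recent full backup.
    full_idx = max((i for i, b in enumerate(history) if b.kind == "full"),
                   default=None)
    if full_idx is None:
        raise ValueError("no full backup on or before the target date")
    chain = [history[full_idx]]
    later = history[full_idx + 1:]
    # A differential holds every change since the full backup, so only the
    # latest differential is needed; anything taken before it is superseded.
    diff_idx = max((i for i, b in enumerate(later) if b.kind == "differential"),
                   default=None)
    if diff_idx is not None:
        chain.append(later[diff_idx])
        later = later[diff_idx + 1:]
    # Each incremental holds only changes since the previous backup, so every
    # remaining one is needed, applied in date order.
    chain.extend(later)
    return chain
```

Using the Monday to Wednesday example from the text, an incremental scheme returns all three backups, whereas a differential scheme returns only Monday’s full backup and Wednesday’s differential.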
It is also worth noting that a backup will not necessarily be stored on a single tape; a backup set may span many tapes. In such cases, you may need all of the tapes in that set to restore any of the data from that backup.
Backup Tapes and Proportionality
Backup tapes have a reputation for being a challenging data source in an eDiscovery context. There can certainly be challenges, but it would not be accurate to say that, just because data exists on a backup tape, it should automatically be regarded as disproportionate to disclose in the context of a legal dispute.
Some of the challenges when considering the recovery of data from tape are:
The amount of data on the tape(s) – as discussed earlier, each tape can contain significant volumes of data. If multiple backups are restored, the volume of data could complicate the approach or make it expensive.
Encrypted data – if historic data held on backups has been encrypted, and the encryption key has been lost, it could be difficult, or impossible, to restore the data.
Largely duplicative – A full backup performed a month after the previous one would contain much of the same data. The main differences would be that data deleted in the meantime would not be included on the new backup, whereas new and updated files would be. For this reason, restoring data from multiple backups can produce large volumes of duplicate material, requiring significant deduplication and potentially driving up cost. While there are methods that can be used to identify and remove or exclude duplicates, the time and cost involved may need to be considered.
Data Privacy – As backups can contain a full snapshot of a company’s email or server systems at a historic point in time, there can be confidentiality and privacy concerns, for example regarding the data of employees who have since left the company.
Hardware and Software – If backups are historic, the company may no longer have the original tape drives used to write the tapes, or the original software. Whilst the original software may not necessarily be required to extract data from the tapes, it can add to the complexity of data recovery.
Organisation of the backups – If looking for a backup from a specific date or time frame, can these tapes be easily identified?
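Hash-based deduplication is one common way of dealing with the overlap described under “Largely duplicative” above. The sketch below is a minimal illustration, assuming each restored backup can be read as (path, content) pairs. It treats items as duplicates only when their content is byte-for-byte identical; review platforms may use other methods, for example for email. The `deduplicate` function is hypothetical.

```python
import hashlib
from collections.abc import Iterable


def deduplicate(restored_sets: Iterable[Iterable[tuple[str, bytes]]]) -> dict[str, str]:
    """Keep the first copy of each unique file content across restored backups.

    Returns a mapping from SHA-256 digest to the path of the first file
    seen with that content; later identical copies are ignored.
    """
    unique: dict[str, str] = {}
    for backup in restored_sets:
        for path, content in backup:
            digest = hashlib.sha256(content).hexdigest()
            # setdefault keeps the first path recorded for this digest.
            unique.setdefault(digest, path)
    return unique
```

For instance, two monthly restores holding five files between them, one of which is unchanged between the two months, reduce to four unique items.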
Good Backup Practices
The details of good backup practices will, of course, vary from case to case. However, in our experience, good backup organisation usually includes certain best practices, such as:
Clearly labelled tapes
There can be nothing worse, when looking for a set of four tapes, than finding a box (or cupboard) containing hundreds of tapes, none of which has been labelled. One unfortunate consequence is that each tape may need to be individually scanned (or catalogued) to find out what it contains, just to identify the tapes you need to restore data from. This can have a significant impact on cost and timings.
Tapes stored in a logical order
If you have to identify multiple tapes, being able to locate them easily can save a significant amount of time. This is especially true if the tapes have been boxed and placed in long-term storage, as a logical order lets you identify which boxes to request from storage.
An up-to-date log
In addition to labelling, you should keep a log of what is contained on each tape. The log should include information such as which tapes were used for each backup set, the dates of the backups, the data being backed up, the type of backup (full, incremental or differential) and the location of the tapes (for example, details of long-term storage).
Details of encryption
Keep a secure record of any encryption in place, so that this information is retained over time and is accessible to more than one person.
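As an illustration, the tape log described above could be kept as simple structured records. The sketch below is hypothetical (the field names and helper functions are ours, not those of any backup product). It shows how such a log makes it quick to find the backup sets taken in a date range, and to check that every tape in a set can be located, since a set spanning several tapes may not be restorable if one is missing.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass
class BackupSetLogEntry:
    set_id: str
    tapes: List[str]   # labels of every tape used by this backup set
    backup_date: date
    data_source: str   # e.g. "mail server"
    backup_type: str   # "full", "incremental" or "differential"
    location: str      # e.g. a long-term storage box reference


def sets_between(log: List[BackupSetLogEntry], start: date, end: date) -> List[BackupSetLogEntry]:
    """Backup sets taken between start and end inclusive, oldest first."""
    hits = [e for e in log if start <= e.backup_date <= end]
    return sorted(hits, key=lambda e: e.backup_date)


def missing_tapes(entry: BackupSetLogEntry, available: Set[str]) -> List[str]:
    """Tapes in a set that cannot be located; all are needed for a restore."""
    return [t for t in entry.tapes if t not in available]
```

A query for the first quarter of a year then returns only the relevant sets, together with the boxes to request from storage.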
The Need for a Strategic Approach to Backup Tape Data
With proportionality in mind, in an eDiscovery case involving data from backups it can be advantageous to develop a strategy for how backup data should be approached. It may be helpful to agree the approach with the other side; even if you do not, documenting a sensible approach means you can defend your actions should the other side call them into question. You may already have such a strategy in place if you are litigating under the new disclosure pilot rules (Practice Direction 51U) in the Business and Property Courts of England and Wales. Depending on the strategy, you may select specific tapes (or backup sets) to restore, rather than having to restore all data from every single tape.
When developing a strategy, the key points to consider are:
What tapes are available? If only a limited set of tapes is available, this may simplify the decision.
Is there a specific relevant event? If there are concerns that data may have been lost or deleted on a specific date, then you may consider the last backup prior to that event. On the other hand, if there is a specific event that you are looking for information about, you may want to look at the first backup after the event.
Notwithstanding the above, looking at an earlier backup may miss information that was created after that backup was taken. Conversely, looking at a backup taken too late may mean that the data has since been updated or deleted, and so is no longer available.
Data volumes. As discussed earlier, restoring multiple backup sets can produce large amounts of duplicate data, and dealing with it can have an impact on cost and timings.
The strategies that can be used may include:
Restoring all data from every backup
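Choosing the last backup before an event, or the first one after it, is a simple lookup once the backup dates are known. A minimal sketch, assuming a list of backup dates drawn from the tape log; the function names are hypothetical. A backup taken on the same day as the event is deliberately excluded from both results, because with day-level dates it is unclear whether it ran before or after the event.

```python
from bisect import bisect_left, bisect_right
from datetime import date
from typing import List, Optional


def last_backup_before(backup_dates: List[date], event: date) -> Optional[date]:
    """Most recent backup taken strictly before the event, if any."""
    dates = sorted(backup_dates)
    i = bisect_left(dates, event)  # index of first date >= event
    return dates[i - 1] if i > 0 else None


def first_backup_after(backup_dates: List[date], event: date) -> Optional[date]:
    """Earliest backup taken strictly after the event, if any."""
    dates = sorted(backup_dates)
    i = bisect_right(dates, event)  # index of first date > event
    return dates[i] if i < len(dates) else None
```

With month-end backups, an event on 15 March would point to the February backup for data that may later have been deleted, and to the March backup for information about the event itself.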
Restoring only the year-end backups over a period of time
Restoring tapes from selected time periods
An alternative approach, should the other side call for a significant number of backup tapes to be restored, may be to restore one backup set and identify the material in it that was not available elsewhere. The cost of the restoration can then be recorded, along with the amount of unique data that the process returned. It may then be possible to extrapolate from this to estimate the cost and the expected amount of unique data for each further set, to assist in determining a proportionate approach.
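Selecting the year-end backups is similarly mechanical. This sketch assumes calendar years and full backups; a business with a different financial year end would need to adjust the grouping, and under an incremental or differential scheme the supporting backups would also be needed. `year_end_backups` is a hypothetical helper.

```python
from datetime import date
from typing import Dict, List


def year_end_backups(backup_dates: List[date], first_year: int, last_year: int) -> Dict[int, date]:
    """Latest backup in each calendar year from first_year to last_year."""
    chosen: Dict[int, date] = {}
    for d in backup_dates:
        if first_year <= d.year <= last_year:
            # Keep the latest date seen so far for this year.
            if d.year not in chosen or d > chosen[d.year]:
                chosen[d.year] = d
    return chosen
```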
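The extrapolation can be as simple as scaling the sample figures. The sketch below assumes, naively, that every backup set costs roughly the same to restore and yields roughly the same amount of unique material as the sample; in practice, overlap between sets may mean later sets yield less, so the unique-data figure in particular may be overstated. `SampleRestore` and `extrapolate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SampleRestore:
    cost: float        # cost of restoring and processing the sample set
    unique_items: int  # items in the sample not available elsewhere


def extrapolate(sample: SampleRestore, total_sets: int) -> Tuple[float, int]:
    """Naive linear estimate of total cost and unique items for total_sets.

    Assumes each set costs about the same and yields about the same
    amount of unique material as the sample.
    """
    if total_sets < 1:
        raise ValueError("total_sets must be at least 1")
    return sample.cost * total_sets, sample.unique_items * total_sets
```

With hypothetical figures of £4,000 and 120 unique documents from one sample set, `extrapolate(SampleRestore(4000.0, 120), 10)` returns `(40000.0, 1200)` for ten sets, giving both sides a concrete basis for discussing proportionality.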
A Cautionary Word – Backups Are Not Always the Solution
You should also consider, however, that the data you are looking for may not be available on tapes. If data is not present on any backups, this does not necessarily mean that it never existed. If, for example, data was created and deleted between backups, it is possible that it was never stored on backup media. While backups can be very useful, they are not always a silver bullet for restoring historic data.
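The point about data falling between backups can be made concrete with a simplified model that treats each backup as an instantaneous snapshot of everything that exists when it runs. This is an assumption: real backups take time to run and may exclude some locations. The function name is hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def captured_by_any_backup(created: datetime, deleted: Optional[datetime],
                           backup_times: List[datetime]) -> bool:
    """True if at least one backup ran while the item existed.

    An item is captured if a backup ran at or after its creation and before
    its deletion; deleted=None means the item was never deleted.
    """
    return any(created <= t and (deleted is None or t < deleted)
               for t in backup_times)
```

With nightly backups at 23:00, a document created at 09:00 and deleted at 17:00 on the same day is never captured, even though it plainly existed.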