Data Backup Systems


An understanding of data backup systems is important for locating responsive electronic data. Three backup strategies are available to data administrators.

The full backup, as the name implies, backs up all system data. In practice, however, even a full backup does not usually copy every file on the system. Application files (such as the files that make up Microsoft Word) are not usually backed up, since they can be reinstalled from the application CDs. Instead, only configuration files and the data files created by users are backed up. Restoring the system requires only the most recent full backup, but full backups take the most time and media space.

The differential backup saves the files that have changed since the last full backup. Restoring the system requires the last full backup and the most recent differential backup. The incremental backup saves only the files that have changed since the last backup of any kind, whether full or incremental. It requires the least backup time and space, but restoring the system requires the last full backup and every incremental backup made since then. The size of the system may require that incremental backups be performed during the week and full backups on the weekend. The differences among these strategies are illustrated below:

Assume that Files A through E currently exist on the server. Assume also that on each day, the designated file changes. Assume further, in the differential backup example, that no intervening full backup takes place. The files would be stored as follows:

Backup Table
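The comparison can be sketched in Python under one reading of the assumptions above (an assumption, since the example does not fix the schedule): a full backup of Files A through E is taken before Day 1, File A changes on Day 1, File B on Day 2, and so on, with one backup each night. The function names `files_backed_up` and `restore_sets` are hypothetical.

```python
FILES = ["A", "B", "C", "D", "E"]
# Hypothetical reading of the example: a full backup of all five files is taken
# before Day 1 (called Day 0 here), then one file changes each day and a
# backup runs every night.
CHANGED_ON = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}


def files_backed_up(strategy, day):
    """Files copied by the backup taken at the end of `day` under `strategy`."""
    if strategy == "full":
        return list(FILES)  # every file, changed or not
    if strategy == "differential":
        # Everything changed since the initial full backup (no intervening full).
        return [CHANGED_ON[d] for d in range(1, day + 1)]
    if strategy == "incremental":
        return [CHANGED_ON[day]]  # only what changed since the previous night
    raise ValueError(f"unknown strategy: {strategy}")


def restore_sets(strategy, day):
    """Days whose backups are needed to restore the system as of `day`."""
    if strategy == "full" or day == 0:
        return [day]  # the most recent full backup alone suffices
    if strategy == "differential":
        return [0, day]  # the initial full plus the latest differential
    # Incremental: the initial full plus every incremental made since.
    return [0] + list(range(1, day + 1))


for strategy in ("full", "differential", "incremental"):
    cells = ["".join(files_backed_up(strategy, day)) for day in range(1, 6)]
    print(f"{strategy:<13}" + " | ".join(f"{cell:<5}" for cell in cells),
          "restore Day 5 from:", restore_sets(strategy, 5))
```

Under these assumptions the differential set grows by one file per day (A, then AB, then ABC), and restoring on Day 5 needs two sets; the incremental approach stores one file per night but needs all six sets to restore.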

E-mails are generally stored in a single database. For example, Microsoft Outlook messages are contained within a file called Outlook.pst. Because the receipt or transmission of any e-mail changes that file, e-mail databases should be picked up by every type of backup. For files that are not regularly changed, however, the practitioner should know the type of backup protocol used in order to determine whether, and from where, the file can be retrieved.
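A minimal sketch of the point about rarely changed files, assuming a hypothetical schedule of full backups on Saturdays and incremental backups Monday through Friday (the weekend-full, weekday-incremental pattern mentioned earlier). The names `weekly_schedule` and `backups_holding` are illustrative, and the dates are invented.

```python
from datetime import date, timedelta


def weekly_schedule(start, days):
    """Hypothetical schedule: full backup on Saturday, incremental Monday-Friday."""
    schedule = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        if d.weekday() == 5:  # Saturday
            schedule.append((d, "full"))
        elif d.weekday() < 5:  # Monday through Friday
            schedule.append((d, "incremental"))
    return schedule


def backups_holding(mod_dates, schedule):
    """Dates of the backups that contain a copy of a file.

    Assumes the file exists at the first backup and that each backup runs at
    the end of its day. A full backup always copies the file; an incremental
    copies it only if it changed since the previous backup of either kind.
    """
    holding = []
    previous = None
    for backup_date, kind in schedule:
        changed = any((previous is None or previous < m) and m <= backup_date
                      for m in mod_dates)
        if kind == "full" or changed:
            holding.append(backup_date)
        previous = backup_date
    return holding


schedule = weekly_schedule(date(2024, 1, 6), 14)  # Sat 6 Jan to Fri 19 Jan 2024
mailbox_edits = [date(2024, 1, 6) + timedelta(days=n) for n in range(14)]  # changes daily
memo_edits = [date(2024, 1, 9)]  # edited once

print(len(backups_holding(mailbox_edits, schedule)), "of", len(schedule))  # 12 of 12
print(backups_holding(memo_edits, schedule))
```

Under these assumptions the mailbox file appears on all twelve backups, while a memo edited only on Tuesday, 9 January appears only on the two Saturday full backups and that Tuesday's incremental.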

Tape media are commonly utilized for backup purposes. A rotation system is used so that backups may be stored offsite to protect the media in the event of a disaster. Tapes are generally saved on a daily, weekly and monthly basis. The daily tapes are frequently recycled, reducing the total number of tapes required while allowing retention of historical data. One common strategy is the so-called “grandfather, father, son” strategy. This is the strategy described by Judge Scheindlin as being used by UBS in Zubulake I. Daily tapes are used from Monday through Thursday and recycled each week. Weekly tapes are saved each Friday and recycled each month. Monthly tapes are saved on the last Friday of each month and may be saved for a year. Graphically, it looks like the following:

Backup Tape Schedule
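The grandfather, father, son rotation can be expressed as a rule that maps a calendar date to a tape set. This sketch assumes no weekend backups and ignores holidays; the function name `gfs_tape_set` is hypothetical.

```python
import calendar
from datetime import date


def gfs_tape_set(d):
    """Which tape set receives the backup taken on date `d`.

    Assumes no weekend backups and ignores holidays.
    """
    weekday = d.weekday()  # Monday == 0 ... Sunday == 6
    if weekday <= 3:
        # Monday through Thursday: daily tapes, reused the following week.
        return "daily (son)"
    if weekday == 4:
        days_in_month = calendar.monthrange(d.year, d.month)[1]
        if d.day + 7 > days_in_month:
            # No later Friday this month, so this is the monthly tape.
            return "monthly (grandfather)"
        # Any other Friday: a weekly tape, reused the following month.
        return "weekly (father)"
    return "no backup"


for day in (27, 24, 31, 25):
    d = date(2024, 5, day)
    print(d.isoformat(), d.strftime("%A"), gfs_tape_set(d))
```

The only nontrivial case is Friday: a Friday is the last Friday of its month exactly when adding seven days would run past the month's end, which `calendar.monthrange` makes easy to check.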

In Zubulake I, UBS modified the above protocol: “Nightly backup tapes were kept for twenty working days, weekly tapes for one year, and monthly tapes for three years. After the relevant time period elapsed, the tapes were recycled.” Zubulake I, supra, 217 F.R.D. at 314. Thus, in addition to the backup strategy employed (full, differential or incremental), the practitioner should understand the tape rotation schedule in order to locate relevant documents.
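The UBS retention periods quoted above translate into a simple date calculation. In this sketch, “working days” are assumed to be Monday through Friday with no holidays, and a tape is treated as eligible for reuse on the date its period ends; both are assumptions, as are the function names and the example dates.

```python
from datetime import date, timedelta


def add_working_days(start, n):
    """Date falling `n` weekdays after `start` (holidays are ignored)."""
    d = start
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday through Friday count as working days
            n -= 1
    return d


def add_years(start, years):
    """Same calendar date `years` later; 29 February falls back to the 28th."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def recycle_date(written, tape_type):
    """Date a tape becomes eligible for reuse under the UBS retention periods."""
    if tape_type == "nightly":
        return add_working_days(written, 20)
    if tape_type == "weekly":
        return add_years(written, 1)
    if tape_type == "monthly":
        return add_years(written, 3)
    raise ValueError(f"unknown tape type: {tape_type}")


def may_still_exist(written, tape_type, on):
    """True if, on date `on`, the tape's retention period has not yet elapsed."""
    return on < recycle_date(written, tape_type)


demand = date(2024, 3, 1)  # hypothetical date a preservation demand arrives
for written, tape_type in [(date(2024, 1, 8), "nightly"),
                           (date(2023, 6, 2), "weekly"),
                           (date(2021, 4, 30), "monthly")]:
    print(written, tape_type, recycle_date(written, tape_type),
          may_still_exist(written, tape_type, demand))
```

Under these assumptions a nightly tape written on Monday, 8 January 2024 is recyclable by 5 February, so by the hypothetical March demand only the weekly and monthly tapes could still hold that period's data.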

Tape-based backup systems, while offering the advantages of high capacity and lower cost, store data sequentially, which makes access to the data difficult. Judge Scheindlin, in her discussion of accessible v. inaccessible data, classifies tape backups as inaccessible because of the sequential storage of data on the tapes, “which means that to read any particular block of data, you need to read all the preceding blocks.” Zubulake I, supra, 217 F.R.D. at 319. In addition, the data on the tape is organized according to the computer’s file structure, rather than a “human records management structure.” Furthermore, data is usually compressed when written to the tape, requiring a decompression process to restore it. Id. Judge Scheindlin also noted that there is no “uniform standard governing data compression.” Id. Therefore, if the tapes were made with an older software package that is no longer available, the tapes may be rendered useless.
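Both problems, sequential access and dependence on a matching decompressor, can be illustrated with a toy model. The `Tape` and `Disk` classes are hypothetical, the model only counts block reads, and `bz2` merely stands in for whatever compression format the original backup software used.

```python
import bz2
import zlib


class Tape:
    """Toy sequential medium: every read starts from the beginning of the tape."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.blocks_read = 0

    def read_block(self, index):
        if not 0 <= index < len(self.blocks):
            raise IndexError("block index out of range")
        # To reach block `index`, every preceding block must be read first.
        for position in range(index + 1):
            self.blocks_read += 1
            block = self.blocks[position]
        return block


class Disk:
    """Toy random-access medium: any block can be read directly."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.blocks_read = 0

    def read_block(self, index):
        self.blocks_read += 1
        return self.blocks[index]


# Each block is compressed as it is written; bz2 stands in for whatever
# format the original backup software used.
raw = [f"document {i}".encode() for i in range(100)]
blocks = [bz2.compress(data) for data in raw]
tape, disk = Tape(blocks), Disk(blocks)

assert bz2.decompress(tape.read_block(99)) == raw[99]
assert bz2.decompress(disk.read_block(99)) == raw[99]
print(tape.blocks_read, disk.blocks_read)  # 100 1

# A reader that understands only a different format cannot restore the block.
try:
    zlib.decompress(blocks[99])
except zlib.error as exc:
    print("restore failed:", exc)
```

Reaching the last of 100 blocks costs 100 reads on the tape model and one on the disk model, and a reader that understands only zlib cannot decode the bz2-compressed block, a small-scale version of the compression problem the court described.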

A newer trend in backup technology is known as “content addressed storage” or CAS. The storage is disk-based and is designed for data that remains unchanged over time. See generally Stephen Bigelow, Content Based Storage: An Overview, Jan. 12, 2006, on SearchStorage.com. A file is associated with metadata describing the file and its content, and is assigned a unique value derived from the contents of the file itself. This unique value prevents duplicates from being stored, and the index built from the metadata allows the file to be retrieved easily. Bigelow uses the example of multiple e-mails sharing the same attachment: only one copy of the attachment would be saved, while the metadata for each e-mail would contain a pointer to it. Bigelow also observes that future CAS systems may allow searching across multiple applications if, as he predicts, “CAS vendors embrace a standard interface method.” Stephen Bigelow, Content Based Storage: Future Directions, Jan. 12, 2006, on SearchStorage.com. Development of this technology would undoubtedly result in reclassification of such data from inaccessible to accessible.
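The deduplication and pointer scheme in Bigelow's e-mail example can be sketched with a dictionary keyed by a content hash. SHA-256 as the addressing function, the `ContentStore` class, and the subject-line index are assumptions for illustration; the source does not specify how any particular CAS product computes its addresses.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each object is keyed by a hash of its bytes."""

    def __init__(self):
        self.objects = {}  # address -> content
        self.index = {}    # subject line -> list of message records

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()
        # Identical content always yields the same address, so it is kept once.
        self.objects.setdefault(address, data)
        return address

    def get(self, address):
        return self.objects[address]

    def add_message(self, sender, subject, attachment):
        # The message metadata stores only a pointer (the address) to the attachment.
        record = {"from": sender, "subject": subject, "attachment": self.put(attachment)}
        self.index.setdefault(subject, []).append(record)
        return record


store = ContentStore()
report = b"Q3 sales report (attachment bytes)"
for sender in ("alice", "bob", "carol"):
    store.add_message(sender, "Q3 report", report)

print(len(store.objects))  # 1
hits = store.index["Q3 report"]  # retrieval through the metadata index
print([m["from"] for m in hits], store.get(hits[0]["attachment"]) == report)
```

Three messages carrying the same attachment leave one stored copy, and each message can still reach it through the address recorded in its metadata.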