2009-07-05

Sysadmin Sunday: Guard against file corruption with PAR

Introduction:
Bit rot, file corruption, partial file transfer: call it what you will, digital transmission media sometimes fail, and you are left with a corrupted fragment of data, if any at all. For large files whose re-transmission would take hours or days, this is a tough situation.

PAR uses a RAID-like technique to salvage corrupted files. In most cases you only need to obtain recovery files that are a fraction of the size of the original file.

This article is intended for people with a basic to intermediate understanding of a un*x-style operating system.

-=-=-=-=-=-=-=-=-=-=-=-=-=-

Table of contents:
1. PAR and the Reed-Solomon error correction algorithm
2. Available applications based off of PAR
3. Examples
4. Informative resources

-=-=-=-=-=-=-=-=-=-=-=-=-=-
1. PAR and the Reed-Solomon error correction algorithm

The Reed-Solomon algorithm was developed in 1960 by Irving S. Reed and Gustave Solomon. It is used in many technologies, such as CDs, Blu-ray discs, DSL modems, RAID 6 and more. This method of error correction protects against certain forms of media defects and data transmission errors.

The PAR utility was developed by Tobias Rieper and Stefan Wehlus for the purpose of recovering corrupted files and file fragments from Usenet posts without needing to download the file all over again. Later, to address some limitations of PAR, the PAR2 specification was developed by Michael Nahas and Peter Clements. Clements then wrote some of the first PAR2 applications.

A simple way of explaining what PAR does: it takes the original source files and applies a mathematical algorithm to them, producing recovery data, a sort of processed description of what the files look like. Now let's say you send someone a file, but the transmission fails midway. All the recipient needs to do is download the recovery data (which is significantly smaller than the original file) and run the par utility against the file fragment. As long as enough recovery data is available to cover the damaged or missing portion, par can fill in the blanks and restore the file.
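To make the "fill in the blanks" idea concrete, here is a minimal sketch using plain XOR parity, the same trick RAID 5 uses. This is a deliberate simplification and not what par2 actually runs: real PAR files use Reed-Solomon coding, which can rebuild several missing blocks rather than just one. The block values and variable names are made up for illustration.

```shell
#!/bin/sh
# Three one-byte "data blocks" (hypothetical values, chosen only for illustration).
block_a=$(( 0x5A ))
block_b=$(( 0x3C ))
block_c=$(( 0xF0 ))

# The "recovery block" is simply the XOR of all data blocks.
parity=$(( block_a ^ block_b ^ block_c ))

# Pretend block_b was lost in transit. XOR-ing the surviving blocks
# with the recovery block gives the missing block back.
recovered_b=$(( block_a ^ block_c ^ parity ))

printf 'parity      = 0x%02X\n' "$parity"
printf 'recovered_b = 0x%02X\n' "$recovered_b"
```

The recovery block is the size of a single data block, yet it can stand in for any one of the three. That is the sense in which the recovery data is smaller than the file it protects.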

2. Available applications based off of PAR
There is, of course, the aforementioned open source application written by Peter Clements et al. There is a slew of other PAR clients for Mac OS 9 and Mac OS X, Windows, Linux, BSD and more. Although the PAR1 specification is incompatible with the PAR2 specification, most clients support both formats side by side. For a detailed list of PAR-compliant projects, check out the Parchive SourceForge website. If you are using Linux, you can either download a Linux RPM or source tarball from the SourceForge site, or use a package tool such as apt-get to install it from your distribution's package archives.

3. Examples
In this example I am using Ubuntu Linux.

  1. This will require the Ubuntu Universe repository. If it is commented out in "/etc/apt/sources.list", you can uncomment it using "sudo vi /etc/apt/sources.list".
  2. Then update your sources using "sudo apt-get update".
  3. Finally, get the par2 package using "sudo apt-get install par2".
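Before moving on, it can be worth confirming that the install actually put par2 on your PATH. A small sketch of such a check follows; the variable name par2_status is just a placeholder for this example.

```shell
#!/bin/sh
# Check whether the par2 binary can be found on the PATH.
if command -v par2 >/dev/null 2>&1; then
    par2_status="installed"
    echo "par2 found at: $(command -v par2)"
else
    par2_status="missing"
    echo "par2 not found; try: sudo apt-get install par2"
fi
```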

Now let's test par2 to see if it can recover a file:
  1. Working in /tmp, use dd to create a 10MB test data file from /dev/zero: "dd if=/dev/zero of=/tmp/testdata.bin bs=1024 count=10240"
  2. Then create our par2 file and recovery blocks: "par2 create testdata.par2 testdata.bin"
  3. Now I'm going to copy the original data to a different name as a backup, then make some changes to testdata.bin.
  4. Then I run "par2 verify testdata.par2 testdata.bin"
  5. par2 tells me that I need one recovery block to repair the file. (* During the create process par2 created several recovery files of different sizes. Because each one holds a different number of recovery blocks, I can use either the largest recovery file or a combination of the smaller files to the same effect.) In this case I just need to have the recovery file called testdata.vol000+01.par2 in the same directory.
  6. I then type in "par2 repair testdata.par2 testdata.bin", and par2 reports that the file has been repaired.
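The steps above can be strung together into one script. What follows is a rough sketch, not a polished tool, and it makes a few assumptions of its own: it works in a scratch directory from mktemp, it fills the test file from /dev/urandom instead of /dev/zero so that every block is different, it damages the file by overwriting 4 KB in the middle with dd, and it uses a cksum comparison (not part of par2) to confirm the result. The variable names are placeholders.

```shell
#!/bin/sh
# Sketch of the create / damage / verify / repair cycle in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

if ! command -v par2 >/dev/null 2>&1; then
    result="skipped"
    echo "par2 is not installed; nothing to demonstrate"
else
    # 10 MB of test data (random data is an assumption of this sketch).
    dd if=/dev/urandom of=testdata.bin bs=1024 count=10240 2>/dev/null
    before=$(cksum < testdata.bin)    # checksum of the good file, for comparison only

    # Create the index file and the recovery volumes with default settings.
    par2 create testdata.par2 testdata.bin >/dev/null

    # Damage the file: overwrite 4 KB in the middle without truncating it.
    dd if=/dev/urandom of=testdata.bin bs=1024 seek=5000 count=4 conv=notrunc 2>/dev/null

    # verify is expected to exit non-zero when the file needs repair.
    if par2 verify testdata.par2 testdata.bin >/dev/null; then
        echo "verify: no damage found"
    else
        echo "verify: damage found"
    fi

    # Rebuild the damaged blocks from the recovery volumes.
    par2 repair testdata.par2 testdata.bin >/dev/null

    after=$(cksum < testdata.bin)
    if [ "$before" = "$after" ]; then result="repaired"; else result="failed"; fi
    echo "result: $result"
fi
```

The checksum is only there so the script can report success without keeping a second copy of the file around; par2 itself does not need it.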

4. Informative resources
Clements, Peter; Gallagher, Ryan; Nahas, Mike; et al. "Parity Archive Volume Set: File Specification, Clients, and Related Resources" (Accessed July 2009).
http://parchive.sourceforge.net/

Wikipedia.org "Reed-Solomon Error Correction" (Accessed July 2009).
http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction

Wikipedia.org "Parchive" (Accessed July 2009).
http://en.wikipedia.org/wiki/Parchive
