Bitrot

Simson Garfinkel writes about bitrot, saying in so many words that it won’t be that big a problem. Jeremy Hedley warns, Cassandra-like, that for individuals it might be pretty bad indeed.

I side with Garfinkel.

Like pretty much everyone who has been using a computer for more than 15 minutes, I’ve lost data. The problem of bitrot is pretty widely recognized by now, even if we’re not sure exactly how best to guard against it. This awareness in itself is probably going to help minimize the problem: we may look back on the period from, say, the fifties to the nineties as an anomaly, a time when we didn’t routinely plan on making data available to our future selves.

Bitrot is a three-layered problem:

The physical layer
If you can’t read a floppy, or whatever physical medium you’re using, you are sunk. This really breaks down into a couple of sub-layers: either the medium itself has degraded (all media have a lifespan before they start losing data; for some, like floppies, it’s pretty short), or the drive requires a connector and/or software drivers that don’t work with any machine you can get your hands on.
The data layer
Fine, so by some chance your floppy is still good, but back in 1993 you were using MS Works 2 to store your business data, and there aren’t any programs that can read those files.
The cultural layer
This ties in with the data layer: some formats will almost certainly be well supported in the future, at least to the extent that format translators will exist to convert Ye Olde Data Phyle into the sleek and modern DataFile 3000. This comes down to how popular a format is (or was), and whether it is clearly and publicly specified. The file format used by Word 2000, for example, is not publicly specified but is so widely used that a number of programmers have done pretty good jobs of reverse-engineering it. The PDF and RTF formats are publicly specified and very widely used. But MS Works 2? Nope.
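
One way to get a feel for your own exposure at the data and cultural layers is to take inventory of what formats you actually have lying around. The following is only a rough sketch: the function name `audit_formats` and the `OPEN_FORMATS` set are made up for illustration, and the set lists just the formats mentioned here as publicly specified (plain text, RTF, PDF). Anything it reports is a prompt to go look, not a verdict.

```python
from collections import Counter
from pathlib import Path

# Assumption: a hand-maintained, illustrative set of extensions for
# publicly specified formats. Adjust it to your own judgment.
OPEN_FORMATS = {".txt", ".rtf", ".pdf"}


def audit_formats(root, open_formats=OPEN_FORMATS):
    """Count files under root, by extension, whose extension is not in open_formats."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower()
            if ext not in open_formats:
                counts[ext or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: audit the current directory, most common extensions first.
    for ext, n in audit_formats(".").most_common():
        print(f"{n:6d}  {ext}")
```

A file extension is only a hint about the real format, so treat the output as a list of candidates for conversion rather than a definitive report.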

So what can we do to avoid the heartbreak of bitrot in our own lives? A few things.

Back up
This should be obvious. My own backup strategy is to back up my home folder to an external hard drive daily, and to a magneto-optical disk (estimated to have 50-year data integrity) weekly.
Save files in publicly specified formats
As I wrote to a friend recently, “every time I save a file in Word format, I’m afraid I’m doing something that will come back to haunt me.” From now on, I’m saving my work as RTF. Plain text would be better, but RTF strikes a balance between preserving formatting and universality.
Move forward
This does not mean jumping on the bleeding edge and buying every gadget that comes along. It means recognizing when a physical or data format is on the way out, finding a safe successor, and moving to that. As long as you’ve got data you can read on a hard drive that works with your computer, and a backup you can read somewhere else, you should be in the clear indefinitely. Eventually we will see net-based storage that is convenient and affordable (we’re not quite there yet), and at that point, we won’t have any excuse for failures at the physical layer.
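
A backup only helps if it can still be read, so it may be worth checking one now and then. Here is a minimal sketch, assuming both the original folder and the backup are mounted as ordinary directories: it records a SHA-256 checksum for every file and compares the two sets. The function names (`file_digest`, `build_manifest`, `compare`) and the paths in the usage example are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(source_manifest, backup_manifest):
    """Return (missing, changed): files absent from the backup, and files whose contents differ."""
    missing = sorted(set(source_manifest) - set(backup_manifest))
    changed = sorted(
        name
        for name in source_manifest
        if name in backup_manifest
        and source_manifest[name] != backup_manifest[name]
    )
    return missing, changed


if __name__ == "__main__":
    # Hypothetical paths; substitute your own home folder and backup volume.
    missing, changed = compare(
        build_manifest("/home/me"), build_manifest("/mnt/backup/me")
    )
    print("missing from backup:", missing)
    print("contents differ:", changed)
```

A file that shows up as changed may simply have been edited since the last backup, so the comparison is most telling when run right after a backup, or when the manifest is saved at backup time and checked against the backup later.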