For years, a debate has raged over the benefits of open operating systems and software. Neither, however, matters to me as much as the question of open file formats.
LinuxToday.com blogger Carla Schroder recently summed up the problem:
"Digital storage is fragile. I'm sure this not news to you. If you have any computer files from the 1990s can you still read them? Are they on a readable medium? In a readable format? It is a chronic problem for businesses, but I think it's a more significant problem for normal, everyday people."
The Problem With Legacy Formats
My own digital archives suffer from the same problem. I have a box of Zip disks that hold backups from a Macintosh G3 desktop system I purchased nearly a decade ago. The Mac and its built-in Zip drive are long gone; the disks, as a result, are useless.
Fortunately, I transferred the most important data -- including years of work-related archives -- to DVD before discarding my old G3. Yet I had overlooked other Zip disks that contain less important, but still potentially interesting, content.
Even if I had transferred those files, however, I would have faced other challenges making use of them. My old Eudora mailbox backups, for example, would have posed a problem. I quit using Eudora years ago, and I'm not sure whether the current version would open my legacy mailbox files -- or, for that matter, whether older versions of Eudora (most of which are still available for download) will operate properly on a modern PC.
I have written off this data, at least for the time being. But I took these lessons to heart when I designed a permanent archive for our digital photo and music libraries.
RAW Deal
Let's consider the photo archive first. Until January 2008, we used a point-and-shoot digital camera that stored photos in JPEG format. So far, so good: While patent trolls have attempted to claim JPEG as their intellectual property, none have succeeded.
In other words, JPEG is likely to survive indefinitely as an archival format. It enjoys near-universal software support and backing from a dedicated, well-organized standards-making body.
In 2008, however, we bought a DSLR camera. Our Pentax K10D, like any similar camera, can store photos in either JPEG or RAW format. It also offers users a choice of RAW formats: its own proprietary format or the Adobe DNG format.
We didn't have to think twice about which format to use. A properly implemented RAW format is far superior to JPEG for long-term, high-quality digital photo archiving.
What do I mean by "properly implemented"? DSLR vendors have an annoying habit of pushing proprietary RAW formats that don't always behave as they should. Many use non-standard, sometimes poorly documented, file headers that deviate from the underlying TIFF standard. A few DSLR vendors even use proprietary RAW formats that encrypt embedded image tags.
As a result, third-party software developers find themselves playing a game of cat-and-mouse with DSLR vendors hell-bent on maintaining proprietary RAW formats. The idea, apparently, is to keep photographers "loyal" to a particular vendor by ensuring that only the vendor's own software tools can properly access the metadata stored in its RAW format.
This is unacceptable. I can't believe that any serious photographer would allow a camera vendor to pull this type of stunt. Yet they do.
Adobe DNG is not a truly open-source format, and not everyone thinks it makes a suitable RAW standard. Yet DNG is fully documented, it is available to third-party developers on a royalty-free basis, and Adobe is in the process of submitting it to ISO as a completely open standard.
As DSLR vendors go, Pentax is one of the best at offering a relatively transparent proprietary RAW format. And I am reasonably sure that Pentax, unlike some DSLR vendors, will still be in business many years from now.
Keep in mind, however, that we're talking about a digital photo archive that already runs into thousands of images. We need these photos to remain accessible for the rest of our lives -- and even well beyond. I'm not about to trust them to a proprietary RAW format. While Adobe DNG isn't perfect, it is the best current option, and its future prospects are healthy enough to satisfy me.
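For readers who want to see what "the underlying TIFF standard" looks like in practice, here is a minimal Python sketch. It relies on two facts: a baseline TIFF file opens with an eight-byte header (a byte-order mark, the number 42, and the offset of the first image directory), and DNG is built on that same container. The function name parse_tiff_header is my own invention for illustration, and passing this check says nothing about whether the rest of a file is documented or readable.

```python
import struct


def parse_tiff_header(data):
    """Return (byte_order, first_ifd_offset) for a baseline TIFF header,
    or None if the first eight bytes do not form one.

    parse_tiff_header is a hypothetical helper written for this post.
    """
    if len(data) < 8:
        return None

    # Bytes 0-1: "II" means little-endian, "MM" means big-endian.
    order = data[:2]
    if order == b"II":
        prefix = "<"
    elif order == b"MM":
        prefix = ">"
    else:
        return None

    # Bytes 2-3: the magic number 42. Bytes 4-7: offset of the first IFD.
    magic, ifd_offset = struct.unpack(prefix + "HI", data[2:8])
    if magic != 42:
        return None

    return ("little" if prefix == "<" else "big", ifd_offset)


if __name__ == "__main__":
    import sys

    # Usage: python tiff_header.py photo1.dng photo2.dng ...
    for name in sys.argv[1:]:
        with open(name, "rb") as fh:
            print(name, parse_tiff_header(fh.read(8)))
```

A file that comes back as None is not plain TIFF at the header level, so a generic TIFF reader will not open it without format-specific help. That is the kind of deviation I mean above.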
Face The Music: The Benefits Of FLAC
Our digital audio archive presents a similar set of challenges. Over the past few years, we transferred our entire CD library to hard disk. Today, this music library includes more than 12,000 tracks, and it will continue to grow in the years to come.
When I chose an audio format for our music library, I had two key requirements. First, I required a lossless, audiophile-quality format (nothing says "I don't care" like a drive full of carelessly-ripped MP3 tracks). Second, I required a completely open format that would never run afoul of a vendor's proprietary whims or some patent troll's gold-digging efforts.
As a result, I chose FLAC as our archival audio format. Since FLAC is an open-source file format, it will probably outlive most of the companies that offer competing, proprietary lossless formats. For all we know, that includes both Microsoft and Apple.
The point is, I don't have to care how a particular vendor fares in the market or what it decides to do with its intellectual property. FLAC is completely immune to these trends, and I am as confident as I can be that the music I enjoy today will be just as accessible in 30 or 40 years.
Risks We Can Live With
Digital content still carries a measure of inherent risk. Users can reduce those risks considerably by backing up their data. I back up our digital content archives regularly, and those backups are stored in a water- and fireproof safe. As far as I am concerned, those photos are much better protected than many of our traditional, printed photos -- many of which we cannot replace if anything happens to the original prints.
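Backups only help if the copies actually match the originals. One way to check, sketched here in Python, is to hash every file in the archive and in the backup and compare the results. The function names build_manifest and compare_trees are hypothetical, and the directory paths are whatever you happen to use. This is an illustration under those assumptions, not a description of any particular backup tool.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large photo and audio files
            # never have to fit in memory at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def compare_trees(archive_dir, backup_dir):
    """Return (missing, changed): files absent from the backup, and files
    present in both places whose contents differ."""
    archive = build_manifest(archive_dir)
    backup = build_manifest(backup_dir)
    missing = sorted(set(archive) - set(backup))
    changed = sorted(
        name for name in archive.keys() & backup.keys()
        if archive[name] != backup[name]
    )
    return missing, changed
```

Calling compare_trees on the live archive and a mounted backup returns two lists. If both are empty, every archived file has an identical copy in the backup.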
Of course, digital storage formats will continue to change. IDE hard disks are clearly on their way out, and who knows how much support SATA will enjoy 20 years from now. The same is true of today's optical disc formats; with holographic optical storage now entering the market, it will be up to hardware vendors to decide whether future holographic drives support legacy optical formats.
So we can't eliminate the risk. But we can manage it. And one of the best ways to accomplish this, besides taking data backup seriously, is to think carefully about whether you can really trust your data to a particular file format -- no matter how popular or how well-marketed it happens to be.
Sunday, May 10, 2009