I have been thinking about data longevity and data rot lately, after my sister asked me a very poignant question about her blog: “Will my blog be around in 20 years?” I never really doubted that it would be, but the question got me thinking.
I certainly put more faith in a blogging platform and website than in pen and paper, simply because I expect the bits saved in my hosting account to be far more likely to survive 20 years than a notebook or journal. But how about 50 years? Or 100?
After looking into this a bit further, I found that her concern is not too far off. Data loss over time is a very real problem. With the shelf life of early CD-ROMs put at a mere 40 years before bit loss starts to occur, it raises the question: how long is our data safe?
A recent story about JournalSpace losing all its data made me consider all the web services now storing our data. Social networks, blogging platforms, photo storage sites, and even backup sites are becoming very common, all promising safe storage in the “cloud”. That promise does not mean much to the people who have lost everything in a catastrophic data loss.
Even some of the most important scientific data, the moon landing tapes for example, are suffering from the same problems.
So what is the solution? What steps can we take to ensure that important scientific information stays available? Baby photos? Personal blogs?
These questions are all being addressed, but I wonder at what cost.
The JournalSpace.com example mentioned above points to a sociological problem as well. The lost data represents human knowledge that was collected in one place and has now been wiped out. While no one can judge the importance of this data, it is now gone. As I look to backup services and data replication systems that spread my bits across my machines and the web, I hope we get closer to answering the questions above instead of simply building bigger HDDs and hoping for the best.