Saving our digital heritage
It is commonly agreed that the destruction of the ancient Library of Alexandria in Egypt was one of the most devastating losses of knowledge in all of civilization. Today, however, the digital information that drives our world and powers our economy is in many ways more susceptible to loss than the papyrus and parchment at Alexandria.
An estimated 44 percent of websites that existed in 1998 vanished without a trace within just one year. The average life span of a website is only 44 to 75 days. The gadgets that inform our lives - cellphones, computers, iPods, DVDs, memory cards - are filled with digital content. Yet the lifetime of these media is discouragingly short. Data on 5 1/4-inch floppies may already be lost forever; this format, so pervasive only a decade ago, can't be read by the latest generation of computers. Changing file and hardware formats, or computer viruses and hard-drive crashes, can render years of creativity inaccessible.
By contrast, the Library of Congress has in its care millions of printed works, some on stone or animal skin, that have survived for centuries. The challenges underlying digital preservation led Congress in 2000 to appropriate $100 million for the Library of Congress to lead the National Digital Information Infrastructure and Preservation Program, a growing partnership of 67 organizations charged with preserving and making accessible "born digital" information for current and future generations.
Some of the crucial efforts funded by the program include the archiving of important websites such as those covering federal elections and Hurricane Katrina; public health, geospatial and map data; public television and foreign news broadcasts; and other vital born-digital content.
Unfortunately, the program is threatened. In February, Congress passed and the president signed legislation rescinding $47 million of the program's approved funding. This jeopardizes an additional $37 million in matching, non-federal funds that partners would contribute as in-kind donations.
Some of the projects that were to be funded include preservation of important government records at the state level, such as legislative data and court records. Another new project at risk, "Preserving Creative America," is an initiative with commercial producers of creative content, such as digital film, music, photography, other forms of pictorial art and even video games.
We have seen what happens when valuable public data are inadequately preserved, lost or not available when needed. For example, the original, raw data from the 1960 Census were stored on a state-of-the-art UNIVAC computer. When the Census Bureau turned the data over to the National Archives in the mid-1970s, UNIVAC computers were long obsolete. Much of the information was eventually recovered, but at a huge cost. Raw data from early satellite probes, including the Viking mission to Mars, pre-1979 Landsat images of Earth and high-resolution images of the moon, have been lost for similar reasons.