Accelerating Science Discovery - Join the Discussion

Dark Archives

by Mark Martin on Thu, 4 Aug, 2011

I have to admit that I am truly a science fiction and fantasy geek.  Blame it on growing up on a steady diet of Star Wars and Transformers.  This bit of background information helps explain why I smile internally whenever I get the chance to talk about dark archives.  Those words call to mind a picture of some mysterious, powerful object at the center of an epic story, like The Lord of the Rings.  Great words.

The reality is that in the Information Management industry, a dark archive isn’t quite so epic. But I do think that my choice of adjectives, mysterious and powerful, is still quite appropriate.
 
Mysterious, yes.  Dark archives are certainly misunderstood, both inside and outside the industry.  So, what is a dark archive?  It is, simply put, an archive of information that is not used for public access.  Most often it serves as a failsafe copy of a light archive, i.e., a publicly available version of the information, for use in disaster recovery operations.  A dark archive need not be a fully operational copy of an information system; it needs only the content behind that system.  This is an important distinction because maintaining an exact operational copy of an information system is a much more complex and expensive undertaking than maintaining only the content the system operates on.  Metaphorically, turning a dark archive into a light archive takes more than the flip of a switch.
 
Powerful, no doubt about it.  OSTI currently operates a dark archive for technical reports that have been announced to OSTI but are hosted by the National Laboratories.  The technical reports in our dark archive number almost 100,000 and include the digital equivalent of 6 million pages of text, charts, graphs, and photos.  Having the full text locally accessible to our information systems allows us to execute critically important processes, such as fully indexing the content of these technical reports.  Fully indexing all of the text in each report in turn allows our information products to provide more precise searches and more relevant results.  This is important to you if you need to find R&D information fast.  It is true that we could implement these processes without maintaining a dark archive, but the dark archive makes providing this functionality astoundingly easy.
 
I have to point out how easy the dark archive was to bring online here at OSTI and to maintain operationally.  OSTI maintains a large data warehouse that houses the results from the Department’s research.  Our dark archive was a feature we tacked onto the existing infrastructure of our data warehouse.  As new records enter our system, if the documents are hosted offsite, our dark archive automatically connects to the remote location, then downloads, caches, and indexes the remote file.  This entire process requires no human interaction, either at the offsite location or at OSTI.  As a rough estimate, the dark archive accounts for about 1% of the total funds that OSTI spends to manage technical reports.
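
For the curious, the download, cache, and index steps described above could be sketched roughly as follows.  This is only an illustration under stated assumptions, not OSTI's actual code: the class name DarkArchive, the record fields (id, url, hosted_offsite), and the simple word-level index are all hypothetical, and the sketch treats each file as plain text, whereas real technical reports would need a text-extraction step first.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path
from urllib.request import urlopen


def fetch_url(url):
    """Download a remote file and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


class DarkArchive:
    """Hypothetical sketch: cache offsite documents locally and index their text."""

    def __init__(self, cache_dir, fetch=fetch_url):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch              # injectable, so it can be swapped out for testing
        self.index = defaultdict(set)   # word -> set of record ids

    def ingest(self, record):
        """Archive one record; return the cached path, or None if it was skipped."""
        # Only documents hosted offsite need a local dark copy.
        if not record.get("hosted_offsite"):
            return None

        # 1. Download the remote file.
        content = self.fetch(record["url"])

        # 2. Cache it under a file name derived from the URL.
        name = hashlib.sha256(record["url"].encode("utf-8")).hexdigest()
        path = self.cache_dir / name
        path.write_bytes(content)

        # 3. Index every word (assumes the content is plain text).
        text = content.decode("utf-8", errors="ignore").lower()
        for word in re.findall(r"[a-z0-9]+", text):
            self.index[word].add(record["id"])
        return path

    def search(self, word):
        """Return the ids of all archived records containing the word."""
        return sorted(self.index.get(word.lower(), set()))
```

Calling ingest() once for each new record is all the automation the sketch needs; no one at either end has to intervene, which is the property that keeps the real system so cheap to run.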
 
Dark archives probably aren’t quite as awesome as some may picture them, but they are extremely powerful tools in the information management world.  Dark archives: a really great name and a wonderful tool.
 
Other Related Topics: archives, dark, Star Wars

About the Author

Mark Martin
OSTI Assistant Director, Program Integration