Thursday, October 6, 2011

The UK Web Archive

The UK Web Archive began archiving web sites in 2004. We are based at the British Library, and our aim is to collect, archive, and make accessible web-based resources of scholarly and cultural importance from the UK domain. Why do we do this? Because, put simply, the web is a unique record of modern life and experiences. If we don’t archive it – or at least parts of it – then there will be a black hole, a massive gap in our digital cultural memory.

There are around 9,000 different titles in the UK Web Archive, capturing content from a broad swathe of society. The web archive is selective and currently provides a representative sample rather than a complete capture of the UK domain. Most of the sites in the web archive have been gathered on several occasions, thus capturing changes to the sites over time as well as preserving sites that are no longer available on the live web.

New titles are added to the archive after nomination by subject specialists inside or outside the library, or by members of the public. If a nomination is within the scope of our collection policy, we first request permission from the site owner to archive their web site. Once permission has been granted, our web archivist enters the details into our web archiving system (the Web Curator Tool) and sets up a crawling schedule (e.g. to capture the site every month, every six months, or once a year). When the crawl is finished and we have a copy of the site, it goes through a quality assurance process to ensure it is complete and all expected aspects work as intended, before it is approved and eventually becomes accessible in the web archive.
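
For readers who think in code, the workflow above can be sketched as a small state machine. This is only an illustration of the steps described in this post, not how the Web Curator Tool is implemented: the class, the stage names, and the day counts used for each crawl frequency are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Hypothetical stage names for the steps described in the post."""
    NOMINATED = 1
    PERMISSION_GRANTED = 2
    SCHEDULED = 3
    CRAWLED = 4
    APPROVED = 5


# Assumed approximations of "every month", "every six months", "once a year".
FREQUENCY_DAYS = {"monthly": 30, "six-monthly": 182, "annual": 365}


@dataclass
class Nomination:
    url: str
    in_scope: bool
    stage: Stage = Stage.NOMINATED
    frequency: Optional[str] = None

    def _advance(self, expected: Stage, new: Stage) -> None:
        # Each step is only valid from the stage immediately before it.
        if self.stage is not expected:
            raise ValueError(f"cannot move to {new.name} from {self.stage.name}")
        self.stage = new

    def grant_permission(self) -> None:
        # Permission is only requested for nominations within scope.
        if not self.in_scope:
            raise ValueError("nomination is outside the collection policy")
        self._advance(Stage.NOMINATED, Stage.PERMISSION_GRANTED)

    def schedule(self, frequency: str) -> None:
        if frequency not in FREQUENCY_DAYS:
            raise ValueError(f"unknown frequency: {frequency}")
        self._advance(Stage.PERMISSION_GRANTED, Stage.SCHEDULED)
        self.frequency = frequency

    def record_crawl(self) -> None:
        self._advance(Stage.SCHEDULED, Stage.CRAWLED)

    def approve(self) -> None:
        # Stands in for passing quality assurance.
        self._advance(Stage.CRAWLED, Stage.APPROVED)


def next_crawl_dates(start: date, frequency: str, count: int) -> List[date]:
    """Return the next `count` crawl dates after `start` for a frequency."""
    step = timedelta(days=FREQUENCY_DAYS[frequency])
    return [start + step * i for i in range(1, count + 1)]
```

The point of the sketch is simply that each step depends on the one before it: a site cannot be scheduled before permission is granted, and cannot be approved before it has been crawled and checked.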

Once in the web archive, content can be explored in several different ways. Users can browse by subject or by special collection. Traditional search options are of course possible, both full text and title based, but we are also developing a number of visualisation tools that help users explore content in uniquely digital ways. For example, we provide an NGram search option, as well as a 3D Visualisation wall. We also provide tag clouds for content in our 2005 General Election special collection. Other visualisation and analytical approaches that allow users to exploit the data contained in the web archive, rather than just experience the archived web pages, are under development for future release.
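
To give a flavour of what an NGram search involves, here is a minimal sketch that counts how often a phrase appears in archived text from each year. It is an illustration of the general idea only and does not describe how our own NGram search is built; the function names, the simple tokenisation rule, and the `(year, text)` input format are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(text: str, n: int) -> List[str]:
    """Return every run of n consecutive words in the text."""
    words = tokenize(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_counts_by_year(
    captures: Iterable[Tuple[int, str]], phrase: str
) -> Dict[int, int]:
    """Count occurrences of `phrase` per year across (year, text) captures."""
    target_words = tokenize(phrase)
    if not target_words:
        raise ValueError("phrase must contain at least one word")
    target = " ".join(target_words)
    counts: Counter = Counter()
    for year, text in captures:
        # Adding zero still records the year, so gaps show up in the result.
        counts[year] += ngrams(text, len(target_words)).count(target)
    return dict(sorted(counts.items()))
```

Plotting the resulting counts against the year gives the kind of trend line that lets users see when a term rose or fell in the archived web.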

The Web Archive started life as a collaborative initiative between six major institutions, known as the UK Web Archiving Consortium (UKWAC). Though the consortium was dissolved in 2009, collaboration is still a key theme at the UK Web Archive, and we continue to work closely with a number of former UKWAC partners, as well as the International Internet Preservation Consortium (IIPC) and a number of other institutions across the UK. The web archiving challenge is unparalleled in scale, and collaboration is key to success. If you’d like to get involved, we’d be delighted to hear from you. You can contact us via our web site or via Twitter. We’re especially keen to hear from anyone who would like to take part in future crowdsourcing projects, so if that’s you, please do drop us a line.
