Day of Digital Archives

Monday, October 15, 2012

Late Breaking...

I just wanted to post two more entries in DoDA this year that I didn't see until after work on Friday:

Day of Digital Archives by Bonnie Weddle at L'Archivista
Experimenting in the Archives by the Schlesinger Library at the Radcliffe Institute at Harvard University

I'm posting them a bit late, but Vive la Day of Digital Archives, right?!?!!

Friday, October 12, 2012

Cloud storage and digital archives

Hello everyone! Since I am new here, I had better introduce myself, but briefly. My name is Sarah Kim. I am a PhD student at the School of Information, the University of Texas at Austin. For several years, I have been exploring people’s everyday digital record-keeping practices. Personal digital archiving as a form of long-term digital record-keeping and value-determination practice is the phenomenon that I particularly focus on in my research. I have been asking people how they live with their digital documents. Today, I would like to share one of the questions that I have been thinking about based on my (personal) digital archiving research: Cloud storage and digital archives.

During the interviews, I asked participants what they would pick first to rescue if there were a fire at their home, besides living things. Many participants mentioned an external hard drive or a personal computer. Although their answers may have been influenced by other record-keeping related interview questions, if someone asked me the same question, I would think of taking my external hard drive, which functions as my own digital archives (not as mere back-up storage). The digital documents (including pictures) stored on that device are vital for me to rebuild and continue my life.

Recently, many IT companies have begun offering cloud computing services as massive data storage. IT researchers and practitioners often call cloud computing a new paradigm for computing. In fact, many people are already using cloud services to conduct their work and/or non-work related activities (e.g., Gmail, Google Docs, Dropbox and many others). Cloud storage has potential as a future platform for personal digital archives (as well as for the digital storage of memory institutions; see the interesting survey results on National Digital Stewardship Alliance members’ preservation storage systems: http://blogs.loc.gov/digitalpreservation/2012/01/partly-cloudy-trends-in-distributed-and-remote-preservation-storage-more-results-from-the-ndsa-storage-survey/).

This makes me curious about how my participants’ answers will change once they actively start using cloud storage to keep and preserve their personal digital documents and, furthermore, how our digital archiving practices will change with the new technology.

(Well-known New Yorker Cartoon by Mick Stevens, Published November 21, 2011)

It is highly likely that more people (and memory institutions) will be interested in using cloud storage as their digital archives, considering the benefits associated with it and the overall trend in the IT industry. Cloud storage, (expected to be) maintained and monitored by IT experts, could be a relatively more secure place, considering the technical vulnerability of the more conventional digital storage media that many people are using, such as external hard drives. Cloud storage services offer other useful functions such as sharing documents with others, synchronizing digital materials between different devices, tagging, and so forth. Also, cloud computing is still in the early stages of development in general.

There are, however, many questions to ask to clear the cloud of cloud computing. For example, concerns about privacy (we are very familiar with horror stories about data hacking, the selling of personal information, identity theft and so forth), the feeling of losing control over one’s personal documents (who owns what on the Web?), and building trust between users and service providers (how much can we trust the work ethics and long-term sustainability of these commercial services?) remain vital issues that we need to think about.

From an archival perspective, I think, memory institutions’ (especially archives’) working experiences with cloud storage service providers can offer great insight into how we can inject archival thinking (e.g., what archives means, values of documents, and so forth) and practices into the design and development of these services.

Thank you for reading.
Sarah Kim
(Personal digital archives research blog: http://personaldigitalarchives.blogspot.com/)

The William Blake Archive on Day of Digital Archives

Make sure you read the posts over at The Cynic Sang: The (Un)Official Blog of the William Blake Archive:

Day of Digital Archives: Artist's Collections

Well, this year's Day of Digital Archives has been much more successful for me than last year (I broke my elbow in a cycling accident that day and spent much of it loopy because of the pain pills. I did type out a one-handed blog post but I don't think it ended up being coherent.) This year I want to talk about an artist's collection we've been working on for a bit at UO.

The Tee A. Corinne papers are one of the many hybrid collections we have in Special Collections and University Archives at the University of Oregon. Tee Corinne was a lesbian visual artist, writer, and activist who explored female sexuality in her visual and written works. Upon her death in 2006 she left her entire estate, including the rights to her literary and artistic works, to the University of Oregon Libraries. Owning the rights is nice because, once we've done our initial processing and preservation work on the files, we don't have to worry about any rights issues when providing access to the digital objects.

However, before we can even start worrying about access to the materials we've had to devise a plan for working with the digital records. When UO received Tee's collection in 2006, it included a laptop and a desktop computer as well as removable media containing various works and papers. At that time, the UO did not have well-developed procedures or workflows in place for ingesting or otherwise processing digital objects. The files were pulled off Tee's computers and the various media and moved over to library servers, but nothing else happened to them for a number of years. In the meantime, there was a gap of more than a year between the time my predecessor (the first e-records archivist at UO) left and the time I was hired. The Tee Corinne e-records were left on the servers and until now I haven't been able to work with them at all.

When I started my initial assessment of the digital portion of Tee's papers, my first task was to try to gather all the digital objects from the collection into one place on the server. Because of the lack of workflows when the collection was taken in, the digital objects ended up in a number of different places on the server. Although I think I've managed to round up most of them now, I still run across stray files that have to be added in with the others. When we started the project this summer, we identified 65,328 digital files we knew came from Tee's computers or from the removable media in her collection. Although I would love to be able to declare that all those files in fact belong in Tee's collection, she shared her computers with Beverly Brown, her lover, whose collection the UO also owns. In addition, Bev Brown was the founder of and was heavily involved with the Jefferson Center, an organization whose records the UO holds as well. Once we started looking at the files from Tee's computers, we realized that her files, Bev's files, and files from the Jefferson Center were all mixed together. The organic file structure the women were using did not clearly distinguish among these three separate groups. Often a single directory will contain files from all three collections. This has slowed down our processing: we're trying to develop some content-based filters so we can do some batch sorting of the files. Most of the textual documents were created in versions of WordPerfect, so we're also working on batch converting those files. In addition, of course, we're having to do a lot of renaming so that the file names of the preservation copies don't have any of the potential trip-ups you see in organically-named files.
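The renaming step mentioned above could be sketched roughly as follows. This is only an illustration, not the workflow actually used at UO: the rules (ASCII only, no spaces or punctuation, lower-case extensions), the helper names `safe_name` and `plan_renames`, and the sample file names are all assumptions made for the example. It builds a rename plan in memory and does not touch any files.

```python
import re
import unicodedata
from pathlib import PurePath


def safe_name(original: str) -> str:
    """Return a conservative preservation-copy name for one bare file name."""
    path = PurePath(original)
    stem, suffix = path.stem, path.suffix.lower()
    # Fold accented characters to their closest ASCII equivalents.
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of characters other than letters, digits, and hyphens.
    stem = re.sub(r"[^A-Za-z0-9-]+", "_", stem).strip("_")
    return (stem or "unnamed") + suffix


def plan_renames(names):
    """Map each original name to a unique safe name, without touching disk."""
    plan, used = {}, set()
    for name in names:
        candidate = safe_name(name)
        base, suffix = PurePath(candidate).stem, PurePath(candidate).suffix
        counter = 1
        # Compare case-insensitively so the plan is safe on any file system.
        while candidate.lower() in used:
            counter += 1
            candidate = f"{base}_{counter}{suffix}"
        used.add(candidate.lower())
        plan[name] = candidate
    return plan


if __name__ == "__main__":
    # Hypothetical file names, chosen only to show typical trip-ups.
    for old, new in plan_renames(["Tee's photos (final) #2.JPG", "a b.txt", "a_b.txt"]).items():
        print(f"{old!r} -> {new!r}")
```

Keeping the plan separate from the actual renaming means the mapping from original to preservation names can be reviewed and saved as documentation before anything changes on the server.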

The most interesting challenge in dealing with this collection, however, has been the photographs. Photography was one of the many media in which Tee worked, and she made extensive use of Photoshop. Sometimes she created prints of several digitally-altered versions of a single photograph; we are often able to match physical prints with digital files, but in some cases we have digital photographs for which no physical print exists or vice versa. Tee also tended to revise her photographic series depending on the context in which she was exhibiting or publishing them. This means we sometimes have several different series of a single image or group of images. The series may or may not be consistent; that is, sometimes a series of images was published in one form in one place and in a different form somewhere else. In the digital files, this means that in some cases we have many duplicate copies of a single image (if Tee organized the files based on the various publications) as well as multiple different versions of an image. We would prefer not to transfer multiple copies of a single image onto our preservation servers, but we do want to preserve the different versions of the images because we feel these are an important artistic statement. Sorting out the files themselves has proved to be an enormous challenge, however. Luckily I have a team of graduate students and volunteers who are working hard on this project (as well as others).
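One way to separate exact duplicate copies from genuinely different versions is to compare checksums of the file contents. The sketch below is a hedged illustration of that idea rather than a description of the tools used on this collection; the function names and the choice of SHA-256 are assumptions. Two files land in the same group only if their bytes are identical, so a digitally altered version of a photograph always stays in its own group.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks so large images fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_copies(root: Path) -> dict:
    """Group every file under root by the digest of its content."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return dict(groups)


def redundant_copies(groups: dict) -> list:
    """All but the first path in each group of byte-identical files."""
    return [path for paths in groups.values() for path in paths[1:]]
```

Calling `redundant_copies(group_exact_copies(Path("some/folder")))` (the folder is a placeholder) lists candidates that need not be transferred twice. It only catches byte-for-byte copies; the same image re-saved in another format, or with changed metadata, would still need a human eye.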

What have I learned from my work with this collection so far? Obviously, documentation is a hugely important factor when you're talking about a born-digital collection. One of my main problems right now is the lack of documentation from previous work that occurred with this collection (however cursory that work might have been). I'm trying to document every step I take with these records so that my successors have a clear picture of what has and hasn't been done with the materials. It's also important for the digital archivist to be involved in the donation process if at all possible; this helps lessen the amount of triage work you have to do when the born-digital records arrive on your doorstep.

Day 29

Today was my 29th day of work as the first Records Management Archivist at Johns Hopkins University. My job encompasses two areas that overlap frequently but not perfectly: management of university records and management of born-digital archival materials, regardless of whether they originate within the university or with external donors. This combination of roles is relatively common in our profession, but I haven’t personally experienced it long enough to evaluate it critically; perhaps that will be a topic for next year’s Day of Digital Archives post.

My first 6 weeks have coincided with the processes of annual reviews and setting individual goals in our library. Although I was initially wary of having to set annual goals so early in my tenure, the timing has been fortuitous because I have been planning my activities for the next 12 months – which I would be doing at this point in a new job anyhow – at a time when my colleagues are all thinking similarly. And if there’s one thing that I can say with certainty about the next year, it’s that it will involve a lot of collaboration: with other archivists, with curators, with developers, with metadata specialists and with project managers, just to name a few.

For the next few months, I will be assessing the current state of our institutional climate, our capacities and our collections as they relate to acquiring, preserving and providing access to born-digital archival materials. Next, I will be working with my colleagues to determine what capacities we want to develop as an organization. Do we want to do forensic captures of media-based accessions? What kind of preservation activities do we want to undertake? What types of functionalities do we want to build into our digital repository? How do we want to provide access to our materials? Although there will be many details left unanswered at this stage, I hope to be able to address these and similar questions at a very high level within the next six months.

Finally, I will spend the rest of the year developing a three-year road map for how we can get from where we are to where we want to be – or at least, from where we are to moving purposefully and surely toward where we want to be. This will involve identifying gaps in our current technological and human capital, and proposing ways to bridge them.

Of course, the day-to-day activities of the archives will not stop for a year while I figure all this out. Prior to my arrival, no one in our department was charged with focusing to this degree on all the issues surrounding born-digital materials. However, like many institutions, we had still been acquiring them for some time. So while I am doing high-level analysis and planning, I will also be carrying out the day-to-day activities of accessioning and caring for our materials as best I can with the resources currently available.

I have already made a few changes that bring our activities more in line with, for example, the minimal levels of digital preservation outlined in a recent proposal from NDSA. Specifically, I have instituted the use of LOC’s Bagger tool to generate file manifests and fixity information according to the BagIt specification at the time of acquisition, and I am working with library systems to transfer our current holdings to a storage space where they can be more appropriately managed.
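For readers unfamiliar with the idea, a file manifest with fixity information is essentially a list of checksums recorded at acquisition and re-checked later. The sketch below illustrates that principle only. It is not Bagger's implementation and does not produce a complete BagIt bag (a real bag also has a `bagit.txt` declaration and keeps its payload under a `data/` directory); the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hex digest of a file's contents, read in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(payload_dir: Path, manifest_path: Path) -> int:
    """Record one 'checksum  relative/path' line per file; return the file count.

    The manifest should live outside payload_dir so it does not list itself.
    """
    lines = []
    for path in sorted(payload_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(payload_dir).as_posix()
            lines.append(f"{checksum(path)}  {relative}")
    manifest_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(payload_dir: Path, manifest_path: Path) -> list:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        recorded, relative = line.split(maxsplit=1)
        target = payload_dir / relative
        if not target.is_file() or checksum(target) != recorded:
            problems.append(relative)
    return problems
```

Running `verify_manifest` on a schedule, and after every move between storage locations, is what turns the manifest from a simple inventory into fixity checking: an empty list means every file still matches what was received.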

However, I don’t anticipate many other changes in our procedures in the next year. This means that for the next 12 months, we will undoubtedly continue to do some things in ways that I know could be improved. However, when I do begin to make radical changes in our procedures, they will be guided both by best practices and by our own organizational needs and goals.

New perspectives can translate to new opportunities

I’m the Digital Collections Archivist at Kennesaw State University in Kennesaw, Georgia. Kennesaw State is the third largest public university in the University System of Georgia, with a current enrollment of 23,103 for the Spring 2012 semester. Founded in 2004, the Archives consists of one full-time Archivist (me), a part-time Archivist, who also works half-time in the Bentley Rare Book Gallery, and an Associate Director. From 2004 to 2008, when I was hired, the Archives was staffed solely by the Associate Director. For the 2011 Day of Digital Archives, I created a photo essay to illustrate the different roles and responsibilities in my position. It was appropriate for the time, because our department was growing and expanding. We merged with the art and history museums on campus to form a super-department: Museums, Archives & Rare Books. This year, though, feels like one of retrenchment. We lost a long-time member of staff at the beginning of the year. The redistribution of her workload among the remaining staff brought fresh eyes and energy to some long-standing issues. We were able to use it as an opportunity to make significant progress on projects that had been stalled. In light of the difficult economy and constant budget cuts, I think similar organizations will find our actions of interest.

Writing it down

At the beginning of the year, we hired a Records Manager, the first in the history of the university. She’s been working to understand and document the workflow of records creation and disposition across departments in the university. We don’t have an enterprise document management solution, so it’s been quite an undertaking. As part of her duties, the Records Manager has also inventoried records at our off-site storage vendor, identifying and transferring materials with historical significance to the Archives. Although the mission of the Archives is to collect and maintain university records that document its activities and history, we found that we had no recognized authority to transfer records without the consent of the department or division head. Trying to find someone who was willing to accept responsibility or to grant permission was an exercise in futility. After trying to track down one division head over the summer, it was decided that we needed to seek the authority to transfer records deemed to fall within our collecting policies. The problem was we had few written policies.

The department formed a policy committee with representatives from the museums, Rare Book Gallery, and the Archives to develop a unified collection management policy. We found that we were able to use the same language and concepts, adding specific examples or language for situations unique to each unit. After several iterations, the committee was able to create a collection management policy over the summer. It’s currently awaiting final approval by the Chief Information Officer before being implemented. Once this is in place, we are ready to submit the transfer authorization proposal to the President’s Committee for approval. The completion of the collection management policy spurred interest and development in additional policies and procedures, including reproduction, access and use, and registration, as well as related forms. We’re currently working on creating copyright policies. My particular focus is developing guidelines to help users to understand copyright restrictions and to make responsible reproduction decisions.

Clearing it out

Looking at ongoing problems with storage space, both physical and digital, we created an ad hoc committee to review materials and make decisions regarding disposition. One of the first problems that we identified was a large amount of supplies and resources that had been amassed “just in case.” These included outdated or broken equipment, unnecessary or unusable supplies, and donations that did not meet our collecting areas or interests. The process of clearing out the space allowed us to reorganize supplies, to order new equipment, and to relish the sense of accomplishment. We used this momentum to tackle the shared drives and digital repository, both of which had become dumping grounds. The same committee developed policies to govern the shared drive, as well as file naming conventions. Using these new documents, we began a clean-up of the shared drive, which amounted to removing approximately 60 MB of duplicate or unnecessary files. The process turned out to be so easy that the committee offered the services to one of the museums. We were also able to incorporate elements into an outreach session on email best practices and plan to offer the service to other departments on campus.

Building it up

The university is coming up on its 50th anniversary in 2013 and the Archives was approached to digitize historic images and to make them available for users. Currently, we rely on Archon to provide public access to our records and small files, such as oral history transcripts and low-resolution images. We decided that Archon is inadequate for providing access to the high-resolution images required for the anniversary. I was tasked with comparing systems and making a recommendation. After much research, I determined that DSpace would best meet our requirements. We’re currently working with campus IT to implement a DSpace instance. As part of the DSpace project, I mapped the workflows of the Archives and identified current and future technology needs. This plan can now be used to ensure that we make strategic decisions based on demonstrated needs.

In addition to implementing new systems, we’re also focused on improving our current products and services. Archon was originally populated by importing data from our old CMS. It contains many records with minimum information. As part of the general commitment to bring consistency to our records, I’ve initiated a project to enhance the catalog on a record-by-record basis. This also allows me to check new accessions and add them to existing collections when appropriate, as well as to verify location and beef up the MARC record in the library’s OPAC. The project has already revealed some MARC mistakes and location errors.

By mapping the Archives’ core functions and relating them to technology needs, we are able to offer products and services of higher quality and with greater efficiency. We were also able to use our clean-up as a template for new services. While retrenchment may not seem as exciting as rapid expansion, it can still be an opportunity for growth and improvement. Please feel free to contact me if you'd like to ask any questions or follow up. You can reach me at agraha31 (at) kennesaw (dot) edu.

Many thanks to Gretchen for providing the opportunity and forum!

More Blogiverse

Here are some good reads that weren't necessarily created for Day of Digital Archives, but are related just the same:

Get Your Bits Off (Old Storage Media) at The Signal
The Future of Libraries in a Digital Culture at the Huffington Post
Bit by Bit: Software Collecting at the Computer History Museum blog

I'll be heading home soon, but our west coast and other international friends can keep the conversation going!

In my forty years (1964-2004) teaching art history at Carleton College I took thousands of slides of architecture for use in my classes. At the time I never thought they would have a life beyond their physical existence as filmstrips in a plastic mount. Then the Society of Architectural Historians established SAHARA, a digital image archive. I contributed 4869 images. Since then, it has been gratifying when other scholars have mentioned seeing or using some of those images. One episode stands out.

SAHARA forwarded to me a request from an American (I believe) scholar in Beijing asking for permission to reproduce in a Chinese-language journal a slide I had taken of a detail of the Allen Art Museum in Oberlin, Ohio. Apparently, I had taken the slide at just the right angle to support her argument (for architectural historians, it is Robert Venturi's "ironic ionic" column). It turned out that the SAHARA image was not the right size but the Carleton slide curator, Heidi Eyestone, was able to adjust from the original slide.

Perhaps this sort of thing is an everyday occurrence to most scholars now. But to this professor, still living in an analog world, it was just amazing that an image taken in Ohio by a professor from Minnesota could, 30 years later, come to the attention of a scholar in China through accessing a digital archive, and that image, corrected in Minnesota, could be transmitted to Beijing and eventually end up in a Chinese-language journal published in China. LS

Across the Blogiverse, part two

There are lots of fantastic posts going up all over the place to celebrate Day of Digital Archives. If you haven't yet, be sure to check out

Safe and Sound: Tips for Preserving Digital Audio Files by Meg Tuomala at Bears Repeating (the blog of the Washington University Archives)
Day of Digital Archives 2012 by Heather Gordon at AuthentiCity, The City of Vancouver Archives Blog
Not a typical week by Simon Wilson at Born Digital Archives
Challenges to a new Digital Archivist by Krystal Thomas at Illuminations, the blog of the Florida State University Libraries Special Collections and Archives
Day of Digital Archives: Finding Balance by Kyle Matheny at The University of Alabama Libraries Digital Services blog
We Descended: Processing the Bill Bly Collection with the UMD Born-Digital Working Group by the Hornbake Library at the University of Maryland
Day of Digital Archives: our experience so far by Sarah Romkey at the University of British Columbia's UBC Library blog

Happy Reading Everyone!

A Day in the (Digital) Life

Last year I chose to use my DoDA post to broadcast my former institution’s ideas about communicating what digital archives are to the public. This year, I have a new job at Penn State as Digital Records Archivist. This job didn’t exist before, and we’re still trying to figure out what it’s going to be. But we have a lot of irons in the fire here, so I’ll stick to the theme of talking about what my ‘day of digital archives’ looks like.

I think it surprises some people to hear that I find my average day to be distinctly non-technical, but it shouldn’t. And I’m okay with that. I’ve found that what interests me most about the work is the challenge of figuring out how it all fits together, how it blends with the other work of archives, and how the overall work of the institution can be reimagined and modernized.

A lot of what I am doing at my current institution is capacity building: forming policy, trying to locate and implement best practices (which can and should be institution-specific), researching and reading work being done by others, experimenting with free tools, and just dialoguing with other staff about challenges and issues. A significant and persistent challenge is trying to align a developing electronic/born-digital records practice with our institution’s established practices, which includes trying to both mold digital practice to legacy practice and recommending ways in which legacy practice might be modified to suit new digital realities.

For example, this morning I have spent a little time working on my long-fermenting ingest workflow for electronic records. This workflow needs to address not only the particulars of what we might call ‘digital processing’ (how we transfer material from some kind of external digital media to network ‘dark archive’ storage, how we store that transfer as discrete files, disk images, or both, what metadata/manifest information we attempt to extract, and how we document this activity) but also how the media flows to the digital archivist in the first place, and what becomes of it afterward. We have to set up policies and practices that govern the separation of media from collections, how the media and the work we do with it is documented in Archivists’ Toolkit (as well as how it is documented there after ingest), and ultimately how it is incorporated into arrangement and description activities. Perhaps the biggest challenge I face right now is figuring out just how to provide access to the material should a researcher stumble across some record of it in the library catalog or finding aid platform. Actually, this thing is pretty much done, but I keep tinkering.

A lot of what I am doing is just traditional archival work cast in a different light. The next task on my list this morning is, well, appraisal. Penn State recently contracted the services of Archive-It, and we’ll soon be using their crawling services to strategically capture university websites. Despite being published and disseminated through web technologies and platforms, Penn State University websites are subject to the same considerations as other university records. They exist within record groups and are potentially subject to retention schedules.

In preparation for this project, we secured a list of sub-domains on PSU.edu from central IT, and have been visiting the websites on this list to determine an originating department (provenance), look for sites from departments that fit the collecting priorities of the university archivist, try to determine how frequently the site updates (an ongoing process), and record some descriptive information about the sites in advance. Our initial collecting priorities will focus on sites related to the administrative units, colleges, and commonwealth campuses, but future phases of collecting will seek broader documentation of university work, culture, and life. As Mike Shallcross stated in his excellent case study on archiving the University of Michigan’s websites:

While reviewing Michigan’s online resources, archivists were keenly aware of the extent to which websites help confer credentials (from the recruitment of students through their graduation), convey knowledge, foster socialization, conduct research, sustain the institution, provide public services, and promote a distinctive culture.

Increasingly, the kind of content Mike refers to above is being delivered through multimedia (primarily video), social websites, and cloud-based services (universities are increasingly using YouTube, Flickr, etc. to host content). Future appraisal and planning won’t necessarily be more complex; we’ll just have a larger landscape of material to examine. It won’t necessarily require special training or technical skills; it just requires an awareness of institutional uses of technology and methods for delivering content that should be collected by the repository.

Finally, I’ll take a little time later today to start preparing for a talk I’ll be giving at this year’s Digital Library Federation Forum in Denver. My position as Digital Records Archivist was written into a Mellon personal scholarly archiving grant that was awarded before I started at Penn State this past May. The project is “an ethnographic study of faculty behaviors and articulated needs central to robust scholarly creation and successful navigation of the personal archiving and information management process.” From an archival point of view, the study should provide some insight into the personal digital habits of faculty and the technologies used to support and share their research. We’ll be collecting data through surveys, interviews, an on-site observation, and hopefully, for my part, I’ll be able to identify patterns that can help inform acquisition and management approaches to born-digital material (see the post earlier today, "What's in a File Name?"). This has been an interesting collaboration between various information professionals—I will be presenting with the lead investigator, an Educational and Behavioral Services Librarian, as well as an ethnographic researcher—and I think it speaks to the ways in which archival work in the digital age is inevitably going to be cross-disciplinary.

And by the way, thank you, Gretchen, for putting this together again!

-- Ben Goldman, Penn State University