Friday, October 7, 2011

24 Hours: The Day of Digital Archives

I'm Mark Matienzo. I work as a Digital Archivist in Manuscripts and Archives at the Yale University Library and as the Technical Architect for the ArchivesSpace project. I worked with Gretchen on the AIMS project. I wrote about my Day of Digital Archives on my blog, describing a fairly busy 24 hours I had on Thursday, October 6.

Analog-to-digital video preservation and a site visit to a repair facility at the NASA Ames Research Center

Hello everyone, happy day of digital archives! I'm Lauren Sorensen, Preservation Specialist working at Bay Area Video Coalition, a technology non-profit devoted to inspiring social change by enabling the sharing of diverse stories through art, education and technology. Our preservation department specifically works to preserve and provide support to archives with film, video, and moving image and audio material in their collections. We are one of the only non-profit vendors for high-quality preservation of video and audio in the country.

My blog entry can be found on our home site blog here. It describes a site visit from earlier this year, when I met Ken Zin in person for the first time. Ken works on the NASA Ames Research Center campus (at the site of a former McDonald's there!) repairing obsolete reel-to-reel videotape machines. His work is essential to us because he is one of the only experts left in the country doing this specialized type of work. Because these decks feature heads (the part of the machine that reads the magnetic waveforms on the tape) that are proprietary to the companies that made them in the 1970s and 1980s, it is a real challenge keeping them in proper working order for high-quality preservation. Realignment and regular maintenance are important in maintaining a facility that is appropriate for preservation services. We maintain these decks as we would museum artifacts because they are some of the last working machines available to transfer 1/2" open-reel tapes; once these decks are no longer operational, any such magnetic recordings held in archival collections will be lost.

Please enjoy the photos!

Thursday, October 6, 2011

Sounding Meaning in Archives

I've posted my Day of Digital Archives over at my own blog.

Day of Digital Archives

My Day of Digital Archives is not going at all as planned. I meant to be working on web archiving and training my students in processing the e-records of our last University President. Instead, I'm sitting at home typing one-handed and nursing a broken elbow (on my dominant arm, of course) sustained in a cycling crash yesterday on my way to work. So this post will be pretty short, and if anything catches your interest feel free to email me at khomo [at] uoregon [dot] edu for more information.

First, a little background about me: I'm the Electronic Records Archivist at the University of Oregon. I'm responsible for the ingest and preservation of e-records in Special Collections and University Archives. My current most pressing projects:
  • Processing raw video files of oral histories documenting the Latino experience in Oregon. I had a plan, and then Final Cut X threw a wrench in the works: the raw video footage created in Final Cut 7 can't be read with Final Cut X. This is a textbook example of why proprietary digital formats are not a preservation solution.
  • The aforementioned presidential e-records. Organic file trees are a nightmare, and trying to sort out the records designated as non-permanent by state law only adds to the confusion.
  • Creating an inventory of all our audio-visual material in order to prioritize both for digitization and for traditional preservation/conservation.
  • Doing some pre-emptive work with a couple of faculty members who are teaching oral history classes this academic year and who want their students to deposit the audio files in the archives. I'm happy to take the content, but if I work with the professors ahead of time and get myself scheduled to do some basic instruction in the class I'm far more likely to get usable files at the end of it.
  • Archiving University web content, both from the official UO pages and from other sources (social media, news outlets, etc.)
  • Working with our data management team to create ingest and cataloguing workflows for acquired digital data.
I think that's it. Now excuse me while I take some pain medication.

Missing first post-AIMS born-digital workshop (SULAIR)

Several staff from Special Collections and elsewhere are attending the first of Peter Chan's four workshops on working with born-digital materials. I think I am the only one who couldn't attend today!
This is a prelude to several projects that will be concurrently processing analog and born-digital materials, and part of our build-out now that the AIMS project is wrapping up. One of these projects is near and dear, belonging to the Manuscripts Division. It is part of our current NHPRC grant-funded project on the records (analog and digital) of the Stop AIDS Project, a still-active organization based in San Francisco.

Digital archives are having their day!

WOW. This DoDA thing has really taken off. Tons of terrific posts, people, projects, progress. Zipping through everything that has come in so far today, I've felt like a bit of a kid in a candy store (nerd alert! but I guess I'm in good company today). Unlike everybody else who is weighing in, however, I have no experience whatsoever managing or preserving any digital archival stuff. (That's assuming you don't count my stuff at home that's backed up on iCloud. Nope, I didn't think you would want to count that.)

Regardless, as an archivist, I've been obsessed with the born-digital deluge for a couple of decades--out of necessity rather than ability or innate interest. When I was a University Archivist, I had the embarrassing stack of physical media sitting on my desk waiting for the day when we would "have time." At least we were clever enough to check off "digital media" on our accession forms, so when survey day comes, it'll all be easy to find.

In my current role in OCLC Research, where we undertake projects that will press research libraries forward in addressing today's Big Challenges, we work actively with members of our OCLC Research Libraries Partnership to identify appropriate topics and do good work that will help. A year ago we published Taking Our Pulse, the report of a survey of special collections and archives in more than 150 research libraries (universities, colleges, museums ...) across the U.S. and Canada, the data from which demonstrated how little is going on in management of born-digital in research libraries. Though 80% of respondents said they have at least some born-digital materials, less than one-third could say how much. Fewer than half have assigned responsibility for addressing the born-digital realm. Publicly-accessible metadata is rare. More than 80% admitted that they need education and training. And--no surprise--born-digital is one of the three most often-named "most challenging issues" that special collections and archives must address.

In the wake of the survey report, my colleague Ricky Erway and I are now in the midst of what we like to call our "born-digital baby steps" project, the objective of which is to help research libraries that have yet to take any action get off the dime. In addition to outlining the briefest possible (so as not to intimidate) list of basic (really basic) steps that should be taken to get a handle on what you have (survey!) and bring materials under initial control (server space! bit imaging! simple accessioning!), we want to demonstrate that getting started truly is feasible, including for those who don't have special resources (or staff) and for geezer archivists like me who have spent years being frozen in place like deer in the headlights.

In our research reports we're inclined to consider library directors a core audience, since they ultimately set the priorities and control the purse strings. Some of those directors are counting on their archivists to deal with born-digital (particularly for content that's beyond the scope of an institutional repository), but they don't know what that means, or why it's so hard, or the extent to which their IT people are going to be key players. At the other end of the spectrum, however, some directors think that born-digital has nothing whatsoever to do with special collections and archives (um, that's just the "rare" and the "artifacts," isn't it?).

So in the baby-steps report we're also going to talk about how "born-digital" fits into research libraries in general, and how it intersects with special collections and archives. We'll outline the wealth of skills and expertise that archivists have to contribute to the mix. We'll discuss how the range of born-digital content and media dovetails with the types of material that special collections and archives have traditionally collected. Remember, library directors and other higher administrators are a key audience: even though many archivists already know these things, few directors know how to think about them. So, we hope we'll be able to help them get a grip so that they'll understand what their archivists are up against.

We have a small army of colleagues in the wings (including some of you), all of whom have various sorts of expertise in the born-digital realm, standing by to give us feedback on drafts, answer our baby-steps questions, and in general keep us from embarrassing ourselves by saying things that aren't quite true. And all these DoDA postings are suggesting some other people with whom we might want to talk at some point.

Speaking of which, if you have any reactions to our project, please comment! Brickbats, tomatoes, garlands of flowers are all welcome. Or get in touch with Ricky or me to chat.

Congratulations to everybody who has weighed in today, as well as those listening in, for being part of the community of born-digital archives and archivists. Hey, Gretchen, you better be planning to preserve this blog! It's going to be a fantastic resource.

Day in the Life: Digital Archives Educator

On the Day of Digital Archives, I am

1) Working with local archives to develop projects for my Digital Archiving and Preservation class in the spring semester;

2) Planning with my TA a project to set up in our “Digital Archaeology Lab” several canonical systems for testing significant properties and how original bitstreams pulled from old media might interact with a coeval computing environment, to include research in historic popular computing publications of the relevant available programs, peripherals, and other sundries and drawing on a project last year researching the preservation of Ultima II using an Apple IIc from the Goodwill Computer Museum;

3) Discussing with project partners at the Museum several additional research projects going forward in the form of student capstones and individual studies;

4) Considering progress on my own research about archival organization of preserved born-digital materials;

5) Planning advising sessions for spring 2012 to assist students especially with getting experience in hands-on work with born-digital archival material—hardware, media, and content.

Here is the virtual archives:

Here is the physical lab (or parts of it):

Organizing canonical environments



I don't have a sleep schedule - I have periods of blackouts in between work sprints. Today's Day of Digital Archives happened for me after one of these blackouts, where I was up around midnight and looking to get a bunch of stuff online, and prepare the rest of it for being online. Online is where a lot of my stuff goes because the first and main priority is to share what I acquire, not sit it somewhere waiting for grant funding or hugs to make it better.

Here's part of 200 hours of Game Developers Conference footage I'm digitizing for their GDC Vault site - they're having me put everything that makes sense up, for free, for the world to see. These are conference recordings dating from 1996 up through to 2008, and there's a lot more besides this. It's historically fascinating: one of the leading industries in the world has some of its brightest minds at this conference, not to mention introductions of major platforms like the Dreamcast, Xbox, PlayStation (2 and 3), and so on. Most of this has been accomplished using this old thing:

..and this is just for the BetacamXP format tapes - I have HDCAM, MiniDV, VHS, and Audio tapes as well. All are being digitized, all are going up. A lot of work!

I blew a bunch of material up to the Archive Team collection; mostly captures of websites that decided to go down that had been around a long time, like the social gaming site or the art project Word Count Journal and even some pieces of the big boys, like Google's now-defunct Google Friends Newsletter they decided to deep-six this year. I'm now doing some checks on the Archiveteam Friendster Snapshot, where we put millions of Friendster accounts up for later study by academics and historians. We didn't get all of it, not by a long shot, but we definitely got a good sample for people to use.

A few years back, I did a documentary on text adventures called GET LAMP, and over time have been uploading raw footage sets from that movie into this collection. Well, among the things I shot was a ton of footage inside Mammoth Cave, Kentucky, and as we speak, 11 gigabytes of .m2t (hi-definition) footage is making its way up. That'll probably be all day, at least.

Just yesterday, my talk about Shareware came up! A massive, profanity-filled rant of fun, that talk is here.

Heck, if we want to mention talks, this is the way to go: Archive Team: A Distributed Preservation of Service Attack.

Until my next blackout!

Written by Steven Szegedi, Archivist and Special Collections Librarian, Dominican University

This post was originally published on the Dominican University Library Blog.

This year's Society of American Archivists' 75th anniversary annual gathering, held in our very own city of Chicago, was positioned as a reflection upon the development of both the society and the profession as a whole. Naturally this included a fair number of historically minded sessions reflecting on accomplishments and challenges met. Ultimately, however, a very welcome focus on digital archives emerged over the three days of sessions as archivists, historians, programmers, and records managers addressed collective present and future challenges facing our institutions.

The panels and sessions were incredibly energizing, proving time and again that archivists of all stripes are willing and able to shoulder the responsibility for our digital cultural heritage. Archivists of old could content themselves with managing documents already consigned to history (excepting of course those working within the registratur system…there's always an exception); not so with today's archivists, who are active in content creation and in managing digital collections from the point of creation onward.

The quality of the open source tools that are emerging for all kinds of repositories is awe-inspiring: ArchivesSpace (which will merge the Archivists' Toolkit and Archon), Archivematica, Denver University's Records Authority, and Tufts' TAPER project, to name only a few. The only dispiriting refrain common to all of these software systems is their proposed launch dates of 2013 and beyond…promises of a brighter future unavailable today.

As a lone arranger, for me what was most useful was hearing real-life reports about the practicality of existing tools for harvesting and managing digital content for our repositories: HTTrack, TreeSize Pro, Heritrix, Firefly SSN finder, TeraCopy… Hopefully by next year I will be able to report on our successes and learning curves with some of these tools at Dominican University.

Personally, I had to create a new tab on my Netvibes dashboard for all of the presenters' blogs that are now my required daily readings. Apart from my hard-won Deranger ribbon, I am left with an abiding joy from meeting so many inspiring and engaging archivists unafraid of the luminous glow enveloping our digital horizon.

Elsewhere in the Blogosphere IV: The Next Bloggeration

The über-cool Rhizome has posted today about digital art and anti-aliasing, and the Columbia Center for Oral History blogs about their digital preservation initiatives. Enjoy!

McLuhan's Understanding Media: "The [digital] medium is [no longer] the [only] message."

Happy Day of Digital Archives
David Kay, nydawg [New York Digital Archivists Working Group]
October 6, 2011

McLuhan: “The Medium Is the Message?” or “The [digital] medium is [no longer] the [only] message.”

This year marks the 100th anniversary of the birth of “the new spokesman of the electronic age”, Marshall (Understanding Media) McLuhan, and digital archivists should take a moment to think about how media, digital and analog, hot and cool, and in many different formats change our jobs, lives and responsibilities. With threats of technological obsolescence, vendor lock-in, hardware failure, bit rot and link rot, non-backwards compatible software, and format and media obsolescence, digital archivists need a system to accurately describe digital objects and assets in their form and function, content, subject, object and context. If we miss key details, we run the risk of restricting access in the future because, for example, data may not be migrated or media refreshed as needed. By studying and understanding media, digital archivists can propose a realistic and trustworthy digital strategy and implement better and best practices to guarantee more efficiency from capture (and digitization or ingest) and appraisal (selection and description), to preservation (storage) and access (distribution).

Over the last ten, forty, one hundred and twenty thousand years, we have crossed many thresholds and lived through many profound media changes--from oral culture to hieroglyphic communications to the alphabet and the written word, and from scrolls to books, and most recently transiting from the Atomic Age (age of atoms) to the Information Age (era of bits). While not all changes were paradigm shifts, many helped shift currencies of trust and convenience to establish new brand loyalties built on threats of imminent obsolescence and vendor lock-in. As digital archivists, we stand at the line separating data from digital assets, so we need to ensure that we are archiving and preserving the assets and describing the content, technical and contextual metadata as needed.

Want to read the rest? Check out the New York Digital Archivists Working Group [#nydawg] WordPress blog. Thanks for your interest in #DigitalArchivesDay #DoDA.

Adventures in web archiving

As a web curator, quality control of archived websites is part of my everyday work. With the help of our able student assistants, we review each archived site to determine whether or not we have successfully captured the site. Basic quality control includes making sure that the archived site resembles the live site, internal site links work, we can download files (for example, PDFs of reports), and so on. More advanced quality control involves evaluating crawl reports and adjusting (and re-adjusting) the settings for a particular site. The advanced quality control happens when a site is not successfully captured using our default settings. Most of the time, crawl issues can be tackled and resolved by the combined brilliance of the curators, library programmers, and the unflappable Archive-It support staff. But occasionally, there are sites that seem designed to test your skills and your patience. Just when you think you’ve solved the problem and the road to a perfect capture seems clear, the site is redesigned, or changes URLs, or disappears only to reappear months later with a new host of issues.
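The basic internal-link check described above could be roughed out in a few lines of Python. This is only a sketch under stated assumptions, not how Archive-It or any particular crawler reports its results: the function name `missing_internal_links`, the `captured_urls` set, and the example URLs are all hypothetical, and the HTML is assumed to be already saved locally from the archived page.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop "#fragment" parts.
                absolute = urljoin(self.base_url, value)
                self.links.add(urldefrag(absolute).url)


def missing_internal_links(page_url, html, captured_urls):
    """Return same-host links on the page that are not in captured_urls.

    captured_urls is a hypothetical set of URLs known to be in the
    archive, e.g. compiled by hand from a crawl report.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(page_url).netloc
    internal = {u for u in collector.links if urlparse(u).netloc == host}
    # The page itself counts as captured, since we are reading it.
    return sorted(internal - set(captured_urls) - {page_url})


# Example with made-up data: the PDF link is internal but was not captured.
sample_html = (
    '<a href="/about.html">About</a> '
    '<a href="report.pdf">Report</a> '
    '<a href="http://other.example/x">Elsewhere</a>'
)
print(missing_internal_links(
    "http://example.org/index.html",
    sample_html,
    {"http://example.org/about.html"},
))
```

A list like this only flags candidates for a human to look at; it says nothing about whether a captured page actually renders like the live site.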
Web Curator Nightmare: an incredibly valuable site disappears before it can be captured, then reappears months later at a completely unrelated URL, redesigned completely in Flash and Java and a programming language that didn’t exist five minutes ago, with content hosted on eight different servers inexplicably returning HTTP 404 codes all day except for three hours during the vernal equinox.
The Trials of Web Archiving: A Saga in Six Screenshots...
Victory! Almost!
Formatting is *slightly* off.

This screen appears when something was not captured and/or is missing from the web archive

More defeat.
We originally captured the Arabic content, and then the URL changed...
Crushing defeat.
This shows the URL of a PDF that was not captured.
Victory at last! Again!
The new website, captured and functional.
There should be content here...

Elsewhere in the Blogosphere, pt III: the Bloggening

Check out the Library of Congress and NDIIPP's digital preservation blog The Signal for a post on LC's many digital preservation resources!

Digital Archives at IIT

Having only recently launched an institutional repository at the Illinois Institute of Technology (in fact, our first major digital archives initiative), I used this Day of Digital Archives as an opportunity to introduce the IR and some related concepts to the IIT community via our library blog:

Dana Lamparello
Metadata & Digitization Librarian
Illinois Institute of Technology

Sharing Our University Mission

On September 27, Dominican University held its 2nd annual Caritas Veritas Symposium. This year's theme, "From Motto to Mission," explored how truth (veritas) and love (caritas) lead to action for “the creation of a more just and humane world.” (Dominican’s motto is Caritas et Veritas and our mission is to prepare students to seek truth, to give compassionate service and to participate in the creation of a more just and humane world.) Presentations included formal papers, panels, round-table discussions, and creative dance, and involved faculty, staff, students, alumnae/i and trustees. Our library offered to publish the proceedings in our new open access repository, Constellation. Last year, select proceedings were published in print and distributed mostly to the immediate Dominican community. This year, we wanted to share the proceedings with the world. After all, sharing our research and service projects with the world is itself an act of caritas that allows others to freely pursue the truth and to use that information to make the world more just. We’re sharing, for example, a theater professor’s work to end the death penalty in Illinois, a theology class’s trip to Native American reservations to teach children about climate change, and a Dominican Sister’s research into service learning and service work in the early years of the University. This unique collection of projects will now be preserved, accessible to the Dominican community and to anyone outside the University. We’re publishing the papers, slides and other materials as they are sent to us by the authors. Find them here.

An English Professor Among the Archivists

My job is actually pretty simple. When people ask me what I do, I tell them I’m an English professor who teaches archivists about computers. What’s not to understand?

Okay, there’s more to it than that. The official story is that I’m an Associate Professor of English who also has a research and administrative role at a digital humanities center, the Maryland Institute for Technology in the Humanities at the University of Maryland. But that’s arguably even less illuminating.

Let me try a third tack. Here’s a picture of my office. Apologies for the mess, I’m actually kind of a neat freak, but this is an interesting mess. Maybe one way to talk about what I do is just to walk you through it.

So, we’ll start over on the left-hand side. That’s an Apple IIe, my first machine. My parents bought it for me (yeah, I was that lucky) back around 1982-3. It still boots today. Among other things, I keep it around to show students. Most of them have never used a computer that didn’t have a hard drive, so first having to select a disk and pop it into the drive before the computer does anything is a new experience for them. It’s signed by two people I have enormous respect for, Bruce Sterling (famous for, among much else, the Dead Media project) and Jason Scott, aka @textfiles. Both of them have been here to give talks, but the signatures also reinforce the way individual pieces of hardware become artifacts in their own right, something we’ve attempted to document in a more formal way. This computer, along with some Commodore 64s that Doug Reside used to keep around MITH when he was here, became the impetus for other people to begin offering us machines to add to our vintage collection.

Which brings us to the next machine over, an Osborne “portable,” which came into the shop in exactly that fashion. A friend--

Okay, not kidding, I was just this moment interrupted by a staffer from university relations who wanted to record an interview about Steve Jobs. We chatted about my early experiences using the Apple technology and he recorded video and audio of the old Disk II unit spinning up.

Anyway, the Osborne came to us courtesy of a friend in another library department who knew of MITH’s interest in vintage machines. As we’ve gotten more and more offers like this we’ve realized the importance of developing something like a collections policy, so that we’re not taking in lots of equipment we don’t really have any use for. The Osborne, though, is a real prize, and it too starts right up, though I was initially stymied by the fact that it wanted to boot from its B drive and I had to do a little online research to change the configuration settings. One thing I always impress upon my students when I co-teach (with Duke’s Naomi Nelson) the Born-Digital course at Rare Book School is the extent of the knowledge base that exists online in the computer enthusiast community.

Directly above the Osborne is a foam-board reproduction of a newspaper clipping. This in fact is the story and photograph that inspired Michael Joyce’s Afternoon, widely regarded as the first full-length piece of hypertext fiction (it was first released back in 1987). The original is in the collections at the Harry Ransom Center, where Joyce’s literary papers are on deposit. The “papers,” however, also consist of born-digital materials, including several laptops and several hundred diskettes. This image is a reminder of the complex relationships that emerge around hybrid digital/analog collections, something I wrote about extensively in my first book, Mechanisms.

The small wooden box in the background next to the Osborne? That’s an omnibus edition of Oregon Trail, one of the games in our case set as part of the Preserving Virtual Worlds II project. This is a multi-institutional effort funded by the IMLS that includes the University of Illinois, Stanford, RIT, and MITH at Maryland; we’re attempting to establish a methodology for evaluating the “significant properties” of games and complex virtual objects, something we expect to become increasingly important to collecting institutions.

Now we get to some really interesting stuff. The old Sun tower that you see is there as a power supply for the 5 ¼” floppy disk drive sitting on top of it. This drive is what I use to create “images” of data stored on old magnetic media. As Jason Scott has powerfully stated, it may well already be too late for much of this generation of computer history, as the floppies are already well past their expected lifespan. But using a floppy disk controller like the FC5025 or the Software Preservation Society’s KryoFlux (which just arrived the other day), I’m able to tether the floppy drive to my current laptop and image the disk as a bitstream representation of the original data. The disk image can then be used as the basis for extracting individual files, or it can be loaded into an emulator. Right now I’m working my way through my stockpile of personal diskettes from the Apple IIe. We’ve also used the FC5025 to image materials for the Preserving Virtual Worlds work. What’s interesting about the floppy controller technology is that it represents a grassroots effort outside of formal academic or collecting institutions, something my colleague Kari Kraus has written about compellingly.
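Once a disk has been imaged, a common first step is to record a checksum and sanity-check the file size before extracting anything. Here is a minimal Python sketch of that idea. It is an illustration, not the FC5025 or KryoFlux workflow itself: it assumes a plain sector-by-sector image of a 16-sector Apple II disk (35 tracks x 16 sectors x 256 bytes = 143,360 bytes), and the function name `describe_image` and the file name are hypothetical. Other controllers and formats, such as flux-level captures, produce files of different sizes.

```python
import hashlib

# Assumption: a plain sector image of a 5.25" Apple II disk with
# 35 tracks, 16 sectors per track, and 256 bytes per sector.
EXPECTED_SIZE = 35 * 16 * 256  # 143,360 bytes


def describe_image(path, chunk_size=65536):
    """Return the size, SHA-256 digest, and a size check for a disk image."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as image:
        # Read in chunks so large images do not have to fit in memory.
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            size += len(chunk)
    return {
        "path": str(path),
        "bytes": size,
        "sha256": digest.hexdigest(),
        "size_matches": size == EXPECTED_SIZE,
    }


# Usage, with a hypothetical file name:
#     info = describe_image("my_disk.dsk")
#     print(info["sha256"], info["size_matches"])
```

Recording the digest alongside the image makes it possible to confirm later that the bitstream has not changed, whether the image ends up in an emulator or a repository.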

By the way, the blue box in the background behind the Sun tower is a Maxell Optical disk cartridge, a gift from Jason Scott. Visually it resembles a greatly enlarged 3.5” disk, and at Rare Book School I carry it on the first morning so our students will be able to tell our class from, say, the descriptive bibliography crew.

Which is one way of bringing me back to my vocation as a professor of English. Having been privileged enough to grow up with a piece of technology like the IIe, I’ve never really recognized the “two cultures” divide that supposedly separates the humanities from technology. I’d spend an afternoon reading a novel and then that evening hacking my way through an Infocom text adventure—or maybe even making a ham-handed attempt to program one of my own. My interest in digital preservation, however, comes from the realization, largely a product of the intellectual culture at the University of Virginia where I did my graduate work, that computers, like old books, are rich and multi-faceted objects, with their own unique stories to tell. Digital forensics thus becomes the analog to a vocation like descriptive bibliography, one that focuses obsessively on the material characteristics of its object of study. Already it’s obvious that anyone interested in a writer or other public figure from the 1980s forward will likely find themselves working with born-digital material of one sort or another as part of that individual’s cultural legacy. I see my work as a kind of descriptive bibliography for the 21st century, bringing the kind of sensibilities we’ve cultivated with regard to books and other printed matter to bear on the objects and artifacts of our digital cultural inheritance.

There. Does that make sense now?

--Matthew Kirschenbaum