Friday, October 12, 2012

The StoryCorps Archive: A Brief Introduction

We're happy to take part in the Day of Digital Archives! For more information about StoryCorps, please visit

Many people know StoryCorps through our broadcast pieces: brief but hard-hitting two- or three-minute nuggets that tell compelling stories. They may be tear-jerking or funny; some relate the story of an overlooked historical moment, and some simply convey interesting anecdotes or depict unique characters. However, not all of our listeners realize that each of these clips is edited from a 40-minute interview (which typically takes the form of a conversation between two people who know each other, like a pair of friends or a father and his daughter), or that, about nine years since StoryCorps’ founding, we have collected some 45,000 interviews. That’s 30,000 hours of tape, all in CD-quality WAV format (44.1 kHz, 16-bit). Transcribing all this audio would require a daunting amount of resources, and, consequently, the vast majority of our interviews lack transcripts.
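For readers curious about the scale, here is a rough back-of-the-envelope sketch of the numbers above. The interview count, interview length, sample rate, and bit depth come from this post; the channel count is an assumption (we show both mono and stereo), and the estimate ignores file headers and any interviews that run longer or shorter than 40 minutes.

```python
# Rough size estimate for uncompressed PCM audio.
INTERVIEWS = 45_000            # from the post
MINUTES_PER_INTERVIEW = 40     # from the post
SAMPLE_RATE_HZ = 44_100        # CD quality
BYTES_PER_SAMPLE = 2           # 16-bit samples


def total_hours(interviews=INTERVIEWS, minutes=MINUTES_PER_INTERVIEW):
    """Total recorded time in hours."""
    return interviews * minutes / 60


def total_bytes(channels):
    """Raw audio size in bytes for a given channel count (assumed, not stated in the post)."""
    seconds = total_hours() * 3600
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels)


if __name__ == "__main__":
    print(f"Hours of audio: {total_hours():,.0f}")
    for channels in (1, 2):
        terabytes = total_bytes(channels) / 1e12
        print(f"{channels} channel(s): about {terabytes:.1f} TB")
```

Under these assumptions the collection lands somewhere between roughly 9.5 TB (mono) and 19 TB (stereo) of raw audio.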

Despite this issue, our cataloging practices do provide multiple points of access, including keywords. We partner with the American Folklife Center at the Library of Congress, which guarantees the long-term preservation of our collection and its associated data. With their help and using a draft form of the Ethnographic Thesaurus, we have come up with a customized list of terms that we use to tag interviews. These keywords allow for the possibility of non-linear, subject-based searches within the Archive, a process that creates connections between interviews with seemingly little in common.
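To illustrate how keyword tagging can connect otherwise unrelated interviews, here is a minimal sketch of a keyword index. The interview IDs and keywords below are invented for the example; they are not real records or actual terms from our list, and this is not how our database is implemented.

```python
from collections import defaultdict

# Hypothetical records: interview ID -> set of keywords assigned to it.
interviews = {
    "A": {"military service", "family holidays"},
    "B": {"immigration", "family holidays"},
    "C": {"military service", "immigration"},
}


def build_index(records):
    """Map each keyword to the set of interviews tagged with it."""
    index = defaultdict(set)
    for interview_id, keywords in records.items():
        for keyword in keywords:
            index[keyword].add(interview_id)
    return index


def related(interview_id, records, index):
    """Return interviews that share at least one keyword with the given one."""
    matches = set()
    for keyword in records[interview_id]:
        matches |= index[keyword]
    matches.discard(interview_id)
    return matches


index = build_index(interviews)
print(sorted(index["family holidays"]))           # interviews tagged with one subject
print(sorted(related("A", interviews, index)))    # interviews linked to "A" by any shared tag
```

A subject search is a single lookup in the index, and the "connections" are simply interviews that turn up under the same term.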

Since these interviews represent a vernacular history, a record of events ranging from world wars to family holidays that is uniquely rich in affective detail, it seems important to enable their ongoing accessibility to researchers and to the public. But what, exactly, do we do with this massive amount of digital audio? We don’t yet have a reliable audio search tool that we can use to delve into each file’s contents.

To address this overarching question, we’ve created a few tools for internal users to make the content more digestible. The facilitators who are present during the recording of each interview take handwritten log notes detailing each interview’s content, which we scan and include in each individual record. As they create database entries for interviews, the facilitators also transcribe five points from their log notes, which users can search in our database system (a customized Drupal-based database). Users can also scan through the interview’s full audio file using these time-coded notes.
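As a loose illustration of how time-coded notes can point a user to a spot in a long recording, here is a small sketch. The `LogNote` structure, the sample notes, and the function names are all hypothetical; our actual Drupal-based system is not shown here.

```python
from dataclasses import dataclass


@dataclass
class LogNote:
    interview_id: str     # hypothetical identifier
    offset_seconds: int   # where in the recording the note applies
    text: str             # facilitator's summary of that moment


# Invented sample notes for two imaginary interviews.
notes = [
    LogNote("X1", 95, "Talks about growing up in Brooklyn"),
    LogNote("X1", 750, "Describes her wedding day"),
    LogNote("X2", 1210, "Remembers first job at the shipyard"),
]


def format_timecode(seconds):
    """Render a second offset as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


def search_notes(all_notes, term):
    """Case-insensitive search; returns (interview ID, timecode) pairs."""
    term = term.lower()
    return [
        (note.interview_id, format_timecode(note.offset_seconds))
        for note in all_notes
        if term in note.text.lower()
    ]


print(search_notes(notes, "wedding"))
```

Each hit gives the interview and a timecode, so a listener can jump straight to that point instead of playing through all 40 minutes.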

Our content searching has enabled us to expand access to previously unheard moments in the StoryCorps Archive. As part of a collaboration with radio producer Krissy Clark, we combed the collection for interviews that mentioned specific locations within the diverse neighborhoods of Lower Manhattan. Krissy edited full interviews into over thirty short excerpts and created a geotagged sound ramble through downtown that we presented to attendees of the New Museum’s Festival of Ideas in May 2011.

We’ve also been able to establish two partnerships with linguistics researchers (one team at MIT and one at Oregon Health Sciences University) that represent a model of collaboration we hope to pursue further. Our partners at MIT’s Lincoln Laboratory study African-American Vernacular English. As part of their project, researchers took a representative sample of StoryCorps interviews and subjected them to computer analysis, focusing specifically on speech and dialect patterns. In the course of their research, they generated transcripts that we were able to add to our own records for future researchers to use.

Recently, we hosted an advisory summit supported by the Alfred P. Sloan Foundation entitled “Reimagining the Archive.” Data scientists, statisticians, archivists and librarians, oral historians, and linguists came to StoryCorps’ offices for a day of brainstorming the possibilities inherent in a large collection of digital audio and its associated metadata. Panelists encouraged us to sponsor a “hack day” that would allow innovative techies to forge new paths through our data and to build out widgets so partner organizations could “curate” their own subsets of the StoryCorps Archive. They even suggested that we create a mirrored server where large institutions could run processes on the entire archive to analyze speech patterns, word choices, or metadata. We look forward to piloting some of these ideas and to working with new partners in 2013 – hopefully we’ll be able to show some results on next year’s Day of Digital Archives blog!

