Thursday, October 6, 2011

Gilderoy Lockhart's Guide to Archiving the Sugar Quill

I have been conscious of digital archives for a long time, even before I began my position as the Manager of Digital Collections at the University of Maryland Libraries. My concern and angst have been focused more on the born-digital materials that I have created over the years in my personal life, and the one that perhaps means the most to me is The Sugar Quill <http://www.sugarquill.net>.

The Sugar Quill: A History

In 2000, a close friend and I, both avid readers, simultaneously discovered the world of online fan websites and, specifically, fan fiction. We met on a website called "The Republic of Pemberley" <http://www.pemberley.com> as fans of each other's stories, both imagining alternate-universe happenings to the novel Pride and Prejudice. We also shared a fascination with a new book series focused on a boy wizard named Harry Potter, and began to search for something like The Republic of Pemberley for the Harry Potter novels. It is difficult to believe now, but in 2000, the Harry Potter offerings on the Internet were slim, especially websites geared towards more civilized, intelligent conversation. Over the course of a bleak autumn in 2000, we conceived and designed the Sugar Quill. It was an opportunity for me to brush up on my HTML skills, and to examine some emerging technologies. We launched the Sugar Quill on January 5, 2001, with a garish gold background created by me after-hours with Paintshop Pro, the aim of which was to appear magical and Gryffindor-like. The website would have two major components: fan fiction and discussion forums. The fan fiction would be peer-reviewed - we would accept only a limited number of stories, and every author would have to submit to our "beta-reading" process, by which site volunteers would work with authors to correct grammar and spelling, and to counsel on matters of plot and character development. All stories also had to fall within what we considered the "canon" of Harry Potter, and for those that took place after the most recently released book in the series, Harry Potter and the Goblet of Fire, the stories had to be what we considered "canon-compliant." We chose a system called EZBoard to host our discussion forums. For a small fee, we could have gold status on EZBoard, which hosted all of the forums, and allowed for unlimited discussion, different user groups, and moderation activities.

If you build it, they will come. We did not expect a lot of traffic. We told a handful of friends about the website and used it as our own personal playground. The initial costs were low - the EZBoard fee and a Yahoo hosting fee of around $10/month. The time investment was large, but it was a hobby, and for hobbies, time is not a concern. Soon, however, word spread. We made a few vague announcements on different forums, such as the then-dominant "Harry Potter for GrownUps," and soon passers-by started joining in the discussions. Within a year, we had so many fan fiction submissions that we had to close the open submission process and instead designate official "submission windows" for new authors. We would generally accept only a quarter of the authors who sent us their stories. The discussion forums remained relatively small, never more than 100 or so members active at one time, and the community grew strong, perhaps best evidenced by the reactions and postings on September 11, 2001.

By 2002, the site had grown exponentially. One day, I received an email from a bright fifteen-year-old in England, who had observed that it must be very time-consuming and cumbersome for me to update the Sugar Quill, and that he had developed a basic content management system using MySQL and PHP that he thought could help me manage the fan fiction archive. He also did not like EZBoard, and recommended that we obtain more server space and install an instance of Invision Board, an open source discussion board software, that we could manage ourselves. He was correct that the site was taking a large chunk of my time to manage; I uploaded every story manually, had to update links in various places, and was spending approximately 30 hours/week maintaining everything, even with a team of thirty volunteers. Not only had the British whiz kid programmed a new website for us, but he also had a friend, who, at age eighteen, was running his own webhosting company and who would be able to provide us with more support, server space, and bandwidth than Yahoo.

Early Archival Attempts

So we switched. What does this whole long history have to do with digital archives? I'm getting there. But bear in mind that, with the exception of a few hand-drawn images of quills that my friend and I had created and scanned to use as site logos, the Sugar Quill's entire existence was digital. Early on, we printed a few key planning documents, but almost everything else resided on my Toshiba laptop (a monster of a machine that had the loudest fan ever heard on a laptop computer and that displayed the blue screen of death at least once a day), on a few backup CDs, in my own and various friends' email accounts, on a Yahoo server, and on EZBoard's servers. As an archivist, I had backups and preservation on my mind, but I did not have the tools to manage all of this properly. There was no way, for example, to extract data from EZBoard. If I wanted to "archive" notable conversations, I had to save the HTML pages, which is precisely the approach I took with the September 11 posts, and this is why these still exist on my computer.

Approximately a year after we switched to the new content management system and new server host, two things happened. First, I awoke one morning to several frantic emails in my personal email account, reporting that the Sugar Quill was "down." I turned on some instant messaging software, and sent a message to my British server host. Apparently, all of the servers he had been using to host the websites were located someplace in Texas and the owner of the servers had absconded in the middle of the night with them. They were gone. Turned off. What could we do? I had asked repeatedly for backups of the MySQL database that now contained most of the Sugar Quill's information. I had saved what I could in my archaic, slow manner, but I did not have backups of any of the actual data. Luckily, my host did, and within a few days, everything was up and running again. The second crisis had to do with EZBoard. When we migrated from EZBoard to Invision Board as a platform for our discussion forums, we did not migrate the EZBoard discussions, but rather linked to them from our new site. EZBoard suffered some sort of huge server attack/crash and made a decision NOT to restore data for a large number of their defunct forums. With the exception of discussion topics that I manually saved, those old EZBoard discussions are now lost to history.

The Sugar Quill peaked sometime around the release of the seventh and final book in the Harry Potter series, Harry Potter and the Deathly Hallows, in 2007. By that time, the original core of volunteers had all drifted away. Many of us had become, and remain to this day, close in-person friends, but life and interests had moved us away from the daily interactions on the Sugar Quill. Many began writing their own original stories. People married, had children, returned to school. Besides, the series was over. While J. K. Rowling left much to the imagination, many questions were also answered, and it had been the speculation that fueled our obsession. New fan sites were everywhere, many with more advanced searching features than we could offer. My fifteen-year-old whiz kid was in college, and while he helped when he could, he also had moved on. Sites such as LiveJournal made it possible for anyone to post a work of fan fiction easily, without being at the mercy of a dictatorial fan fiction site and its hefty rules. We made a decision to "freeze" the Sugar Quill in time. At the time of the freeze, the Sugar Quill contained 2,634 stories and 1,168 fan art works by 958 authors and artists. These items collectively had close to 133,000 reviews. The discussion forums had 9,329 registered members and 411,406 posts.

How do you "freeze" a website? Around the same time, our British web host announced that he was moving towards focusing on providing high-speed internet services and discontinuing his web hosting. Our bandwidth needs had reduced significantly by 2008, and we determined that a basic Yahoo business web hosting plan would suit us well. He helped us move everything to the new server, and I put a notice on the website that it was in "Read Only" mode. I knew that there were still people who read the stories and the discussions and I did not want to take that pleasure away from them. I decided that I would continue to pay to host the Sugar Quill until it could be adequately archived elsewhere and interest was truly dead. That was three years ago. I am still paying.

Impedimenta! Impediments to Archiving a Website

Wayback Machine
What are the impediments to archiving? The Internet Archive's Wayback Machine has archived parts of the Sugar Quill. The part that most concerns me is the fan fiction archive. However, for any story that contains more than one chapter, the Wayback Machine fails. Why? We use a very basic pull-down menu that requires that the user make a selection to navigate from chapter to chapter. Because of this, the Wayback Machine does not follow the links. The Wayback Machine also has not captured any of the fan art. The reason for this I do not know. The Wayback Machine does not like our background image. We have a robots.txt file that indicates that the forums should not be crawled and I have not had the time to investigate the effect on the site if I remove that restriction. For there are other issues that go beyond simple backups...
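For anyone facing the same pull-down problem, one possible workaround is to read the chapter addresses out of the menu and hand them to a crawler as an explicit list. What follows is only a minimal sketch in Python, not how the Sugar Quill or the Wayback Machine actually works. It assumes that each menu entry is an <option> whose "value" holds the chapter's address, and the sample markup, the example.org address, and the names ChapterMenuParser and chapter_urls are all invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ChapterMenuParser(HTMLParser):
    """Collect the value of every <option> found inside a <select> menu."""

    def __init__(self):
        super().__init__()
        self._in_select = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._in_select = True
        elif tag == "option" and self._in_select:
            value = dict(attrs).get("value")
            if value:  # skip placeholder entries that have no target
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_select = False


def chapter_urls(page_html, page_url):
    """Return absolute URLs for every chapter listed in the page's menus."""
    parser = ChapterMenuParser()
    parser.feed(page_html)
    parser.close()
    return [urljoin(page_url, value) for value in parser.values]


# Hypothetical markup: the real site's menu may be structured differently.
SAMPLE = """
<select name="chapter">
  <option value="">Choose a chapter</option>
  <option value="read.php?storyid=1&amp;chapno=1">Chapter 1</option>
  <option value="read.php?storyid=1&amp;chapno=2">Chapter 2</option>
</select>
"""

if __name__ == "__main__":
    for url in chapter_urls(SAMPLE, "http://www.example.org/read.php?storyid=1"):
        print(url)
```

A list produced this way could be saved page by page or submitted to an archiving service, so that the chapters no longer depend on a crawler being able to operate the menu.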

PHP
In 2009, Yahoo updated the version of PHP that it was running on its servers. Specifically, two PHP configurations were deemed to result in insecure code, and Yahoo disabled them. Unfortunately, my genius British whiz kid created the Sugar Quill's PHP code in 2002. I began to receive emails from concerned visitors to the now read-only site that certain pages were no longer accessible. Luckily, I connected with my whiz kid (now aged 52 and living with his seventeen children on a farm in Wales) on Facebook, and he corrected the problem, adding an amusing footnote in the remarks: "/* before you start reading this code, please let me apologise for the world of pain you are about to endure. ;) in my defence, this was my first major project, and it clearly shows. still, what a lesson in legacy code eh?"

Database corruption
Something is amiss with the MySQL database. Every few months, the database-driven portions of the Sugar Quill crash, and attempting to view them provides one with an "IBF forum error." Sometimes it would be weeks before a frustrated user would finally get through to me via one of half a dozen abandoned email addresses and social media outlets. Now, there is a Facebook group created by former Sugar Quill members who alert me quickly when a problem occurs. When that happens, I log in to my handy Yahoo Web Hosting control panel and push the "Repair Database" button, which runs "myisamchk" on the MySQL database and somehow repairs it. I have no idea if I am losing data. I am not 100% sure what is getting corrupted, although I am suspicious of the Invision Board "posts" table, which is over 650 MB in size. I possess neither the skills, the time, nor the motivation to investigate further. This is not, of course, how I would choose to manage an archival collection that I had vowed to preserve in perpetuity, but I also do not have at my disposal a team of talented developers and engineers to work with me on solving these problems.
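One modest way to find out whether a repair is costing data would be to note how many rows each table holds before pushing the button and again afterwards, then compare the two. The Python below is a rough sketch of that comparison only. It assumes the counts have already been gathered somehow (for example with a SELECT COUNT(*) query per table). The table names and the "after" figures are made up for illustration, and the function name rows_lost is my own.

```python
def rows_lost(before, after):
    """Return {table: rows_missing} for tables that shrank or disappeared."""
    lost = {}
    for table, count_before in before.items():
        count_after = after.get(table, 0)  # a vanished table counts as empty
        if count_after < count_before:
            lost[table] = count_before - count_after
    return lost


# Hypothetical snapshots. The table names are invented, and the "after"
# figures are illustrative only, not real measurements from the site.
before = {"posts": 411406, "members": 9329, "stories": 2634}
after = {"posts": 411398, "members": 9329, "stories": 2634}

if __name__ == "__main__":
    for table, missing in rows_lost(before, after).items():
        print(f"{table}: {missing} row(s) missing after repair")
```

Row counts are a blunt instrument, since they would not reveal a post whose text was damaged but whose row survived, but they would at least turn "I have no idea" into a number.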

Passwords
It takes me at least ten minutes to successfully remember any of the correct user name/password combinations for the various administrative portions of the Sugar Quill. I should write them all down on paper and hide them in my house somewhere. In fact, right now would be a good time to do that. But where has all the paper gone? And will I remember where I've hidden them?

Security
And speaking of security... I thought that I had turned off all of the posting features for the Sugar Quill back in 2009. However, a quick glance at the forums today shows me that in fact, the last post was by someone named "Hematite" on July 21, 2011. I was amazed that people were still trying to post messages. I learned about this from Facebook, where giddy "Quillers" shared the information that posting was still possible. I've chosen to leave the loophole active, partially because I am not sure how to fix it. In addition, there have been numerous upgrades and patches to Invision Board since we "closed," and I have not kept up with the times. Security holes are everywhere, if someone were feeling particularly malicious. We are being spammed hourly. As a result of this posting, I discovered 33,000 pending registrations, all from what looked like fraudulent email addresses.

Conclusion
I wish I had some neat conclusions. But the problems that I have illuminated here are not unique, and I shudder to think about what is being lost by people who are even less technology-savvy than I am. I *tried* to archive. I am still trying. What I need is to set aside a week of my life, hire a technologically-savvy consultant, and address the many issues that I have observed that are stopping me from fully archiving this website. I have not even stuck a toe in the waters of discussing things like emulation and environment. After all, in my mind, users won't really get a true Sugar Quill experience until they can view it with a laptop computer fan loud enough to wake the dead, just as I did when I was creating the site. New tools are being developed every day, and the ways that people create web content are changing. But many of the issues are the same. Companies fail, servers crash, people hit "delete" and don't have backups. The first step is to be aware of the issues and to be able to work with creators of these resources to have preservation in the front of their minds. Just as many paper records were lost to neglect, fire, flood, and other disasters, so will electronic records be lost to neglect, crashes, failures, and other disasters. The goal, however, is to ensure that as much content is saved as possible.


2 comments:

  1. Wow. I don't know if anyone will ever read this comment, but as an old member of the Sugar Quill, these archiving problems are sad. Even now, I occasionally go back there and read a story or two.

    I'm not sure what the status of these archiving problems is today, but I'm a web developer today (back then, I was a high school dropout :p ) - I'll try to catch you, Zsenya, on Facebook.

    StereoM

  2. I loved Sugarquill---I am still sad that it closed down and that one cannot read the forums anymore...Would love to read the story recommendation thread for instance...but alas it is not meant to be.

    thank you for creating this wonderful place! I will never forget it!
