Digital Preservation

The rain seems to be constant these days and, from what I gather, we are going to see more of it for the rest of the week. I cancelled a golf outing with friends yesterday evening so I could watch the Memorial Tournament, played in Ohio last week. Something told me that we were in for an exciting finish, with Tiger going for his 73rd win to tie Jack Nicklaus for second place in all-time PGA wins. And it was exciting indeed. Tiger’s enormously risky shot on the 16th set the stage for the rest of the tournament and a great finish.

I spent a couple of days last week in beautiful Middlebury, VT, attending a gathering of Oberlin 17, the Northeast schools that belong to the Oberlin Group. The very first exercise was a chance to talk about what we have done in the past year, what questions we have for the others and what the opportunities for collaboration are. There was considerable overlap: most of us are doing very similar things, have questions about very similar issues and would like to collaborate in areas of interest to many. Except, when we sat down to talk about the specifics of collaboration, I didn’t get the feeling that we are going to see much in the way of progress. Call me a skeptic! The reason is, as one of the participants pointed out, that some of the calls for collaboration are weakened by the “these problems are local” issue. For example, shared instructional technology resources are a great concept on which we should be able to collaborate. But faculty who are used to a support model and service expectations built around our own staff are anxious about a model where the support person is elsewhere. There are also a lot of other logistical constraints, such as who manages this person and how this person’s time is divided among the campuses.

There was a question about digital storage management, which I answered based on what we are doing, but then I was reminded that the issue of digital preservation is a much bigger one. And it is indeed a huge issue!

HathiTrust and the Internet Archive are two major non-profit initiatives in the digital preservation realm, and Google is a big player as well. It is fair to say that digital preservation remains an open question, and we, like many others, are on the sidelines watching the emerging conversations around it. We use the Internet Archive to deposit some of our content. We have also begun depositing institutional materials as well as faculty and student scholarship in Digital Commons from bepress. You can view what has been deposited in our institutional repository. We also use Shared Shelf from ArtStor, which has a preservation component, to store and manage our image collections. So, in all, we, like many, are dabbling in different ways to get our feet wet, but without international standards and movements to rely on, no one really wants to commit to anything yet.

Preserving printed material and preserving digital content have a lot in common, but there are striking differences. The major ones are the definition of scholarly work, the static nature of the book and the scale of digital content.

  • In the digital world, what exactly is to be considered scholarly work (and who is responsible for defining it)? Peer-reviewed work such as eBooks and journal articles is the obvious answer, but how about blogs and websites, to name a few?
  • The printed version of a book is static. Every new edition, with or without significant changes (changes between editions are more common in textbooks), results in a new printed book. In contrast, blogs and websites are constantly changing, and even version control and snapshotting services, such as the Wayback Machine from the Internet Archive or even Google search, don’t detect all changes.
  • The scale of digital content is huge in comparison to printed material. Of course, unless there is a clear definition of which of this content is preservation-worthy, we will suffer from the issue that, in the eyes of the content creator, everything is preservation-worthy (look at any institution’s users unwilling to part with years of web content!).

I am not sure whether the early printers of books had preservation (for centuries) in mind. However, now that we have done a wonderful job of preserving some of the priceless early works, we are at a point where we want to apply the same criteria to digital content. In other words, if a printed book can survive 300-odd years, we want to make sure that digital content survives at least that long, if not much longer. It is this expectation that is causing all sorts of issues. We have seen many technologies die premature deaths: VHS, Betamax, CDs and soon DVDs, to name a few. Similarly, computer hardware and software are changing so rapidly that content that was created barely 10 years ago cannot be accessed now. Even if it could be, there are no guarantees that the original format is preserved. Of course, these are extremely important issues for scholars, but many technologists may not feel the same way! For example, how can we be sure that JPEG images and PDFs will continue to be accessible in their original format for centuries to come?

So much has been written about this topic that there is no particular need to rehash it all here, except to say that this remains a huge problem and that no coherent, practical, persuasive and trustworthy effort that appeals to a large group of us appears to be under way.

The bottom line for us is that neither we as a single institution, nor a prestigious group like the Oberlin Group, can do this on our own. I am sure that national organizations such as the ALA are working hard on this issue and are partnering globally with library associations from other countries as well as with technology organizations. Like everything else in life, I am confident that we will find a way to solve this problem. My gut feeling is that this will follow the path of all other disruptions brought on by technology: new paradigms for digital preservation will emerge, current practitioners will be very unhappy about the changes, new practitioners will emerge who will adopt these paradigms because there is nothing else that can be done, and the cycle will begin all over again in a few more years.

The most important first step toward preservation is to provide our faculty and students with an infrastructure to store their scholarly work. This means being bold about providing enough storage, backup and restore, recognizing that in the absence of a clear definition of what is considered scholarly work, it is better to be a bit more expansive than restrictive. Thanks to various options for affordable storage, including the cloud, we can do this. We also need clear policies that articulate acceptable use of this infrastructure and set expectations for the long-term accessibility of the stored content. In other words, we want faculty and students to understand that if they store a file in Word 10 format today, we are only guaranteeing that it will be accessible in its original format when they want to access it in the future. We are not committing to converting it to some new format of the future (unless we manage to get into that business somehow). Of course, many will not read these policies and the fine print, and there will be a few who will potentially misuse the infrastructure for personal use. But those are not reasons not to do this, because when standards emerge and we want to be enthusiastic participants, we will be ahead of the game if we can easily access the content of our faculty and students!
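For the technically inclined, here is a rough idea of what "accessible in its original format" could mean in practice. This is only a minimal sketch in Python, not a description of what we actually run: it records a SHA-256 checksum for every file in a storage folder and later reports files that have gone missing or whose bytes have changed. The function names (`sha256_of`, `build_manifest`, `verify_manifest`) and the idea of keeping the manifest as a simple dictionary are my own assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or no longer match the manifest."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {name}")
    return problems
```

One would call `build_manifest` when content is deposited, keep the result somewhere safe, and run `verify_manifest` periodically. An empty list means every recorded file is still there, bit for bit. Note that this only checks that the bytes are unchanged; it says nothing about whether software will still exist to open them.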

On a personal note, I am preserving digital content in volumes that were unthinkable to my parents and relatives. Every picture that has ever been taken (be it printed or digital), my daily sugar readings and a host of other data on me, my tweets, my blog posts, emails and so on are all saved in multiple places. I am sure many of you are doing this too. I ask my kids if they will ever look at these. Their answer is “we already ignore your invites to look at them now, what makes you think that we will look at this later?” It is true that I share too many pictures for their liking, but I also feel that as they age, they will be thankful (forever hopeful!).

What I fear is that by the time these are passed on to future generations, they may be looking at a picture of me and my wife that bears little resemblance to the way we look now because of all the transformations to the image format. Hopefully, the text will remain as is and they will get to read this blog post. Oops… blogs may be a thing of the past by then.
