Over the next three days, you will be hearing about a wide variety of subjects, all of them related to the Web. The Web, like this conference, is an exciting place to be. It’s certainly a fast-paced environment in which to work. We are constantly being confronted with new technologies, new tools, new ideas and new methodologies. It gives our work a certain frisson.
I have a favour to ask of you. Over the course of these three days, as you listen to the stellar array of experts that Aral has lined up for you, as you absorb their insights into the latest technologies and techniques, I would like you to apply an unusual frame of reference.
Instead of just evaluating tools and technologies in terms of their usefulness to you right here, right now, I would very much appreciate it if you were to occasionally step outside of the here and now. For every talk that you hear, take a moment to have the mental equivalent of an out-of-body experience. Think of it as an out-of-head experience. I would like you to spend a few moments of your time evaluating each talk, each technique, each technology from a long-term perspective.
I am concerned. I don’t want to sound like one of those irate Daily Mail readers bemoaning the youth of today and their lack of appreciation for the achievements of the past, but I do think that we, as an industry, have a tendency to focus a little too much on the present.
Sure, we like talking about the future but generally, it’s the near future that interests us: next week’s mobile phone, next month’s browser, next year’s hardware. How much thought do we give to the long-term future? How much of what we are creating in the here and now will still be around in just one more generation?
To get you in the right frame of mind for long-term thinking, I’d like to take you back to the past.
With the fall of the Roman Empire in the 5th Century AD, there was a real danger that entire areas of knowledge would be destroyed. Fortunately the great works of ancient Greece and Rome were preserved, thanks to the tireless work of early Christian monks. They lived at the remote edges of Europe, on the almost-inaccessible islands off the coasts of Ireland and Scotland. There, they toiled ceaselessly at the task of cultural preservation, creating beautiful illuminated manuscripts.
This powerful story of a small band of scholars preserving civilisation for future generations is revived in works of speculative fiction. In A Canticle For Leibowitz, Walter M. Miller envisions a monastery in the post-apocalyptic landscape of the American Southwest where the remnants of scientific knowledge are guarded for future generations. More recently, Neal Stephenson’s Anathem pushes the perspective of long-term thinking and knowledge preservation to its limit.
Meanwhile, back in the Dark Ages of our timeline, the great works of science and philosophy from the ancient world were preserved thanks to early Irish Christian scholars. And yet, we know very little about these anonymous heroes. The minutiae of their day-to-day life were not preserved, although we are granted occasional glimpses. Every now and then, a scribe would add some observations of his own in the margins of a manuscript.
My hand is weary with writing, my sharp quill is not steady, my slender, beaked pen juts forth a black draught of shining, dark blue!
And here’s another:
Pleasant is the glint of the sun today upon these margins, because it flickers so.
These are the real treasures: little dollops of trivia served up in fewer than 140 characters.
So it is throughout history. Scholarly works are preserved while the inconsequential narrative of everyday life is lost. There are exceptions. The Book of Margery Kempe, written in the 15th Century, is the charming blog of a medieval woman.
Now we have the Web. It’s the perfect medium for recording personal narrative. We write, we post pictures, we upload video. Or, as the poet Patrick Kavanagh put it, we
wallow in the habitual, the banal,
wherever life pours forth ordinary plenty.
But will this ordinary plenty be retrievable in one hundred years, or even ten years? A decade is a long time on the Web. I made my first website over ten years ago. It was hosted on some third-party Geocities-like domain. It’s gone now. Geocities yesterday, MySpace today.
And lest we derive a certain smug satisfaction at the thought of MySpace pages not being available a decade from now, spare a thought for all the other third-party services that we have entrusted with our data: Flickr, Vimeo, Tumblr. Given the current financial doomsday scenario, the future existence of these storage providers is far from certain.
A valiant few have taken up the task of preserving our online culture. Brewster Kahle’s Internet Archive is a magnificent undertaking. But the scale of the endeavour is monumental. Saving our culture is a task that will probably need to be crowdsourced if it is to succeed.
The monks of the Dark Ages worked on vellum. We work with intangible ones and zeros. Without electricity, our recordings cease to exist. Vellum is more durable. But it’s very time-consuming to make exact non-destructive copies in vellum. Digital bits, on the other hand, can be easily copied. That gives me hope.
Storage, too, is a solved problem. Moore’s Law just keeps going and going. Our Turing machines are getting more and more powerful and spacious.
The real issue is with the configuration of our ones and zeros: data formats. Some data formats have a higher propensity for longevity than others.
For a start, there’s the complexity of the format. Plain text is very simple. Formatted text is slightly more complex. Images are a notch higher. Video is non-trivial. Information encoded in a simple format is more likely to be easily decoded in the future.
Open formats are better than closed formats. I don’t mean they are necessarily qualitatively better but from the viewpoint of digital preservation, over a long enough timescale they are always better.
The terms “open” and “closed” are fairly nebulous. Rather than define them too rigidly, I’d like to point to the qualities that can be described as either open or closed. The truth is that most formats contain a mixture of open and closed qualities.
First of all, there’s the development process of creating a format in the first place. On the face of it, a closed process might seem preferable. It allows greater control of how a format develops. But it turns out that this isn’t always desirable. The open-source model of development, for all its chaotic flaws, has one huge advantage: evolution. Time and time again, the open-source community has produced efficient, well-honed gems instead of the Towers of Babel that would be logically expected. That’s because Darwinian selection, be it natural or otherwise, will always produce the best adaptations for any environment. It doesn’t matter if we’re talking about ones and zeros instead of strands of DNA; the Theory of Evolution is borne out in either case. Microsoft aren’t getting their ass kicked by the Linux penguin or the burning fox of fire; Microsoft are getting their ass kicked by Charles Darwin.
Another open quality is standardisation. Again, at first glance, this might seem counter-intuitive. After all, the standardisation process is all about defining boundaries and setting limits as to what is and isn’t permitted. But this deliberate reining in of the possibility space is what gives a format longevity. This will come as no surprise to the designers amongst you who are well aware that constraints breed creativity. As Douglas Adams said,
we demand rigidly-defined areas of doubt and uncertainty.
Standardisation doesn’t necessarily lead to qualitatively better formats. Quite the opposite. The standardisation process, by its very nature, involves compromise. But I would rather use a compromised standardised format than a perfect proprietary one.
While many formats are designed to be simple, open and standardised, there are some that are, from a digital preservation standpoint, crippled by design.
As the security expert Bruce Schneier put it,
digital files cannot be made uncopyable, any more than water can be made not wet.
But that hasn’t stopped people from trying. I’m referring to what is so euphemistically called Digital Rights Management. The data encoded in these formats is doomed.
If you purchased a song from Virgin Digital last year, that music is no longer playable on any device today. The built-in DRM ensured that when the provider shut down, so did the data. It was the same story with Google’s brief flirtation with selling DRM’d video. In both cases, paying customers were treated like criminals and yet it was the providers who reneged on their end of the deal, leaving customers high and dry.
Licensing, copyright and even “intellectual property”—a term so new, it has no legal meaning—all have their place but that place is not within a data format. The Embedded OpenType font format from Microsoft, for example, is a travesty. I can only take comfort from the fact that, given its self-crippling nature, the EOT format, like all DRM-laden formats, is dead. It just doesn’t know it.
Don’t entrust your data to zombie data formats.
As you listen to me talk about data formats and digital preservation, you might well be saying to yourself,
who gives a shit? Curating our culture for future generations is not our job. To paraphrase Leonard McCoy,
I’m a web designer, not a librarian!
Think about the qualities that I’ve listed as being beneficial for long-term thinking: simplicity, openness and standardisation. These qualities are desirable not just for the future but also for the present. These are the qualities that allow data to be portable.
Can I view source? Can I copy and paste? Can I syndicate? Can I mash up? If you can answer yes to all these questions, then your content will probably live a long and healthy life.
If you don’t care about longevity or portability, what about accessibility? I put it to you that what is good for digital preservation is good for accessibility: simplicity, openness and standardisation. I would go so far as to say that digital preservation is a form of accessibility: making your content accessible to everyone, regardless of whether they happen to reside in the present or the future.
Longevity. Portability. Accessibility. If none of those arguments convince you, then I can only resort to the cheapest trick in the book and tell you that the Googlebot loves simplicity, openness and standardisation. Long-term thinking is good for your googlejuice.
With that, let me just reiterate my request that you apply a smattering of long-term thinking to the material you are about to dive into over the course of the next three days. But don’t obsess about it. The future is important—it’s where we’re all headed. But we live in the moment.
I sincerely hope that the ongoing narrative that we are all constructing online will be preserved for our future selves. But if not—if all that we publish will fade sooner rather than later—that’s all the more reason to treasure it now while it lasts. Even if your story will be forgotten in the future, it is still a story worth telling.
It may be that every blog post we write, every picture we post and every message we send is inherently ephemeral. In the words of a dying replicant:
All those moments will be lost in time, like tears in rain.
This presentation is licensed under a Creative Commons attribution licence. You are free to:
- Copy, distribute and transmit this presentation.
- Adapt the presentation.
Under the following conditions:
- You must attribute the presentation to Jeremy Keith.