The always interesting Jill Lepore, in the New Yorker, on the Wayback Machine and “The Cobweb: Can the Internet Be Archived?”:
… The average life of a Web page is about a hundred days… It might seem, and it often feels, as though stuff on the Web lasts forever, for better and frequently for worse: the embarrassing photograph, the regretted blog (more usually regrettable not in the way the slaughter of civilians is regrettable but in the way that bad hair is regrettable). No one believes any longer, if anyone ever did, that “if it’s on the Web it must be true,” but a lot of people do believe that if it’s on the Web it will stay on the Web. Chances are, though, that it actually won’t. In 2006, David Cameron gave a speech in which he said that Google was democratizing the world, because “making more information available to more people” was providing “the power for anyone to hold to account those who in the past might have had a monopoly of power.” Seven years later, Britain’s Conservative Party scrubbed from its Web site ten years’ worth of Tory speeches, including that one. Last year, BuzzFeed deleted more than four thousand of its staff writers’ early posts, apparently because, as time passed, they looked stupider and stupider. Social media, public records, junk: in the end, everything goes.
Web pages don’t have to be deliberately deleted to disappear. Sites hosted by corporations tend to die with their hosts. When MySpace, GeoCities, and Friendster were reconfigured or sold, millions of accounts vanished. (Some of those companies may have notified users, but Jason Scott, who started an outfit called Archive Team—its motto is “We are going to rescue your shit”—says that such notification is usually purely notional: “They were sending e-mail to dead e-mail addresses, saying, ‘Hello, Arthur Dent, your house is going to be crushed.’ ”) Facebook has been around for only a decade; it won’t be around forever. Twitter is a rare case: it has arranged to archive all of its tweets at the Library of Congress. In 2010, after the announcement, Andy Borowitz tweeted, “Library of Congress to acquire entire Twitter archive—will rename itself Museum of Crap.” Not long after that, Borowitz abandoned that Twitter account. You might, one day, be able to find his old tweets at the Library of Congress, but not anytime soon: the Twitter Archive is not yet open for research. Meanwhile, on the Web, if you click on a link to Borowitz’s tweet about the Museum of Crap, you get this message: “Sorry, that page doesn’t exist!”
The Web dwells in a never-ending present. It is—elementally—ethereal, ephemeral, unstable, and unreliable. Sometimes when you try to visit a Web page what you see is an error message: “Page Not Found.” This is known as “link rot,” and it’s a drag, but it’s better than the alternative. More often, you see an updated Web page; most likely the original has been overwritten. (To overwrite, in computing, means to destroy old data by storing new data in their place; overwriting is an artifact of an era when computer storage was very expensive.) Or maybe the page has been moved and something else is where it used to be. This is known as “content drift,” and it’s more pernicious than an error message, because it’s impossible to tell that what you’re seeing isn’t what you went to look for: the overwriting, erasure, or moving of the original is invisible. For the law and for the courts, link rot and content drift, which are collectively known as “reference rot,” have been disastrous… Last month, a team of digital library researchers based at Los Alamos National Laboratory reported the results of an exacting study of three and a half million scholarly articles published in science, technology, and medical journals between 1997 and 2012: one in five links provided in the notes suffers from reference rot. It’s like trying to stand on quicksand.
The footnote, a landmark in the history of civilization, took centuries to invent and to spread. It has taken mere years nearly to destroy. A footnote used to say, “Here is how I know this and where I found it.” A footnote that’s a link says, “Here is what I used to know and where I once found it, but chances are it’s not there anymore.” It doesn’t matter whether footnotes are your stock-in-trade. Everybody’s in a pinch. Citing a Web page as the source for something you know—using a URL as evidence—is ubiquitous. Many people find themselves doing it three or four times before breakfast and five times more before lunch. What happens when your evidence vanishes by dinnertime?…
The address of the Internet Archive is archive.org, but another way to visit is to take a plane to San Francisco and ride in a cab to the Presidio, past cypresses that look as though someone had drawn them there with a smudgy crayon. At 300 Funston Avenue, climb a set of stone steps and knock on the brass door of a Greek Revival temple. You can’t miss it: it’s painted wedding-cake white and it’s got, out front, eight Corinthian columns and six marble urns.
“We bought it because it matched our logo,” Brewster Kahle told me when I met him there, and he wasn’t kidding. Kahle is the founder of the Internet Archive and the inventor of the Wayback Machine…
Note to experts: Yes, Lepore does discuss Perma.cc and the work of Herbert Van de Sompel at the Los Alamos National Laboratory: “This month, the Memento group is launching a Web portal called Time Travel. Eventually, if Memento and projects like it work, the Web will have a time dimension, a way to get from now to then, effortlessly, a fourth dimension. And then the past will be inescapable, which is as terrifying as it is interesting…”
srv
Who will save us when John deletes the archives?
Morzer
If they actually can retrieve the “lost” intertoobz, Sullivan’s going to have to unretire all over again to defend his real legacy.
Ohio Mom
I was going to say, another lost library of Alexandria but thought I’d better google that first to make sure I remembered that story correctly. Turns out that Alexandria was only one of many lost and destroyed libraries. The more things change…
Mike in NC
Archiving the Internet is the worst idea since the Third Reich, which millions of people supported, and still do today.
different-church-lady
Wait, I thought we were told relentlessly, by quite reliable sources, that the NSA had a copy of the entire internet in that facility in Utah?
Ohio Mom
This stood out for me:
The internet really *is* going to make us all stupider!
jl
“You can’t miss it.”
Article is wrong. You can miss anything in the Presidio.
And I thought the Internet Archive was on Funston between Geary and the Presidio. They have a place in the Presidio too?
Or the reporter drove down Park Presidio Blvd and thought that was in the Presidio?
Edit: And also too, Balloon-Juice pages will never die, they are immortal. The TunchForce is watching.
Anne Laurie
@different-church-lady: Sure, that’s what THEY want you to believe.
sharl
I’m still sad about the end of commenting site Haloscan some years ago; bought by competitor JS-Kit then promptly zapped into nonexistence.
Then as now, most comments were deservedly forgettable (my own rare comments included), and of course some were awful. But on occasion – probably rare occasion – you could get some great conversations, or at least that was my experience as a mostly lurker in Atrios’ comments. And you could search in the haloscan comments too, if you could remember a couple or three uncommon key words, phrases, and/or commenter nyms to put into the search argument, as well as the site of the original blog post.
I think there could have been more than a few dissertation and thesis projects in communications, behavioral science, and/or sociology that could come from a carefully designed approach to comments sections; actually that may be happening now, since comment sections are still around, even as traditional blogs apparently are becoming less popular. But as far as I know, there is no archive of Haloscan comments anywhere, at least not for general public access. Quel dommage…
Major Major Major Major
Thanks for the note to experts. I was going to post with umbrage about digital archiving otherwise :)
@Mike in NC: why?
Major Major Major Major
@sharl: disqus might offer a similar database to tap into. If they wanted to. Or livefyre. And of course there’s an xml/rss standard for comments that makes scraping some frameworks really easy…
pseudonymous in nc
@sharl:
Some of it’s archived, but it’s archaeology: you have to know where to look and what to do once you start looking. Other bits are preserved as a kind of social history, through people’s memories of what was going on, but that’s always more up for debate. I could, for instance, dig out Steve Gilliard’s old blog posts and perhaps some of those freewheeling comments threads but it takes a social history of left-wing political blogging to tell you why they matter. (Miss you, Steve. FTFY.)
Right now, the online archiving process feels like 19th-century archaeology, where colonial types went to Foreign, dug shit up and carted the shiniest and prettiest bits back to the motherland without bothering to ask permission. It took decades for the discipline to reach a point where digs took context seriously: that bits of pottery on top of other bits of pottery could tell you more about who lived in a place than buried treasure.
When the tools to analyse content and context finally emerge, I hope Kahle’s archives will be up to the task. But even then, it misses all sorts of the really early web that didn’t stick around. But that’s a different task from that of shoring up citations from link rot.
karen marie
Hahaha.
Origuy
@Ohio Mom: When Henry VIII dissolved the monasteries in 1536, untold hundreds of manuscripts were lost. Not all were sermons and copies of the Gospels; many were books of music and Anglo-Saxon poetry.
ET
As a librarian I have always thought the idea of archiving the Internet may be OK in theory but breaks down in less than a minute once you start talking about actually doing it. It may be possible to archive a large set of similar things to present a big picture of a specific event, like what the Library of Congress does for elections or themes, but it really isn’t scalable.
I love the Wayback Machine, but even they don’t get everything, and much of what they do get, particularly the further back in time you go, is not deep or is quite incomplete. It has been great for things that people think companies would keep on their websites but don’t (annual reports to shareholders and the like), but oftentimes the underpinning technology makes a capture impossible.
Ohio Mom
@Origuy: Yes, when I googled “lost libraries,” the Wikipedia article had quite a long list of libraries throughout time and geography. It’s sobering to think what has been lost.
As far as what we will lose now that we are in the digital age, I think particularly of the blogs that are documenting real history in real time; Diane Ravitch’s blog is an example that comes to my mind. Not only is she explaining the hows and whys of the school deform movement, but then her commenters chime in and say things like, “I’m a third grade teacher in Nebraska, and let me tell you how this is affecting us…” It really adds a lot to have those concrete examples. It will be a treasure trove for future historians of American education and politics, but only if it still exists. I hope she is printing it all out on acid-free paper.
I know there are plenty of other blogs that focus on other fields and areas, on subjects that I don’t follow, but they are out there and all that information is going to be lost, too.
And then there are all those emails. Historians piece together quite a bit from old letters, but now that we email instead of snail mail, the only things left years from now are going to be wedding invitations and greeting cards. All the long, heartfelt missives, sprinkled with incidental information about the world-at-large, will be POOF.
sharl
@pseudonymous in nc: Aw, man, Gilliard. I kinda developed mixed feelings about my own emotional investment in the well being and ultimate death of someone I never met, but invested I was. I hope his family is doing well; they were quite surprised about Steve G.’s “secret life” as a blogger, and his legion of fans, when all that came to their attention in his final days and immediately following his death.
I’m kind of glad to see that you in particular responded. I’ve seen you here and there for most of the time I’ve been online (mostly as a lurker in my earlier years). Whenever I’ve seen a story about BBC letting people go, I recalled – correctly I think/hope – that you actually know (knew?) people there, and I wondered, is there anyone of pseudo’s acquaintance left at BBC?
Ah well, time marches on…