The government's websites have undergone major changes since President Donald Trump took office.
Some changes are routine. These include swapping in the current president and vice president for their predecessors on the official White House website.
However, other changes go further. Several sites have now gone offline, including usaid.gov, reproductiverights.gov, and the Spanish-language version of whitehouse.gov. Sites that remain have scrubbed certain data and terminology to comply with Trump's executive orders targeting “gender ideology” and DEI.
This is an acceleration of a problem called digital decay, or link rot. As media outlets fold, businesses upgrade their web infrastructure, and organizations remove information they think is no longer worthwhile or relevant, a massive amount of the internet is disappearing. A recent Pew Research Center study found that 38% of web pages that existed in 2013 are no longer available. Losing those pages means losing part of our own record, since so much of our culture now happens online.
Wayback Machine Director Mark Graham joined Sean Rameswaram on Today, Explained to talk about digital decay, what his team is doing to combat the problem both in general and in Trump's second term, and why internet preservation is so important.
Below is an excerpt from the conversation, edited for length and clarity. There's much more in the full podcast, so listen to Today, Explained wherever you get your podcasts, including Apple Podcasts, Spotify, Stitcher, and more.
For those who may have come across your website but don't really know what you do, can you give them a sense of what you've saved in 30 years?
Where do you start? It's like walking into a very large library and saying, “Show me your favorite book.”
Last year there was a big news story about MTV News being shut down. The founding editor wrote about it on LinkedIn, and many other editors were talking about it: “My God, all our articles are gone. They're missing.” And I casually walked into the conversation and went, “Hello, uh… check out the Wayback Machine.”
They said, “Oh my God, you guys got it all. What did you do?” We did nothing when the site went down, because we had been working all along. We work continuously to archive the public web while it is still publicly available. If we only start paying attention to something after it goes down, that means we've blown it.
So what are you doing before these sites go down, so that people can still find out what Everlast was singing in 2004?
We set up web crawlers and archiving software that go out on missions every day to identify and download web pages and related web-based resources. We bring in millions of URLs every day, signals of where new material is being published on the web. And we make sure to archive those URLs and all the web pages associated with them.
Next, we look at those pages and identify links to other pages. And we go to those pages and archive them. That's where you get this metaphor of a spider crawling across a web.
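The loop Graham describes (start from known URLs, archive each page, follow its links, repeat) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Internet Archive's actual software: the `crawl` function, the `fetch` callable, and the in-memory `FAKE_WEB` dictionary standing in for real HTTP requests are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: archive each page, then queue the pages it links to.

    `fetch` is a hypothetical callable that returns a page's HTML as a string,
    or None if the page cannot be retrieved.
    """
    archive = {}                  # url -> captured HTML
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # page is gone or unreachable: nothing to save
            continue
        archive[url] = html       # "archive" the page
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return archive


# A tiny in-memory "web" stands in for real HTTP requests (assumption).
FAKE_WEB = {
    "https://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.org/a": '<a href="/b">B</a>',
    "https://example.org/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}

archived = crawl(["https://example.org/"], FAKE_WEB.get)
print(sorted(archived))
```

The page at `/missing` is never captured because it no longer exists by the time the crawler reaches it, which is the point Graham makes: a page has to be archived while it is still online.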
The end result is that we add over a billion archived URLs to the Wayback Machine every day. Once added, this material is indexed and immediately available to anyone who visits web.archive.org and enters a URL. They can then view the archived history of the web pages available from that URL at any time.
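For readers who would rather build such a lookup than type it into the site, the sketch below constructs Wayback Machine addresses in Python. The URL patterns and the availability endpoint are assumptions based on how the service is publicly addressed, not something stated in the interview, and the helper names `capture_url` and `availability_query` are hypothetical. The code only builds strings and makes no network requests.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public URL patterns; check the Internet Archive's own docs before relying on them.
WAYBACK_BASE = "https://web.archive.org/web"
AVAILABILITY_API = "https://archive.org/wayback/available"


def capture_url(page_url, when=None):
    """Address of the capture nearest `when`, or of the capture calendar if omitted."""
    stamp = when.strftime("%Y%m%d%H%M%S") if when else "*"
    return f"{WAYBACK_BASE}/{stamp}/{page_url}"


def availability_query(page_url, when=None):
    """Query string asking whether any capture of `page_url` exists."""
    params = {"url": page_url}
    if when:
        params["timestamp"] = when.strftime("%Y%m%d%H%M%S")
    return f"{AVAILABILITY_API}?{urlencode(params)}"


print(capture_url("https://www.usaid.gov/"))
print(capture_url("https://www.usaid.gov/", datetime(2025, 1, 15)))
print(availability_query("usaid.gov", datetime(2025, 1, 15)))
```

Opening the first printed address in a browser should show the capture history for that page; the second should lead to the capture closest to the given date.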
I want to talk about government websites, because that's why we're having this conversation today. Most people probably assume the government archives its own websites. But here we are in a new administration, websites are going offline and coming back online, and people are worried. When you, an internet archivist, see this happening, how do you react? Is it better or worse than an ordinary non-government website going offline?
Well, as an American, my taxes helped pay for some of this stuff, and much of it benefits people. Certainly my initial reaction was: That may not be so good.
I would like to emphasize that the National Archives and Records Administration also archives, as does the Library of Congress. So we're not the only game in town. But for whatever reason, we seem to be one of the key players in attempting to archive much of the current public web, including US government websites, and in making those archives available in near real time.
Were you caught off guard when you saw the new administration deleting web pages and taking down websites?
In some respects, this is normal and expected. Frankly, it has happened with every administration in the time we've been working on this effort. It's under new management, right? You wouldn't expect the whitehouse.gov website under a new presidential administration to be the same as before. You'll see the bios of people who are part of the current administration, and the news of that administration. We go out of our way to anticipate how often we should archive web pages so that we have a pretty good shot at capturing those changes.
The whitehouse.gov site is clearly controlled by the administration. I think people understand that to some extent. Joe Biden's administration probably wouldn't have posted a trolling Valentine's Day message about immigration to its Instagram account a year ago. But what we're looking at here is websites people need, websites with public health information, being taken offline easily, perhaps forever.
Is this on a different level than what we've seen before?
It is. That's right. It's certainly different in terms of the number [of changes], at least at first glance! We are still in the early days of this administration, but yeah, I'd face up to it and say you're right. Historically, we have not seen major US government websites taken offline, as we did with USAID. But I'm going to leave that kind of analysis to others and focus on actually archiving the material.
The Wayback Machine and the Internet Archive are funded primarily through donations: the generosity of people, institutions, and even governments. Is that enough to archive the internet to the extent that future generations will want and need?
“Enough” is a very subjective term. As an archivist, for me, it's not enough. I don't know, and no one knows, what will be used, what will be of value, what will be important, perhaps even tomorrow, much less in the very distant future. Millions of people use our site every day, and we get a lot of feedback from them. It motivates us, it helps guide us, and it continually encourages us to do a better job of being the best library we can.
You've been involved in this for nearly 30 years. Certainly, you've saved a lot. Certainly, a lot has fallen through the cracks. Is there something that slipped through the cracks that might show the audience what is lost when we can't archive as much as we want or need to?
Well, I've got one! This is recent history. Apparently there was a page on the CDC website about bird flu last week, but it was only up for a few minutes.
And what do we lose when we lose that fleeting web page, a minor page on the CDC website, or perhaps a major one?
Well, we're losing some part of the story, right? We are undoubtedly losing some of our understanding of the evolution of critical health issues. I don't know where this is going. I think that's another point, right? You don't know now what will become very important in the near future or in the long run.
There was heated debate in Martin Luther's time. Much of that debate took the form of pamphlets. Pamphlets at the time were considered to be of little value. People read and shared them, but they didn't necessarily save them. So today, scholars of that era, or just someone like me who is strangely curious, would give a great deal for those pamphlet collections.
In a way, you're comparing the CDC website to the Protestant Reformation. But I think you mean it, right?
I do! Because I don't know. You really can't know without the benefit of a long historical view. That's not something we have access to today. Why? Because there is no real time machine.