Fake News Creates Fake History: Why We Archive the Web

Preserving and archiving the internet will have a tremendous impact on the way future generations experience our world.


For many, browsing the internet in the 1990s and early 2000s came with the belief that if something was published online, it must be true. That belief has since faded, but some people still assume that whatever is on the internet will stay there. In reality, what is online today could easily be gone tomorrow: the information you are looking at may no longer be available by the time you pour your second cup of coffee.

The World Wide Web is ephemeral by nature. The 24-hour news cycle has shrunk to minutes, and information spreads and is shared faster than ever before. Web pages are constantly updated, with content removed, changed or altered so that readers can move on to absorbing and sharing the next wave of information. This fast-paced, ever-changing exchange can leave gaping holes in the history of the internet and in the record of what was published on it.

The internet of today will be crucial in giving future researchers and historians a glimpse into our present: what the world looked like, how the political atmosphere shaped society, what daily life included, how people behaved and how civilization operated. Each time information is deleted from the internet, a significant part of that history, and of future research, disappears with it.

Web archiving preserves the internet and the content that is produced and erased on it every day. As a way of safeguarding our digital heritage, it involves gathering and preserving the historical information published online and making it openly accessible, ensuring that these materials live on long beyond their transitory purposes.

In celebration of 20 years of the Israeli internet, and in recognition of the critical importance of web archiving, the National Library of Israel and the Open Media and Information Lab (OMILab) at the Open University of Israel organized an international conference, “Web Archiving: Best Practices for Digital Cultural Heritage,” in April 2018. The conference brought together leading researchers and practitioners in web archiving and web history research from the United States, France, the United Kingdom, Denmark, the Netherlands, Belgium, Portugal and Israel.

Prof. Niels Brügger of Aarhus University, Denmark, explained that web archiving, the deliberate collection and preservation of web material, is critical to conserving our digital heritage. “Without that preservation internet materials are doomed to disappear,” he said.

“The online web is not an archive,” explained Brügger. “It is volatile, subject to deletions and changes at an unprecedented pace compared to other media types.”

Brügger suggested that the average webpage survives for about a year at most, and once it is gone, it is gone for good. That is why archiving the web matters: without it, the internet would have no memory.

Prof. Niels Brügger. Photo: Hanan Cohen

Social Media: Our personal web archives

Web archiving has also begun to play an important sociological role in preserving the personal history of ordinary people. The internet has become a gold mine of memories: family photos, personal narratives and videos of special moments.

With the rise of social media, people are uploading and sharing more information about themselves, their families and their friends than ever before. Status updates and tweets offer a synopsis of our day-to-day lives, while images and videos give a firsthand look at our personal experiences and at the wider world around us. This wealth of data, uploaded by users to Facebook, Twitter, Instagram and other platforms, could be invaluable to future generations.

“Social media makes preserving and archiving the internet more challenging, but it also makes it even more important because what humans are sharing is inherently ephemeral,” explained Mark Graham, Director of the Wayback Machine, a digital time machine that gives access to over 20 years of archived web pages.

“This information that people are sharing has real time value that emphasizes the ephemerality of the internet, replacing longer-term and more physical objects like books and magazines,” he explained.

Mark Graham. Photo: Hanan Cohen

How do we choose what to preserve?

Web archiving relies on web crawlers that comb the web and gather material in real time. The material is then stored in its HTML form, which is sometimes fragmented. The sheer size of the internet makes it difficult to imagine what it would take to preserve everything on it. This raises an ethical question: what should be stored, and what should be prioritized? And how do we ensure that what is being archived is real information and not “fake news”?

For experts like Brügger and Graham, the answer is simple: archive everything, or as much as possible. Nothing should be left behind.

According to Brügger, “The archiving of web material is collecting what should be preserved. Whether it was real or fake, if it was online, it should be preserved.”

Fake news is part of our reality. It existed on the internet, and therefore it should be preserved for future generations to study and understand. If we succeed in archiving as much as possible, fake news can be read against the full context of its time, where it is easier to recognize for what it is.

“Just as we can’t stop people from producing fake news, we cannot expect to protect the internet archive from fake news,” Graham explained. “What we can do is use sunlight as a disinfectant. We can help people differentiate fake news from real news within the greater context.

“We need to build tools that allow people to evaluate information more easily and make smart, educated decisions. The longer-term work needs to be done through education, to give people what they need to exercise good judgement,” said Graham.

The ultimate goal of web archiving

The objective of archiving the web, and ultimately its most difficult challenge, is to find a way to open the archive to researchers and to the public at large, noted Oren Weinberg, Director General of the National Library of Israel (NLI).

“The National Library of Israel serves as the collective memory of the Jewish people…the web is the history of tomorrow so we need to take it upon ourselves to preserve it for generations to come.”

Archiving the web lets us retain and appreciate an entire generation’s way of thinking and living. Without these efforts, we lose that piece of history: the art, the photos, the videos and the shared experiences of the human collective. By archiving the web, looking at the past and recognizing its significance, we open the door to societal progress and a greater appreciation of history.
