According to WIRED, a number of prominent US news websites have begun blocking the Internet Archive's Wayback Machine from crawling their pages and storing snapshots of them. Twenty-three major news outlets, including USA Today and The New York Times, have taken this step, as has the social platform Reddit. The Guardian has also limited archiving by restricting API access and filtering articles. The main motivation is the fear that AI companies are using archived data to train their models, which could infringe copyright and foster unfair competition. Publishers and AI firms are locked in a dispute over the legality of such data use, and more than a hundred AI-related copyright lawsuits in the United States center on this issue. The Wayback Machine, a non-profit digital archiving tool, has archived over one trillion web pages. If it continues to be shut out of mainstream news sources, however, its archiving work will be significantly hindered, and valuable early digital historical records could be lost. That, in turn, would harm journalistic oversight and the use of such records as evidence in the judicial system.
