Reddit is blocking The Internet Archive from indexing its posts


Reddit is putting measures in place to prevent The Internet Archive from scraping and backing up its content. Reddit officials say the change is a response to AI scrapers pulling Reddit content from The Wayback Machine, even when Reddit blocks them from scraping the site directly. Currently, Reddit allows AI to be trained on its data, but only through paid access, such as its 2024 deal with Google. Reddit also says The Internet Archive should better respect user privacy by not hosting content that users have chosen to remove, such as deleted comments.

The changes have already gone out. For now, The Wayback Machine can only preserve Reddit's front page, not specific subreddits or user profiles.

Source
 
'better respect privacy'?
As soon as you post something online, you consent to it being forever publicly available to the entire world.
 
Didn't Reddit strike a deal with Google so Google could train its AI on Reddit? I feel like that's the more likely explanation: blocking the Internet Archive makes it harder for other companies to harvest data from Reddit.
 
This might have been a big loss a few years ago. Sadly, Reddit is now mostly bots and OF accounts spamming every thread for visibility. Every time I go back after swearing it off, it's somehow worse than before.
 
What many ppl don't seem to get is that AI is killing website traffic: when you Google something now, you don't have to visit any website, because the first thing you see is an AI answer, which most ppl are content with before moving on with their day or their next query.
GBAtemp made a post about how they are struggling with revenue as well and how this is part of the reason.
So it's not only a Reddit thing; you have to see beyond that and understand what this implies for the internet as a whole, given how Reddit is reacting to this new problem.
 
Would anyone here miss Reddit if it didn't exist?
the vast amount of tech solutions reddit has on it, now that Microsoft decided to delete about 90% of its old articles, is TOO VALUABLE to lose

looking for a specific driver for a piece of hardware that Microsoft forced the developers to hide compatibility for so they could push the next ultra-spyware Windows version? that driver is either on archive.org or on Reddit as a Dropbox link, and NEVER on the developer's website.
 