Reddit is blocking The Internet Archive from indexing its posts


Reddit is putting in measures to prevent The Internet Archive from scraping its content and backing it up. Reddit officials say this new change is a result of AI scrapers being able to parse through The Wayback Machine's content, even when Reddit blocks them from scraping their own site. Currently, Reddit allows AI to be trained off of its data, but only through paid access, such as its deal with Google in 2024. There's further concern from Reddit's side, as they also believe that The Internet Archive should better respect privacy and not host content that users choose to remove, such as deleted comments.

The new changes have already gone out, and for now, The Wayback Machine can only preserve the front page of Reddit, and not specific subreddits or user profiles.

Source
 
Reminder that other web archive sites exist, many of which Reddit will have a much harder time getting cooperation from, because they aren't casting as wide a net as the Internet Archive and are often run by people who aren't as willing to walk on eggshells as IA.
 
If Reddit didn’t exist, we would have more great sites such as this. Reddit already blocks VPN guests, and having to hit 10+ buttons to expand the sub-comment replies makes me avoid it like the plague. Posts also die, and I hate the format where new comments don’t bump an old thread back up.
I'm so tired of "real stories" from Reddit being read on YouTube and then it sounds like the fakest bs anyone came up with.
 

They already blocked indexing on search engines other than Google. The AI excuse is just an excuse.

Would anyone here miss Reddit if it didn't exist?
Yes. Even if you don't use it, a lot of communities exist only on Reddit and hold a great repository of knowledge. For a lot of things, you can only find good info if you Google "x + Reddit". If it ceased to exist, a lot of knowledge would be lost, and the communities there would probably move to Discord servers and the like, which are harder to find and even less search-indexable.
 
Very disappointing considering how many help "forums" are on Reddit and how so many people there are mass deleting their posts.

Reddit is still the biggest source of human help online; most of my searches have "Reddit" at the end now that AI has taken over the internet.
The only thing reddit helps with is telling you to do the one thing you didn't want to
 
Bad guy Reddit

Sadly, the world is full of normies with the critical thinking skills of a wet cucumber. Reddit, YouTube, TikTok, Twitter, and Facebook will continue to thrive because the vast majority of the population isn’t willing to spend more than a few seconds looking past the surface of the internet.
Crazy how "normie" exists as a derogatory term. Somehow normal people are the enemy. No one is truly a normie anyway. What do you know about interior design? Pop culture? Horse riding, F1 racing? College football? Horror novelists? Musical Theater? Everyone has their niche. For the people whose thing is not tech, they will reach for the most user friendly option. That's why these services become as popular as they are.
 
The Internet Archive has had many websites already scraped by bots for years. Whenever you save a snapshot, its bots index the entire page into its database. Clearly, whoever works for Reddit doesn't understand how it works, so why would I give a shit? Reddit has already screwed its entire userbase in many controversies, from locking out bot developers and banning subreddits to censoring media. Typical corporation BS.
 
Typical. Companies only care about "privacy" because their own access to your personal data is devalued if everyone can get it anyway. They want you to be a complete black box unless someone pays them for your data.

If it's a problem that the archive can contain later-deleted comments, tell me, do the AI datasets remove them too? No?
 
Unfortunately, yes. I utilize Reddit to find things and for troubleshooting technology as there isn’t a reliable search engine anymore.
The internet is 50% taken over by bots and AI. There's no "perfect" search engine anymore, not without being redirected to fake sites or generated slop.
 
Would anyone here miss Reddit if it didn't exist?
Think you misspelled Facebook/instagram/twitter/tiktok

Reddit is where people go to get questions answered. It's literally GameFAQs, but instead of video games, people go to ask questions about life/the real world. Why other sites on the internet exist is beyond me, as more than half of 'em are just places to sh4tpost and "hey look at me, look at me".
 
A lot of people are saying that Reddit is useful because you can google “X + Reddit”.

I should ask — has anybody else found this method drastically less useful than it was a year ago? Now, no matter how I re-word it, I only find the same Reddit thread over and over. And it’s not necessarily useful; it is commonly many years old and unanswered. It feels as though I am //the only person in the world// who has ever asked this question, or maybe one of 3 people, and it has never been answered. Is it just me feeling this way? I have this feeling like twenty times a week, whether I’m asking about N3DSXL paint chipping or how to light a studio with a smaller diffuser.

The Internet didn’t use to feel this way. It feels like an insane amount of information suppression, and it is pretty upsetting. To think that with the technology of this age, humans could share so much and be so smart, but the keepers of the information actively try to stop us from finding it.

So I resort to ChatGPT, which at least typically gives a good answer, but yeah, I’m certain this is not a good direction for humanity.
 
