I’ve been following the bearblog developer’s struggle to manage the ongoing war between bot scrapers and people trying to keep a safe, human-oriented internet. What is Lemmy doing about bot scrapers?
Some context from the bearblog dev:
The great scrape
https://herman.bearblog.dev/the-great-scrape/
LLMs feed on data. Vast quantities of text are needed to train these models, which are in turn receiving valuations in the billions. This data is scraped from the broader internet, from blogs, websites, and forums, without the authors’ permission, with all content treated as opt-in by default.
Needless to say, this is unethical. But as Meta has proven, it’s much easier to ask for forgiveness than permission. It is unlikely they will be ordered to “un-train” their next generation models due to some copyright complaints.
Aggressive bots ruined my weekend
https://herman.bearblog.dev/agressive-bots/
It’s more dangerous than ever to self-host, since simple mistakes in configurations will likely be found and exploited. In the last 24 hours I’ve blocked close to 2 million malicious requests across several hundred blogs.
What’s wild is that these scrapers rotate through thousands of IP addresses during their scrapes, which leads me to suspect that the requests are being tunnelled through apps on mobile devices, since the ASNs tend to be cellular networks. I’m still speculating here, but I think app developers have found another way to monetise their apps by offering them for free, and selling tunnel access to scrapers.


If you’re concerned about bots ingesting the content, that’s impossible to prevent in an open federated system.
It’s weird that this has become such a controversial opinion. The internet is supposed to be open and available. “Information wants to be free.” It’s the big gatekeepers who want to keep all their precious data locked away in their own hoard behind paywalls and logins.
If some clanker is going to read my words, it’s a very small price to pay for people being able to do the same.
It was open and free until big tech stole the software, packaged it as their own services under a different name, and made billions from it.
Now they are scraping all content on the web to fuel another round of billions from AI.
We are seeing how the web is dying: bots produce most of the content, and people will eventually stop using it, just like cable TV.
It was a nice run though. I really liked growing up with the web and computers. But the end result is Enshittification. :)
It’s a classic case of people being all for freedom until all of a sudden they think it negatively impacts them personally in some vague abstract way.
An AI training off of my words costs me nothing. It doesn’t harm me at all. Frankly, I like the notion that future AIs are in some small part aligned based off of my views as expressed through my writing.
It will harm the owner of the server, who ends up serving a large amount of data to parties they may not want to, at their own expense.
So do ad blockers, yet we still use them.
Ads also cost users something to be served to their devices, and the more invasive and obnoxious the ad, generally the more it costs. If they don’t want to respect me, why should I respect them?
I’m not entirely sure that’s what the concern is. I think it’s that the writer is describing such an obscene influx of bot traffic that it must be a nightmare to maintain and pay for.
It’s a version of the age-old question of how you keep someone from stealing your images while still being able to show them. No one can see an image without having already downloaded it. The best you can do is layer in things like watermarks so that cleaning it into a “pure” version isn’t worth the trouble. Same with text: poison it so it’s less valuable without a lot of extra work.
You can’t poison text in a way that’s meaningful to LLMs without making it indecipherable to humans.
I mean, Reddit text is poisoned by virtue of being highly unhinged. It’s probably one of the best reasons not to use AI right now, since its dataset is being formed from literal redditors.
Maybe we just gotta toxify it up here a bit
Doesn’t seem to have negatively impacted AI much.
That’s what satire was invented for. /s
Accepting that your premise is true for individual texts, there seems to be a fairly flat number of poisoned docs needed regardless of total training corpus size. So the question is how to sneak that many docs into the corpus.
https://arxiv.org/abs/2510.07192
That’s DRM, and it only works if everyone is accessing the information on devices they don’t fully control.