What is lemmy doing about bot scrapers?

flango@lemmy.eco.br · 1 day ago

What is lemmy doing about bot scrapers?

Rhaedas@fedia.io · 1 day ago

It’s a version of the age-old question of how to keep someone from stealing your images while still being able to show them. No one can see an image without having downloaded it already. The best you can do is layer in things like watermarks to make cleaning it into a “pure” version not worth the trouble. Same with text: poison it so it’s less valuable without a lot of extra work.

_cryptagion [he/him]@anarchist.nexus · 1 day ago

You can’t poison text in a way that’s meaningful to LLMs without making it indecipherable to humans.

hakunawazo@lemmy.world · 22 hours ago

That’s what satire was invented for. /s

sad_detective_man@sopuli.xyz · 1 day ago

I mean, Reddit text is poisoned by virtue of being highly unhinged. It’s probably one of the best reasons not to use AI right now, since its dataset is being formed from literal redditors.

Maybe we just gotta toxify it up here a bit

FaceDeer@fedia.io · 23 hours ago

Doesn’t seem to have negatively impacted AI much.

StinkyFingerItchyBum@lemmy.ca · 24 hours ago

Same with sext, poison it so it’s less valuable without a lot of extra sex work?

bcovertigo@lemmy.world · 1 day ago

Accepting that your premise is true for individual texts, the number of poisoned docs needed seems to stay fairly flat regardless of total training corpus size. So the question is how to sneak that many docs into the corpus.

https://arxiv.org/abs/2510.07192

Zak@lemmy.world · 21 hours ago

That’s DRM, and it only works if everyone is accessing the information on devices they don’t fully control.
