Arthur Besse@lemmy.ml to Not the Onion@lemmy.ml · English · 29 days ago

Cloudflare's next-generation "AI Labyrinth" promises to "waste resources" as-a-service, using today's machine learning models to sabotage tomorrow's

blog.cloudflare.com

Trapping misbehaving bots in an AI Labyrinth

blog.cloudflare.com

How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives.

Today, we’re excited to announce AI Labyrinth, a new mitigation approach that uses AI-generated content to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives. When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity, without the need for customers to create any custom rules.

And it’s “free”! (Visibility into all of that traffic is more than sufficient payment for them 🤑)

Here are some perhaps-contradictory highlights from their blog post (emphasis mine), which I’m pretty sure was itself written with LLM assistance:

No real human would go four links deep into a maze of AI-generated nonsense.

When these links are followed, we know with high confidence that it’s automated crawler activity, as human visitors and legitimate browsers would never see or click them. This provides us with a powerful identification mechanism, generating valuable data that feeds into our machine learning models. By analyzing which crawlers are following these hidden pathways, we can identify new bot patterns and signatures that might otherwise go undetected.

But as bots have evolved, they now proactively look for honeypot techniques like hidden links, making this approach less effective.

AI Labyrinth won’t simply add invisible links, but will eventually create whole networks of linked URLs that are much more realistic, and not trivial for automated programs to spot. The content on the pages is obviously content no human would spend time-consuming, but AI bots are programmed to crawl rather deeply to harvest as much data as possible. When bots hit these URLs, we can be confident they aren’t actual humans, and this information is recorded and automatically fed to our machine learning models to help improve our bot identification. This creates a beneficial feedback loop where each scraping attempt helps protect all Cloudflare customers.

This is only the first iteration of using generative AI to thwart bots for us. Currently, while the content we generate is convincingly human, it won’t conform to the existing structure of every website. In the future, we’ll continue to work to make these links harder to spot and make them fit seamlessly into the existing structure of the website they’re embedded in. You can help us by opting in now.


Sonori@beehaw.org · 28 days ago
We may not live in the worst of all possible worlds, but we sure do live in the dumbest.