I use Zip Bombs to Protect my Server

some_guy@lemmy.sdf.org · 2 months ago

I use Zip Bombs to Protect my Server

fmstrat@lemmy.nowsci.com · 2 months ago

I’ve been thinking about making an nginx plugin that randomizes words on a page to poison AI scrapers.
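A rough sketch of the word-randomizing idea, in Python rather than as an actual nginx plugin. The function name `shuffle_words` and the choice to keep punctuation and spacing in place are assumptions made for illustration only.

```python
import random
import re


def shuffle_words(text: str, rng: random.Random) -> str:
    """Return text with its words in random order, punctuation left in place."""
    # Split into alternating runs of word characters and non-word characters.
    tokens = re.findall(r"\w+|\W+", text)
    # Remember which token positions hold words.
    word_positions = [i for i, tok in enumerate(tokens) if re.match(r"\w", tok)]
    words = [tokens[i] for i in word_positions]
    rng.shuffle(words)
    # Put the shuffled words back into the original word slots.
    for position, word in zip(word_positions, words):
        tokens[position] = word
    return "".join(tokens)
```

Passing a `random.Random` instance seeded per request would give every scraper hit a different ordering, while a human-facing path could simply skip the call.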

some_guy@lemmy.sdf.org · 2 months ago

If you have the time, I think it’s a great idea.

owsei@programming.dev · 2 months ago

There are “AI mazes” that do that.

I remember reading an article about this but haven’t found it yet.

corsicanguppy@lemmy.ca · 2 months ago

The one below, named Anubis, is the one I heard about. Come back to the thread and check the link.

delusion@lemmy.myserv.one · 2 months ago

https://zadzmo.org/code/nepenthes/

fmstrat@lemmy.nowsci.com · 2 months ago

That is a very interesting git repo. Is this just a web view into the actual git folder?

UnbrokenTaco@lemm.ee · 2 months ago

Interesting. I wonder how long it takes until most bots adapt to this type of “reverse DoS”.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Then we’ll just be more clever as well. It’s an arms race after all.

palordrolap@fedia.io · 2 months ago

The article writer kind of complains that they’re having to serve a 10MB file, which is the result of the gzip compression. If that’s a problem, they could switch to bzip2. It’s available pretty much everywhere that gzip is available and it packs the 10GB down to 7506 bytes.

That’s not a typo. bzip2 is way better with highly redundant data.
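For anyone who wants to check the gzip versus bzip2 gap on their own machine, here is a small sketch using Python's standard library. It uses 10 MiB of zeros rather than 10 GB so it fits in RAM; the function name and the input size are assumptions, and exact byte counts will vary by library version.

```python
import bz2
import gzip


def compressed_sizes(n_bytes: int) -> dict:
    """Compress n_bytes of zeros with gzip and bzip2 and report the sizes."""
    data = bytes(n_bytes)  # bytes(n) is n zero bytes
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }


if __name__ == "__main__":
    for name, size in compressed_sizes(10 * 1024 * 1024).items():
        print(f"{name:>5}: {size} bytes")
```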

Xanza@lemm.ee · edit-2 2 months ago

zstd is a significantly better option than anything else available unless you need something specific for a specific reason: https://github.com/facebook/zstd?tab=readme-ov-file#benchmarks

LZ4 is likely better than zstd, but it doesn’t have wide usability yet.

palordrolap@fedia.io · 2 months ago

You might be thinking of lzip rather than lz4. Both compress, but the former is meant for high compression whereas the latter is meant for speed. Neither are particularly good at dealing with highly redundant data though, if my testing is anything to go by.

Either way, none of those are installed as standard in my distro. xz (which is lzma based) is installed as standard but, like lzip, is slow, and zstd is still pretty new to some distros, so the recipient could conceivably not have that installed either.

bzip2 is ancient and almost always available at this point, which is why I figured it would be the best option to stand in for gzip.

As it turns out, the question was one of data streams not files, and as at least one other person pointed out, brotli is often available for streams where bzip2 isn’t. That’s also not installed by default as a command line tool, but it may well be that the recipient, while attempting to emulate a browser, might have actually installed it.

some_guy@lemmy.sdf.org · 2 months ago

TIL why I’m gonna start learning more about bzip2. Thanks!

sugar_in_your_tea@sh.itjust.works · 2 months ago

Brotli gets it to 8.3K, and is supported in most browsers, so there’s a chance scrapers also support it.

Aceticon@lemmy.dbzer0.com · 2 months ago

Gzip encoding has been part of the HTTP protocol for a long time and every server-side HTTP library out there supports it, and phishing/scraper bots will be done with server-side libraries, not using browser engines.

Further, judging by the guy’s example in his article he’s not using gzip with maximum compression when generating the zip bomb files: he needs to add -9 to the gzip command line to get the best compression (but it will be slower). (I tested this and it made no difference at all).

sugar_in_your_tea@sh.itjust.works · edit-2 2 months ago

You can make multiple files with different encodings and select based on the Accept-Encoding header.
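A minimal sketch of that selection logic in Python. The file names in `PAYLOADS` are hypothetical, the preference order (Brotli before gzip) is an assumption, and the parser deliberately ignores the `*` wildcard to stay short.

```python
# Hypothetical pre-generated payloads, most preferred encoding first.
PAYLOADS = [("br", "10GB.br"), ("gzip", "10GB.gz")]


def parse_accept_encoding(header: str) -> dict:
    """Map each encoding token in an Accept-Encoding header to its q-value."""
    accepted = {}
    for part in header.split(","):
        token, _, params = part.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0  # no q parameter means full preference
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[token] = quality
    return accepted


def pick_payload(header: str):
    """Return (encoding, filename) for the first payload the client accepts."""
    accepted = parse_accept_encoding(header)
    for encoding, filename in PAYLOADS:
        if accepted.get(encoding, 0.0) > 0:
            return encoding, filename
    return None  # client accepts neither encoding, so serve nothing special
```

Whatever gets picked has to be sent with a matching `Content-Encoding` response header, otherwise the client will not try to decompress it.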

just_another_person@lemmy.world · 2 months ago

I believe he’s returning a gzip HTTP response stream, not just a file payload that the requester then downloads and decompresses.

Bzip isn’t used in HTTP compression.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Brotli is an option, and it’s comparable to Bzip.

bss03@infosec.pub · 2 months ago

For scrapers that are not just implementing HTTP, but are trying to extract zip files, you can possibly drive them insane with zip quines: https://github.com/ruvmello/zip-quine-generator or otherwise compressed files that contain themselves at some level of nesting, possibly with other data so that they recursively expand to an unbounded (“infinite”) size.

comador@lemmy.world · edit-2 2 months ago

Funny part is many of us crusty old sysadmins were using derivatives of this decades ago to test RAID-5/6 sequential read and write speeds.

mbirth@lemmy.ml · 2 months ago

And if you want some customisation, e.g. some repeating string over and over, you can use something like this:

yes "b0M" | tr -d '\n' | head -c 10G | gzip -c > 10GB.gz

yes repeats the given string (followed by a line feed) indefinitely - originally meant to type “yes” + ENTER into prompts. tr then removes the line breaks again and head makes sure to only take 10GB and not have it run indefinitely.

If you want to be really fancy, you can even add some HTML header and footer to some files like header and footer and then run it like this:

yes "b0M" | tr -d '\n' | head -c 10G | cat header - footer | gzip -c > 10GB.gz
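For systems without GNU `head` or `yes`, the same pipeline can be approximated in Python. This is a sketch, not a drop-in replacement: the function name, the chunk size, and passing the header and footer as byte strings instead of files are all assumptions.

```python
import gzip


def write_repeated_gzip(out, pattern: bytes, total_bytes: int,
                        header: bytes = b"", footer: bytes = b"",
                        chunk_size: int = 1 << 20) -> None:
    """Write header + pattern repeated up to total_bytes + footer, gzipped, to out."""
    # Build one chunk holding a whole number of repeats so the pattern
    # stays aligned from one chunk to the next.
    chunk = pattern * max(1, chunk_size // len(pattern))
    with gzip.GzipFile(fileobj=out, mode="wb", compresslevel=9) as gz:
        gz.write(header)
        remaining = total_bytes
        while remaining > 0:
            piece = chunk[:remaining]  # the last piece is cut short, like head -c
            gz.write(piece)
            remaining -= len(piece)
        gz.write(footer)
```

Called as `write_repeated_gzip(open("10GB.gz", "wb"), b"b0M", 10 * 1024**3)`, it only ever holds one chunk in memory, so the 10 GB never exists uncompressed anywhere.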

tal@lemmy.today · 2 months ago

Anyone who writes a spider that’s going to inspect all the content out there is already going to have to have dealt with this, along with about a bazillion other kinds of oddball or bad data.

lennivelkant@discuss.tchncs.de · 2 months ago

That’s the usual case with arms races: Unless you are yourself a major power, odds are you’ll never be able to fully stand up to one (at least not on your own, but let’s not stretch the metaphor too far). Often, the best you can do is to deter other, minor powers and hope major ones never have a serious intent to bring you down.

In this specific case, the number of potential minor “attackers” and the hurdle for “attack” make it attractive to try to overwhelm the amateurs at least. You’ll never get the pros, you just hope they don’t bother you too much.

catloaf@lemm.ee · 2 months ago

Competent ones, yes. Most developers aren’t competent, scraper writers even less so.

idriss@lemm.ee · 2 months ago

That’s true. Scraping is a gold mine for the people that don’t know. I worked for a place which crawls the internet and beyond (fetches some internal dumps we pay for). There is no chance a zip bomb would crash the workers as there are strict timeouts and smell tests (even if one does, it will crash an ECS task at worst and we will be alerted to fix that within a short time). We were as honest as it gets though, following GDPR, honoring the robots file, no spiders or scanners allowed, only home page to extract some insights.

I am aware of some big name EU non-software companies very interested in keeping an eye on some key things that are only possible with scraping.

cy_narrator@discuss.tchncs.de · 2 months ago

First off, be very careful with bs=1G as it may overload the RAM. You will want to set count accordingly

sugar_in_your_tea@sh.itjust.works · 2 months ago

Yup, use something sensible like 10M or so.

cy_narrator@discuss.tchncs.de · edit-2 2 months ago

I would normally go much lower,

bs=4K count=262144 which creates 1G with 4K block size
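The arithmetic is easy to get wrong by a factor of 1024, so here is a tiny Python helper to check it. `dd` holds one block of `bs` bytes in memory at a time, which is why a small block size keeps RAM use low; the helper name is made up for illustration.

```python
def dd_count(total_bytes: int, block_size: int) -> int:
    """Return the dd count= value that yields total_bytes with the given bs=."""
    count, remainder = divmod(total_bytes, block_size)
    if remainder:
        raise ValueError("total size is not a whole number of blocks")
    return count


KIB, MIB, GIB = 1024, 1024**2, 1024**3

if __name__ == "__main__":
    print(dd_count(1 * GIB, 4 * KIB))    # bs=4K for 1G
    print(dd_count(10 * GIB, 10 * MIB))  # bs=10M for 10G
```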

👍Maximum Derek👍@discuss.tchncs.de · 2 months ago

When I was serving high volume sites (that were targeted by scrapers) I had a collection of files in CDN that contained nothing but the word “no” over and over. Scrapers who barely hit our detection thresholds saw all their requests go to the 50M version. Super aggressive scrapers got the 10G version. And the scripts that just wouldn’t stop got the 50G version.

It didn’t move the needle on budget, but hopefully it cost them.

sugar_in_your_tea@sh.itjust.works · 2 months ago

How do you tell scrapers from regular traffic?

👍Maximum Derek👍@discuss.tchncs.de · 2 months ago

Most often because they don’t download any of the css or external js files from the pages they scrape. But there are a lot of other patterns you can detect once you have their traffic logs loaded in a time series database. I used an ELK stack back in the day.
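A toy version of that first heuristic, assuming the access log has already been reduced to `(ip, path)` pairs. The function name, the `.css`/`.js` suffix list, and the `min_pages` threshold are all assumptions; a real setup would work on a time window and many more signals.

```python
from collections import defaultdict

ASSET_SUFFIXES = (".css", ".js")


def suspected_scrapers(requests, min_pages: int = 20) -> list:
    """Return IPs that fetched at least min_pages pages but no css/js at all."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for ip, path in requests:
        # Ignore the query string when classifying the request.
        clean_path = path.split("?", 1)[0].lower()
        if clean_path.endswith(ASSET_SUFFIXES):
            assets[ip] += 1
        else:
            pages[ip] += 1
    return sorted(ip for ip, n in pages.items()
                  if n >= min_pages and assets[ip] == 0)
```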

sugar_in_your_tea@sh.itjust.works · 2 months ago

That sounds like a lot of effort. Are there any tools that get like 80% of the way there? Like something I could plug into Caddy, nginx, or haproxy?

👍Maximum Derek👍@discuss.tchncs.de · 2 months ago

My experience is with systems that handle nearly 1000 pageviews per second. We did use a spread of haproxy servers to handle routing and SNI, but they were being fed offender lists by external analysis tools (built in-house).

sugar_in_your_tea@sh.itjust.works · 2 months ago

Dang, I was hoping for a FOSS project that would do most of the heavy lifting for me. Maybe such a thing exists, idk, but it would be pretty cool to have a pluggable system that analyzes activity and tags connections w/ some kind of identifier so I could configure a web server to either send it nonsense (i.e. poison AI scrapers), zip bombs (i.e. bots that aren’t respectful of resources), or redirect to a honey pot (i.e. malicious actors).

A quick search didn’t yield anything immediately, but I wasn’t that thorough. I’d be interested if anyone knows of such a project that’s pretty easy to play with.

A Basil Plant@lemmy.world · edit-2 2 months ago

Not exactly what you asked, but do you know about ufw-blocklist?

I’ve been using this on my multiple VPSes for some time now and the number of fail2ban failed/banned has gone down like crazy. Previously, I had 20k failed attempts after a few months and 30-50 currently-banned IPs at all times; now it’s less than 1k failed after a year and maybe 3-ish banned at any time.

There was also that paid service where users share their spammy IP address attempts with a centralized network, which does some dynamic intelligence monitoring. I forgot the name and search these days isn’t great. Something to do with “Sense”? It was paid, but well recommended as far as I remember.

Edit: seems like the keyword is “threat intelligence platform”

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · edit-2 2 months ago

How I read that code:

“If the dev folder’s bullshit is equal to 1 gram…”

deaddigger@lemm.ee · 2 months ago

At least in Germany, having one of these on your system is illegal.

Dzso@lemmy.world · 2 months ago

Out of curiosity, what is illegal about it, exactly?

deaddigger@lemm.ee · edit-2 2 months ago

I mean, I am not a lawyer.

In Germany we have § 303b StGB. In short, it says that if you hinder someone else’s data processing through physical means or malicious data, you can go to jail for up to 3 years. If it is a major process for someone, you can get up to 5 years, and in major cases up to 10 years.

So if you have a zip bomb on your system and a crawler reads and unpacks it, you committed two crimes. 1. You hindered that crawler’s data processing. 2. Some ISP nodes look into it and can crash too. If the ISP is pissed off enough, you can go to jail for 5 years. This applies even if you didn’t crash them due to them having protection against it, because trying it is also against the law.

Having a zip bomb is part of a gray area. Because trying to disrupt data processing is illegal, having a zip bomb can be considered trying; however, I am not aware of any judgement in this regard.

Edit: btw if you password protect your zip bomb, everything is fine.

barsoap@lemm.ee · edit-2 2 months ago

The offence is severely disrupting other people’s data processing that is of significant importance to them. Doing it by submitting malicious data requires intent to cause harm; physical destruction, deletion, etc. don’t. This is about crashing people’s payroll systems, DDoSing, etc., not burning some CPU cycles and having a crawler subprocess crash with OOM.

Why the hell would an ISP have a look at this? And even if they did, they’re professional enough to detect zip bombs. Which btw is why this whole thing is pointless anyway: If you class requests as malicious, just don’t serve them. If that’s not enough it’s much more sensible to go the Anubis route and demand proof of work as that catches crawlers which come from a gazillion IPs with different user agents etc.
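For readers unfamiliar with proof of work, here is a generic hashcash-style sketch in Python. It is not taken from Anubis and makes no claim about how Anubis implements its challenge; the function names, the SHA-256 choice, and the "leading hex zeros" difficulty rule are assumptions for illustration.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap server-side check: does the hash start with enough zero digits?"""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Expensive client-side search for a nonce that passes verify()."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point: the server does one hash per request, while each client has to try on the order of 16 to the power of `difficulty` nonces, which adds up fast for a crawler hitting thousands of pages.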

deaddigger@lemm.ee · edit-2 2 months ago

Telecom, for example, does deep packet inspection. That is rather well known. BEREC made a statement years ago that it is normal for other European ISPs too. Here is a secondary source for it, I can’t find the primary source anymore https://netzpolitik.org/2012/berec-studie-dpi-bei-vielen-providern-bereits-im-einsatz/

Whether you are successful in disrupting some data processing doesn’t matter; trying to do it is illegal. If you put it there to disrupt crawlers, you are trying to disrupt an entity’s data processing.

If your ISP does DPI, an archive bomb is able to crash their server. Even if they have measures against it, it is still illegal because trying it is illegal.

barsoap@lemm.ee · 2 months ago

The intent is to get rid of crawlers which are disrupting the operation of your servers. That’s not intent of doing harm to the crawler’s operator, or their business. It’s analogous to telling a hawker to fuck off: Polite, no, but them being able to profit off you is not your responsibility, you do not have to accede to that. And intent to harm the ISP is even less reasonable to assume.

I can’t find the primary source anymore https://netzpolitik.org/2012/berec-studie-dpi-bei-vielen-providern-bereits-im-einsatz/

That’s out of date anyway. How about this one. DPI is limited to OSI level 5 and only allowed to resolve network issues – and a crawler crashing is not a network issue.

deaddigger@lemm.ee · 2 months ago

That’s out of date anyway. How about this one.

Good to know

A crawler is a data processing machine, nothing more. Therefore you are disrupting data processing through data. If you think it’s not, that’s ok too. I would still advise contacting your lawyer in Germany if you are thinking about hosting a zip bomb.

barsoap@lemm.ee · 2 months ago

A crawler is a data processing machine, nothing more. Therefore you are disrupting data processing through data. If you think it’s not, that’s ok too.

Nah it’s definitely disrupting data processing, even though at a very low-key level – you’re not causing any data to become invalid or such. It’s the intent to harm the operator that’s the linchpin: “Jemandem einen Nachteil zufügen”. “Jemand” needs to be a person, natural or legal. And by stopping a crawler you don’t want to inflict a disadvantage on the operator you want to, at most, stop them from gaining an advantage. “Inflict disadvantage” and “prevent advantage” are two different things.

I would still advise contacting your lawyer in Germany if you are thinking about hosting a zip bomb

Good idea, but as already said before: First, you should contact a sysadmin. Who will tell you it’s a stupid idea.

Raltoid@lemmy.world · edit-2 2 months ago

TL;DR: It’s illegal to have one publicly available or to share it.

Creating one for research purposes on your own hardware is not illegal as far as I know. And if it is, I wouldn’t mind seeing someone challenge that with the EU.

MimicJar@lemmy.world · 2 months ago

I wonder if having a robots.txt file that said to ignore the file/path would help.

I’m assuming a bad bot would ignore the robots.txt file. So you could argue that you put up a clear sign and they chose to ignore it.
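To make the "clear sign" concrete, this sketch shows how a well-behaved crawler would read such a rule, using Python's standard `urllib.robotparser`. The `/trap/` path and the helper name are hypothetical, and this says nothing about the legal question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every crawler to stay out of /trap/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def allowed(path: str, user_agent: str = "*") -> bool:
    """Return True if robots.txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

A polite bot calling `allowed("/trap/10GB.gz")` gets `False` and never sees the file, so only clients that skip or ignore robots.txt end up downloading it.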

deaddigger@lemm.ee · edit-2 2 months ago

Good question, I don’t know tbh. Would be an interesting question for a lawyer influencer.

Lka1988@lemmy.dbzer0.com · 2 months ago

Maybe bots shouldn’t be trying to install malicious code? Sucks to suck.

lennivelkant@discuss.tchncs.de · 2 months ago

Still illegal. Not immoral, but a lot of our laws aren’t built on morality.

Raltoid@lemmy.world · 2 months ago

Illegal to publicly serve or distribute.

aesthelete@lemmy.world · edit-2 2 months ago

This reminds me of shitty FTP sites with ratios when I was on dial-up. I used to push them files full of null characters with filenames that looked like actual content. The modem would compress the upload as it transmitted it which allowed me to upload the junk files at several times the rate of a normal file.

MeThisGuy@feddit.nl · edit-2 2 months ago

that is pretty darn clever

I use a torrent client that will lie on the upload (x10 or x11, or a myriad of other options) so as to satisfy the upload ratio requirement of many members only torrent communities

lemmylommy@lemmy.world · 2 months ago

Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device.

LOL. Destroy your device, kill the cat, what else?

archonet@lemy.lol · 2 months ago

destroy your device by… having to reboot it. the horror! The pain! The financial loss of downtime!

Albbi@lemmy.ca · 2 months ago

It’ll email your grandmother all of your porn!

Bobby Turkalino@lemmy.yachts · 2 months ago

Haven’t thought about that Weird Al song in a while

Dizzy Devil Ducky@lemm.ee · 2 months ago

The horrors of having your TV record Gigli!

Exec@pawb.social · 2 months ago

outstanding reference

CrazyLikeGollum@lemmy.world · 2 months ago

Ah yes, the infamous “stinky cheese” email virus. Who knew zip bombs could be so destructive. It erased all of the easter eggs off of my DVDs.

Aatube@kbin.melroy.org · 2 months ago

macOS compresses its memory. Does this mean we’ll see bots running on macOS now?

UnbrokenTaco@lemm.ee · 2 months ago

Is it immune to zip bombs?

Aatube@kbin.melroy.org · 2 months ago

All I know is it compresses memory. The mechanism mentioned here for ZIP bombs to crash bots is to fill up memory fast with repeating zeroes.

Guidy@lemmy.world · 2 months ago

I thought it was to fill all available storage. Maybe it’s both?

DreamButt@lemmy.world · 2 months ago

No, but that’s an interesting question. Ultimately it probably comes down to hardware specs. Or, depending on the particular bot and its env, the spec of the container it’s running in.

Even with macOS’s style of compressing inactive memory pages you’ll still have a hard cap that can be reached with the same technique (just with a larger uncompressed file).

4am@lemm.ee · 2 months ago

How long would it take to be considered an inactive memory page? Do OOM conditions immediately trigger compression, or would the process die first?

ivn@jlai.lu · edit-2 2 months ago

Linux and Windows compress it too, for 10 years or more. And that’s not how you avoid zip bombs, just limit how much you uncompress and abort if it’s over that limit.
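On the crawler side, "limit how much you uncompress" can look roughly like this in Python, using the `max_length` argument of `zlib`'s decompress objects. The function name, the exception class, and the chunk size are assumptions; a real crawler would also stream the input instead of holding it all in memory.

```python
import zlib


class DecompressionLimitExceeded(Exception):
    """Raised when a gzip body expands beyond the allowed size."""


def bounded_gunzip(data: bytes, limit: int, chunk_size: int = 64 * 1024) -> bytes:
    """Decompress gzip data, aborting as soon as the output exceeds limit bytes."""
    # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    output = bytearray()
    pending = data
    while pending:
        # Never ask for more than one byte past the remaining budget.
        budget = min(chunk_size, limit - len(output) + 1)
        output += decompressor.decompress(pending, budget)
        if len(output) > limit:
            raise DecompressionLimitExceeded(f"output exceeds {limit} bytes")
        # Input that was not consumed because of the output cap.
        pending = decompressor.unconsumed_tail
    if not decompressor.eof:
        raise ValueError("truncated gzip stream")
    return bytes(output)
```

Because the output is produced in capped slices, a 10 GB bomb gets rejected after roughly `limit` bytes have been inflated, not after the whole thing has landed in memory.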

timetraveller@lemmy.world · 2 months ago

I was going to say the same thing.

dwt@feddit.org · 2 months ago

Sadly about the only thing that reliably helps against malicious crawlers is Anubis

https://anubis.techaro.lol/

LainTrain@lemmy.dbzer0.com · 2 months ago

Neat

alehel@lemmy.zip · 2 months ago

That URL is telling me “Invalid response”. Am I a bot?

MonkderVierte@lemmy.ml · 2 months ago

You’re using a VPN, right?

alehel@lemmy.zip · 2 months ago

Nope. Just using Vivaldi on my Android device.

Squizzy@lemmy.world · 2 months ago

I’m not and it gave an invalid response. I am just chilling on my home wifi.

sugar_in_your_tea@sh.itjust.works · 2 months ago

Probably.

doorknob88@lemmy.world · 2 months ago

I’m sorry you had to find out this way.

xavier666@lemm.ee · 2 months ago

Now you know why your mom spent so much time with the Amiga

L_Acacia@lemmy.ml · 2 months ago

https://anubis.techaro.lol/docs/user/known-broken-extensions

If you have JShelter installed, it breaks the proof of work from Anubis.

spicehoarder@lemm.ee · 2 months ago

I don’t really like this approach, not just because I was flagged as a bot, but because I don’t really like captchas. I swear I’m not a bot guys!

Treczoks@lemmy.world · 2 months ago

Have you ever heard of sparse files, and how Linux and Windows deal with zips of it? You’ll love this.