What do y’all use to monitor many linux servers?

shootwhatsmyname@lemm.ee · 12 days ago

What do y’all use to monitor many linux servers?

Mora@pawb.social · 11 days ago

Beszel. Probably the easiest tool of all those mentioned in this thread.

dan@upvote.au · 4 days ago

I’m working on making it easier to install on Debian systems by creating a Debian package (and eventually a repo): https://github.com/henrygd/beszel/pull/497

JustARegularNerd@aussie.zone · 11 days ago

Seconded. My only complaint (which might already be a feature I haven’t found yet) is that it doesn’t seem to support multiple drives. But yes, it is shit easy to set up and has a beautiful UI.

Mora@pawb.social · 11 days ago

Totally possible:

https://beszel.dev/guide/additional-disks

JustARegularNerd@aussie.zone · 11 days ago

I no longer have any complaints about Beszel. Thank you!

reisub@discuss.tchncs.de · 12 days ago

Node exporter, Prometheus and Grafana

dann [any]@hexbear.net · 11 days ago

This

static09@lemmy.world · 12 days ago

Check out Netdata or Zabbix.

Ozymandias1688@feddit.org · 12 days ago

Serverbox. https://github.com/lollipopkit/flutter_server_box

CaptSpify@lemmy.today · 12 days ago

https://en.m.wikipedia.org/wiki/Nagios

iii@mander.xyz · edited 12 days ago

uptime-kuma is what I use

loganb@lemmy.world · 12 days ago

I personally use CheckMK.

Offers a free “Raw” version.
Can be deployed with Docker.
OSS

One caveat is that it can be a lot to take in at first, and it took me a while to get used to it.

corsicanguppy@lemmy.ca · 11 days ago

CheckMK user here, via OMD.

I’m looking for something else after the upgrade.

Black interface isn’t pretty for me and the old interface was “meh too hard so we ditched it”.
One half of the project split has a shit supply chain and just doesn’t meet the bar for upgrade requirements.
The other half of the project split is a mess to config in an automated desired-state setup. It’s all edge-triggered manual bullshit. NO. ENOUGH.

I miss 1.2.

notabot@lemm.ee · 12 days ago

Nagios. It does depend on what you mean by monitor though. Nagios is good at telling you that “service A on host B is down” but less useful for looking at things like performance trends. I particularly like being able to set up dependencies between services, so I get the alert for the root cause, and not for all of the services that have gone down because of it.
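
To illustrate the dependency idea, here is a minimal sketch in Python. It is not Nagios code or Nagios configuration; the function name and the service names are hypothetical, and it only looks at direct dependencies in an acyclic dependency map. A failing service is reported only when none of the services it depends on is also failing.

```python
def root_causes(down, depends_on):
    """Return the failing services that have no failing direct dependency.

    down: set of service names that are currently failing.
    depends_on: dict mapping a service to the services it depends on.
    Assumes the dependency map has no cycles.
    """
    return {
        service
        for service in down
        if not any(dep in down for dep in depends_on.get(service, ()))
    }


# Hypothetical example: web and api both need db, and db needs disk.
dependencies = {"web": ["db"], "api": ["db"], "db": ["disk"]}
alerts = root_causes({"web", "api", "db"}, dependencies)
# Only "db" is left in alerts; web and api are suppressed because db is down.
```

With this kind of filtering, one failed database produces one alert instead of three.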

sgh@lemmy.ml · 12 days ago

I use LibreNMS, as it uses SNMP for monitoring (which is available pretty much everywhere). I don’t believe it has HTTP alerts, but I know for a fact that it can send Telegram messages.

ocean@lemmy.selfhostcat.com · 12 days ago

I just see if it works when I need it. If I’m at home it works. If I’m at work it may work. If I’ve left to travel it’s 95% definitely down and cannot be fixed. This works well!

Cysioland@lemmygrad.ml · 12 days ago

Zabbix

LainTrain@lemmy.dbzer0.com · 12 days ago

Cockpit.

hobbsc@lemmy.sdf.org · 11 days ago

Is Cockpit on a server-by-server basis, or can you monitor multiple servers with it?

cmc@discuss.tchncs.de · 10 days ago

You can monitor multiple machines via the host switcher menu at the top-left of the screen: Multiple Machines

corsicanguppy@lemmy.ca · 11 days ago

My Cockpit experience has been uniformly dreadful. I’m glad you’re getting value out of it.

dkc@lemmy.world · 12 days ago

I’ve been really enjoying Cockpit as well.

hindy@mbin.lovetux.net · 12 days ago

Hello,

I’m still using Nagios here. And for the availability of the services I’m using uptime-kuma (in a Docker container).

spicehoarder@lemm.ee · 11 days ago

Not exactly what you’re looking for, but I like using Proxmox.

RegalPotoo@lemmy.world · 12 days ago

Base ansible role installs Prometheus node exporter, configured with the text file collector
VM automations push DNS records so that the Prometheus dns-sd automatically discovers them
Ansible roles add cron jobs that generate metrics for specific systems and dump them for the text file collector
Grafana for dashboards
Karma as a UI in front of Prometheus Alertmanager

tetris11@lemmy.ml · 12 days ago

Cron jobs that generate metrics for specific systems and dump them for the text file collector

Details please

RegalPotoo@lemmy.world · 11 days ago

https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector - which makes node exporter watch a specific directory for files that contain metrics, then re-export them whenever the central Prometheus server scrapes it
Some systems have their own metrics endpoints - instead of getting Prometheus to scrape these directly I set up a Cron job to curl these into files for node exporter - this means I don’t need extra config in Prometheus to find the endpoints, and don’t need to mess with firewall rules
Other systems don’t directly expose metrics in a format Prometheus can use - in this case I will write/find a script that can do the conversion, then either set it up to write the metrics file directly and run it from cron, or run it as a service and add another cron job to do the scrape
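
For the "write the metrics file directly" case, a script along these lines would work. This is a rough sketch, not the setup described above: the function names, the metric name, and the directory are all made up for illustration. It renders one sample in the Prometheus text format and writes it atomically, so node exporter never reads a half-written file. The textfile collector only picks up files ending in `.prom`, so the temporary file is ignored until it is renamed.

```python
import os
import tempfile


def escape_label_value(value):
    """Escape backslashes, double quotes and newlines in a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name, value, help_text, labels=None, metric_type="gauge"):
    """Render a single sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(
            '{}="{}"'.format(key, escape_label_value(val))
            for key, val in sorted(labels.items())
        )
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name}{label_str} {value}\n"
    )


def write_prom_file(directory, filename, content):
    """Write content to directory/filename via a temp file and a rename."""
    # The temp file lives in the same directory so the rename stays on one
    # filesystem and replaces the target in a single step.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
        # mkstemp creates the file as 0600; node exporter usually runs as
        # another user and needs read access.
        os.chmod(tmp_path, 0o644)
        target = os.path.join(directory, filename)
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target
```

A cron job could then call something like `write_prom_file("/var/lib/node_exporter/textfile", "backup.prom", render_metric("backup_last_success_timestamp_seconds", 1700000000, "Time of last successful backup.", {"job": "nightly"}))`. The directory here is an assumption; it has to match whatever node exporter was started with via `--collector.textfile.directory`.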