booting system... standby
This is my simple blog. My intention is to ramble about things that amuse me. One day it might develop some structure, until then…
- 14 Sep 2014 » Monitoring systemd and failing services
- 19 Aug 2014 » Linux + Radeon HD 4200 + Reduced Blanking
- 28 Jul 2014 » Quick Benchmarks of dm-cache / lvmcache
- 25 Jul 2014 » Simple and Fast Random Data Generator
- 10 Jul 2014 » Use Funtoo's Keychain Instead of GNOME Keyring
- 30 Jun 2014 » Flash Crashes When Network Interfaces Change
- 28 Apr 2014 » Poor pv and splice() Performance
- 24 Apr 2014 » Ubuntu 14.04 LTS on OpenVZ + ufw and iptables Firewall
- 21 Apr 2014 » Two Factor (2FA) SSH Authentication Using YubiKey
- 31 Mar 2014 » Pragmatic Backups
- 20 Feb 2014 » SSH Reverse Tunnel on Linux with systemd
- 07 Jan 2014 » Opt-out of Junk Snail Mail
- 30 Dec 2013 » Time Warner Cable aka RoadRunner TLS and SSL Mail Fail
- 14 Dec 2013 » Long Range Zip Musings
- 29 Sep 2013 » Using Native IPv6 via Comcast in San Francisco
- 26 Aug 2013 » Where have I been?
- 14 Jul 2013 » FSSH part 2: Tmux and Vim
- 07 Jul 2013 » Dying Gigabyte Motherboard
- 30 Jun 2013 » SSD Caching Using dm-cache Tutorial
- 20 Jun 2013 » SSH Reverse Tunnel on Mac OS X
- 17 Jun 2013 » Ubuntu 13.04 Bandwidth Shaping and Traffic Control using HTB
- 16 Jun 2013 » Leveraging Upstart for User Jobs
- 15 Jun 2013 » Remote ssh copy paste buffers using fssh
- 09 Jun 2013 » Use imapfilter to filter SPAM - part 2
- 02 Jun 2013 » Android CA Certificates
- 01 Jun 2013 » Parse eMMC Extended CSD (ECSD) Registers with Python
- 30 May 2013 » Manage LXCs with Docker
- 28 May 2013 » Ting
- 27 May 2013 » SSD let me down again
- 27 May 2013 » BitTorrent Sync
- 26 May 2013 » Use imapfilter to filter SPAM - part 1
- 13 May 2013 » GNOME Keyring Access for Python
- 12 May 2013 » Lua popen3() Implementation
- 12 May 2013 » Btrfs filesystem trips up
- 09 May 2013 » Linux SSD caching part 2
- 08 May 2013 » Epson WorkForce WF-3520 + Ubuntu 13.04
- 06 May 2013 » GNOME Keyring Daemon Breaks My GPG Encrypted Backups
- 05 May 2013 » Issue with my SSD + btrfs + discard
- 26 Apr 2013 » Issues with Ubuntu's UFW on OpenVZ VPS
- 20 Apr 2013 » Linux SSD caching
- 10 Apr 2013 » My Wi-Fi access point revisited
- 01 Jan 2013 » New job, moving cross country
- 06 Sep 2012 » Ubuntu 12.04 LTS Minimal GUI
- 05 Sep 2012 » The smoking gun
- 29 Aug 2012 » A story about a car...
- 07 Aug 2012 » Managing /etc with etckeeper
- 06 Aug 2012 » Hello World
Monitoring systemd and failing services
September 14, 2014
No Emails on Failed Tasks?
I’ve been slowly working to convert the cron jobs on my workstation to run as systemd timers. For the most part it has been going awesome: systemd is so powerful that many old concerns are no longer relevant (e.g. cron script collisions and file locks).
On the negative side, cron used to send me emails when tasks would fail and return a non-zero status code. This was critical for things like backups that I don’t want to silently fail forever. What can systemd do about this? Not much by default it seems.
At first I discovered the OnFailure option for the [Unit] section of systemd service files. It starts another service when a unit fails, but unfortunately doesn’t convey much state to that failure service. I used the instance name to pass information on to the failure service.
The major downside is that this would need to be configured for every service I want to monitor. I lost interest, mostly. I dumped what I had on GitHub; people can find it @ kylemanna/systemd-utils/onfailure.
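To give a feel for the OnFailure approach, here is a minimal sketch of a handler that a template unit could run. This is not the code from the repository above. The unit names (onfailure-notify@.service), the script path, the default addresses, and the idea of passing the failed unit's name as the instance are all assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical OnFailure handler (illustrative names, not the original code).

Assumed wiring:

    # in each monitored unit
    [Unit]
    OnFailure=onfailure-notify@%n.service

    # in onfailure-notify@.service
    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/onfailure_notify.py %i
"""
import smtplib
import subprocess
import sys
from email.message import EmailMessage


def unit_status(unit):
    """Return the `systemctl status` text for a unit.

    systemctl exits non-zero for failed units, so the return code is ignored.
    """
    result = subprocess.run(
        ["systemctl", "status", "--no-pager", unit],
        capture_output=True, text=True)
    return result.stdout


def build_report(unit, status_text,
                 recipient="root@localhost", sender="root@localhost"):
    """Build the notification email for a failed unit."""
    msg = EmailMessage()
    msg["Subject"] = f"[systemd] {unit} failed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(status_text)
    return msg


def main(unit):
    # Assumes a mail server is listening on localhost port 25.
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(build_report(unit, unit_status(unit)))


# Only act when systemd passes exactly one argument (the instance name).
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Even with a handler like this, every monitored unit still needs its own OnFailure line, which is the repetition described above.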
Manually Parsing journalctl
At first this seems hacky due to the absurd amount of string parsing, but until I find something better, it will do. I wrote a service called failure-monitor and it’s also on GitHub @ kylemanna/systemd-utils/failure-monitor.
The service consists of two parts: a Python file that does the work and a systemd service file to run the Python script. The Python script fires up and follows the journalctl output looking for “entered failed state”. When the magic string is encountered, it parses a few details out of the line and sends an email. Simple as that. Service startup is managed by systemd and works like most systemd services. The instance name is used as a hacky way to provide the destination email address in a configurable manner.
The service makes a lot of naive assumptions, like there being a local mail server running (postfix in my case) that just works.
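The idea can be sketched roughly as follows. This is a simplified illustration rather than the actual failure-monitor script. The function names and sender address are made up, and the regular expression assumes the journal line looks like "Unit foo.service entered failed state."; the exact wording may differ between systemd versions.

```python
import re
import smtplib
import subprocess
from email.message import EmailMessage

# Assumed message format; other systemd versions may word this differently.
FAILED_RE = re.compile(r"Unit (?P<unit>\S+) entered failed state")


def parse_failed_unit(line):
    """Return the unit name if the line reports a failure, else None."""
    match = FAILED_RE.search(line)
    return match.group("unit") if match else None


def follow_journal():
    """Yield new journal messages as they arrive (blocks forever)."""
    proc = subprocess.Popen(
        ["journalctl", "--follow", "--lines=0", "--output=cat"],
        stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        yield line.rstrip("\n")


def notify(unit, recipient):
    """Email a short failure notice via the local mail server."""
    msg = EmailMessage()
    msg["Subject"] = f"[systemd] {unit} entered failed state"
    msg["From"] = "root@localhost"  # hypothetical sender address
    msg["To"] = recipient
    msg.set_content(f"systemd reported that {unit} failed.")
    # The naive assumption: an SMTP server is listening on localhost.
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def monitor(recipient, lines=None, send=notify):
    """Scan journal lines and call send() for every failed unit seen."""
    for line in (follow_journal() if lines is None else lines):
        unit = parse_failed_unit(line)
        if unit:
            send(unit, recipient)
```

Calling monitor("you@example.com") would follow the live journal. A template service could pass its instance name (%i) in as the recipient, which is one way to get the configurable destination address described above.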
Hopefully other people can chime in and help improve these services. Maybe systemd will get an active response system for failures. It currently has a method to upload log files to servers, but that seems like overkill for my workstation.