We were recently faced with a service that crashed occasionally on one of our Linux servers. We had to find a way to make it recover automatically, ideally using tools that were already present on the server.
The landscape of process-management in Linux is somewhat complex and is in a state of
flux, with different tools vying to replace the venerable SysV-style
init that sysadmins have relied on for decades.
Luckily, it looks like the highly-capable
systemd is becoming the standard, and we’ll
hopefully have another long stable period soon.
In the meantime, we needed to get our service running reliably. In our case, the
delayed_job service that supported one of our Rails apps was stopping occasionally. The
service was controlled by a SysV-style init script in
/etc/init.d, and it was started
automatically when the system started up. Unfortunately, SysV-style init doesn’t include
monitoring or auto-restart capability*.
* You can actually use the respawn feature in the /etc/inittab file, but that
looked too scary to us!
Since systemd is becoming a standard (present on Ubuntu 14.10+, RedHat 7+, CentOS
7+, and Fedora 15+), we wanted to learn how to use it.
systemd is controlled by
.service files in a variety of locations. However, we wanted to avoid rewriting the
existing startup script. Our research revealed the following:
- systemd is compatible with legacy /etc/init.d scripts in this manner: when systemd loads service definitions, systemd-sysv-generator creates .service files on the fly from the scripts in that directory.
- You can add configuration to a service by adding “drop-in” files to a correctly-named directory.
So, we simply had to add a drop-in file like this (its path appears in the “Drop-In” section of the status output below):
    [Service]
    Type=forking
    PIDFile=/srv/www/sites/rails_app/current/tmp/pids/delayed_job.pid
    RemainAfterExit=no
    Restart=on-failure
    RestartSec=5s
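As a rough sketch of that step, the shell function below writes the drop-in shown above into a directory passed as its first argument. The function name `write_restart_dropin` is hypothetical, and the file name `restart.conf` is taken from the Drop-In section of the status output later in this post; adjust both for your own service.

```shell
#!/bin/sh
# Sketch: write the restart drop-in into the directory given as $1.
# write_restart_dropin is a hypothetical helper name, not a systemd tool.
write_restart_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc delimiter: the contents are written literally.
    cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Type=forking
PIDFile=/srv/www/sites/rails_app/current/tmp/pids/delayed_job.pid
RemainAfterExit=no
Restart=on-failure
RestartSec=5s
EOF
}
```

From a root shell on the server, this would be called as `write_restart_dropin /etc/systemd/system/delayed_job.service.d`, followed by `systemctl daemon-reload` so that systemd picks up the new file.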
This configuration will vary for different services. You’ll have to read the
docs to figure out
the specifics. In particular, you should read the docs for the [Service] directives used above.
Then, after reloading our service definitions with
systemctl daemon-reload, we could see that
systemd knew what it needed to know about our service.
    $ sudo systemctl status delayed_job
    ● delayed_job.service - LSB: Manage delayed jobs for application rails_app
       Loaded: loaded (/etc/init.d/delayed_job; bad; vendor preset: enabled)
      Drop-In: /etc/systemd/system/delayed_job.service.d
               └─restart.conf
       Active: active (running) since Mon 2017-10-23 15:33:25 CDT; 12min ago
         Docs: man:systemd-sysv-generator(8)
      Process: 6359 ExecStop=/etc/init.d/delayed_job stop (code=exited, status=0/SUCCESS)
      Process: 6389 ExecStart=/etc/init.d/delayed_job start (code=exited, status=0/SUCCESS)
     Main PID: 6412 (bundle)
       CGroup: /system.slice/delayed_job.service
               ‣ 6412 delayed_job
Notice the “Drop-In” section, which tells us that
systemd knows about the new config
file we added, and the
Main PID line, which indicates that
systemd knows which process to monitor.
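If you would rather check individual settings than read the full status output, `systemctl show` can print selected properties of a unit. The wrapper below is only a sketch; `show_restart_settings` is a hypothetical name, and the three properties are simply the ones relevant to this post.

```shell
#!/bin/sh
# Sketch: print the restart-related properties systemd has loaded for a unit.
# show_restart_settings is a hypothetical helper name.
show_restart_settings() {
    unit="$1"
    # "systemctl show" prints KEY=VALUE lines; each -p limits output to one property.
    systemctl show -p Restart -p RestartUSec -p MainPID "$unit"
}
```

Calling `show_restart_settings delayed_job` with the drop-in applied should report `Restart=on-failure`, along with the restart delay and the current main PID.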
We can test this config by force-killing the service in question:
sudo kill -9 6412.
Running systemctl status delayed_job immediately afterward will show something like
deactivating (stop) (Result: signal). Within 5 seconds (per the config), it should return to
Active: active (running).
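The manual test above can be scripted. The following is a minimal sketch, not something from our production setup: `kill_and_wait` is a hypothetical helper, and the 15-second default timeout is an assumption chosen to sit comfortably above the 5-second RestartSec. It reads the main PID from `systemctl show`, force-kills it, and polls until the unit is active again under a different PID.

```shell
#!/bin/sh
# Sketch: force-kill a unit's main process and wait for systemd to restart it.
# kill_and_wait is a hypothetical helper; run it from a root shell.
kill_and_wait() {
    unit="$1"
    timeout="${2:-15}"   # seconds to wait before giving up (assumed default)

    # "systemctl show -p MainPID" prints a line like MainPID=6412.
    old_pid=$(systemctl show -p MainPID "$unit" | cut -d= -f2)
    if [ -z "$old_pid" ] || [ "$old_pid" = "0" ]; then
        echo "$unit has no main PID; is it running?" >&2
        return 1
    fi

    kill -9 "$old_pid" || return 1

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        sleep 1
        waited=$((waited + 1))
        new_pid=$(systemctl show -p MainPID "$unit" | cut -d= -f2)
        # Recovered once the unit is active again under a different main PID.
        if systemctl is-active --quiet "$unit" \
            && [ -n "$new_pid" ] && [ "$new_pid" != "0" ] \
            && [ "$new_pid" != "$old_pid" ]; then
            echo "$unit restarted: PID $old_pid -> $new_pid after ${waited}s"
            return 0
        fi
    done

    echo "$unit did not recover within ${timeout}s" >&2
    return 1
}
```

For our service this would be invoked as `kill_and_wait delayed_job`. A non-zero exit status means the unit did not come back within the timeout, which usually points at the drop-in not being loaded or the PIDFile path being wrong.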
If you’re using an older distro, or one that doesn’t have
systemd for any reason, check
out Digital Ocean’s exhaustive guide to automatic service recovery.