Site Administration
Node Supervision
Guardian daemon processes run on dedicated machines in the CDS front-end networks: {h,l}1guardian0. They are managed by a process supervision system called [[http://smarden.org/runit/|runit]] ([[http://packages.ubuntu.com/precise/admin/runit|Ubuntu package]]). Runit handles stopping, starting, and logging of the guardian daemons.
Runit manages individual daemon processes with a program called ‘runsv’ and its sister ‘svlogd’ that handles logging. Each runsv process supervises an individual guardian daemon. The overall pool of runsv-supervised guardian daemons is itself supervised by the runsvdir program. At the very top is the Ubuntu Upstart init daemon (process 1).
Here’s an example process tree, with a single active guardian process (‘SUS_SR2’):
init                                          # Upstart init daemon (process 1)
  +-runsvdir -P /etc/guardian/service         # runit main guardian runsvdir process
    +-runsv SUS_SR2                           # runit individual daemon supervisor
      +-guardian SUS_SR2                      # guardian main process
      | +-guardian SUS_SR2 (worker)           # guardian worker subprocess
      | | +-{python}                          # guardian auxiliary threads
      | | +-{python}
      | | +-{python}
      | | +-{python}
      | +-{python}
      | +-{python}
      | +-{python}
      +-svlogd -tt /var/log/guardian/SUS_SR2  # svlogd logging daemon
Interaction with this infrastructure is done via the guardctrl command line utility.
configuration directories and files
The main configuration directory on the guardian machine is /etc/guardian:
controls@h1guardian0:~ 0$ ls -al /etc/guardian
total 24
drwxr-xr-x 4 root root 4096 Feb 28 18:28 .
drwxr-xr-x 104 root root 4096 Feb 22 15:57 ..
-rw-r--r-- 1 cdsadmin cdsadmin 441 Feb 28 18:27 local-env
lrwxrwxrwx 1 root root 18 Feb 20 15:34 logs -> /var/log/guardian/
drwxr-xr-x 41 controls controls 4096 Mar 28 16:18 nodes
-rwxr-xr-x 1 cdsadmin cdsadmin 141 Feb 18 15:37 runsvdir-start
drwxr-xr-x 2 controls controls 4096 Mar 28 16:18 service
lrwxrwxrwx 1 root root 13 Feb 20 15:33 supervise -> /run/guardian
Each node is given a unique configuration directory in /etc/guardian/nodes. Each node directory describes how the guardian daemon should be started and logged:
controls@h1guardian0:~ 0$ find /etc/guardian/nodes/SUS_SR2
/etc/guardian/nodes/SUS_SR2/
/etc/guardian/nodes/SUS_SR2/env
/etc/guardian/nodes/SUS_SR2/supervise
/etc/guardian/nodes/SUS_SR2/log
/etc/guardian/nodes/SUS_SR2/log/supervise
/etc/guardian/nodes/SUS_SR2/log/config
/etc/guardian/nodes/SUS_SR2/log/main
/etc/guardian/nodes/SUS_SR2/log/run
/etc/guardian/nodes/SUS_SR2/run
/etc/guardian/nodes/SUS_SR2/guardian
/etc/guardian/nodes/SUS_SR2/finish
The existence of this directory does not by itself start the service. To start the service, the node directory has to be linked into the “service” directory, /etc/guardian/service, so that it can be instantiated by the main runsvdir supervisor.
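The linking step can be sketched as follows. This only illustrates the symlink convention; guardctrl remains the intended interface. The `GUARD_ROOT` variable and the `enable_node` and `disable_node` helpers are hypothetical names, and the sketch defaults to a scratch directory so it can be run without touching /etc/guardian.

```shell
set -eu

# Hypothetical: GUARD_ROOT stands in for /etc/guardian. It defaults to a
# scratch directory so the sketch does not touch a live system.
GUARD_ROOT=${GUARD_ROOT:-$(mktemp -d)}
node=SUS_SR2

# Scratch stand-ins for the node and service directories.
mkdir -p "$GUARD_ROOT/nodes/$node" "$GUARD_ROOT/service"

# Enable a node: link its directory into the service directory, where
# runsvdir will find it and start a runsv supervisor for it.
enable_node() {
    ln -sfn "$GUARD_ROOT/nodes/$1" "$GUARD_ROOT/service/$1"
}

# Disable a node: remove the link; the node directory itself is untouched.
disable_node() {
    rm -f "$GUARD_ROOT/service/$1"
}

enable_node "$node"
ls -l "$GUARD_ROOT/service"
```

Because runsvdir rescans its service directory periodically, adding or removing the link is enough; the supervisor itself does not need to be restarted.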
The supervise directory, /etc/guardian/supervise, holds runit-specific supervision files for each node (fifos, sockets, etc.). It should be a link to a tmpfs, in this case mounted at /run/guardian.
The logs are stored in /var/log/guardian.
When first creating the infrastructure, it’s important to create the needed extra directories (/run/guardian, /var/log/guardian) with the correct permissions, and then link them into /etc/guardian. The [[Guardian/guardctrl|guardctrl]] utility should then create the necessary subdirectories on demand.
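A minimal sketch of that initial layout is shown here. `ROOT` is a hypothetical scratch prefix standing in for "/", so the commands can be tried unprivileged, and the 755 mode is an assumption rather than a documented requirement.

```shell
set -eu

# Hypothetical: ROOT is a scratch prefix standing in for "/", so the sketch
# can run unprivileged. On a real machine these paths have no prefix.
ROOT=${ROOT:-$(mktemp -d)}

# Extra directories that live outside /etc/guardian.
mkdir -p "$ROOT/run/guardian" "$ROOT/var/log/guardian"
chmod 755 "$ROOT/run/guardian" "$ROOT/var/log/guardian"   # assumed mode

# Main configuration directory with its node and service subdirectories.
mkdir -p "$ROOT/etc/guardian/nodes" "$ROOT/etc/guardian/service"

# Link the extra directories in under the names used in /etc/guardian.
ln -sfn "$ROOT/var/log/guardian" "$ROOT/etc/guardian/logs"
ln -sfn "$ROOT/run/guardian" "$ROOT/etc/guardian/supervise"

# Assumption: on a real system the user that runs the daemons would also
# need write access to these directories (e.g. chown, done as root).
ls -l "$ROOT/etc/guardian"
```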
guardian run environment
The run script in the node directory is what actually starts the guardian process. The guardian run scripts source the main environment file, /etc/guardian/local-env, before execution. This file should be configured with any local environment settings that guardian needs to function properly:
export PATH=/bin:/usr/bin
export LD_LIBRARY_PATH=
export PYTHONPATH=
. /ligo/apps/linux-x86_64/epics/etc/epics-user-env.sh
. /ligo/apps/linux-x86_64/nds2-client/etc/nds2-client-user-env.sh || true
. /ligo/apps/linux-x86_64/cdsutils/etc/cdsutils-user-env.sh
. /ligo/apps/linux-x86_64/guardian/etc/guardian-user-env.sh
ifo=`ls /ligo/cdscfg/ifo`
site=`ls /ligo/cdscfg/site`
export IFO=${ifo^^*}
export NDSSERVER=${ifo}nds0:8088,${ifo}nds1:8088
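The source-then-exec pattern can be tried in isolation with the sketch below. It is not the actual run script: the `CDSCFG` and `LOCAL_ENV` variables are hypothetical stand-ins for /ligo/cdscfg and /etc/guardian/local-env, the interferometer name `h1` is a sample value, and the final command prints the environment instead of starting guardian. It uses `tr` as a portable equivalent of the bash-only `${ifo^^*}` expansion.

```shell
set -eu
scratch=$(mktemp -d)
mkdir -p "$scratch/cdscfg/ifo/h1"      # stand-in for /ligo/cdscfg/ifo

# Stand-in for /etc/guardian/local-env (quoted heredoc: nothing expands here).
cat > "$scratch/local-env" <<'EOF'
ifo=$(ls "$CDSCFG/ifo")
# Portable equivalent of the bash-only ${ifo^^*} used in the real file.
IFO=$(printf '%s' "$ifo" | tr '[:lower:]' '[:upper:]')
export IFO
export NDSSERVER="${ifo}nds0:8088,${ifo}nds1:8088"
EOF

# Stand-in for a node's run script: source the environment, then exec.
cat > "$scratch/run" <<'EOF'
#!/bin/sh
. "$LOCAL_ENV"
exec 2>&1
# A real run script would exec guardian here; this prints the result instead.
exec sh -c 'echo "IFO=$IFO NDSSERVER=$NDSSERVER"'
EOF

run_output=$(CDSCFG="$scratch/cdscfg" LOCAL_ENV="$scratch/local-env" sh "$scratch/run")
echo "$run_output"   # IFO=H1 NDSSERVER=h1nds0:8088,h1nds1:8088
```

Because the run script ends in `exec`, the supervised process replaces the shell, so runsv supervises the daemon directly rather than a wrapper.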
upstart init
The main service pool manager (runsvdir) is managed by upstart (Ubuntu’s init daemon). The service is called guardian-runsvdir, and uses the following upstart ‘init’ file, which specifies when it should be started and stopped and any pre/post run conditions:
/etc/init/guardian-runsvdir.conf:
start on runlevel [2345]
stop on shutdown
respawn
pre-start script
    mkdir -p /run/guardian
    mountpoint -q /run/guardian || mount -t tmpfs -o size=10M,user=controls tmpfs /run/guardian
end script
exec /etc/guardian/runsvdir-start
kill signal HUP
Note that the service depends on a tmpfs being mounted at /run/guardian.
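One way to verify that precondition is to look the mount point up in the kernel's mount table. The sketch below assumes the standard /proc/mounts field order (device, mount point, filesystem type); the `is_tmpfs` helper is a hypothetical name, and the demonstration runs against a sample table rather than the live system.

```shell
# is_tmpfs MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT appears as a tmpfs mount in MOUNTS_FILE, which
# defaults to /proc/mounts (fields: device, mount point, fs type, ...).
is_tmpfs() {
    awk -v mp="$1" '$2 == mp && $3 == "tmpfs" { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Demonstration against a sample mounts table rather than the live system.
sample=$(mktemp)
printf '%s\n' \
    'proc /proc proc rw,nosuid 0 0' \
    'tmpfs /run/guardian tmpfs rw,size=10240k 0 0' > "$sample"

if is_tmpfs /run/guardian "$sample"; then
    echo "/run/guardian: tmpfs mounted"
else
    echo "/run/guardian: NOT a tmpfs mount"
fi
```

On the guardian machine itself, calling `is_tmpfs /run/guardian` with no second argument would consult the live mount table.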
The init script actually executes the /etc/guardian/runsvdir-start script, which sets up the environment for all guardian processes and then execs the runsvdir process:
/etc/guardian/runsvdir-start:
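#!/bin/bash
# Start the guardian runsvdir pool with a minimal, clean environment.
PATH=/bin:/usr/bin
srvdir=/etc/guardian/service
# "env -" clears the inherited environment; chpst -u runs runsvdir as the controls user.
exec env - \
    PATH=$PATH \
    chpst -u controls /usr/bin/runsvdir -P $srvdir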
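The effect of the `env -` prefix can be checked on its own. In this small sketch, `FOO` is a hypothetical variable used only to show that nothing from the caller's environment leaks through.

```shell
# "env -" starts the command with an empty environment; only variables
# listed explicitly on the command line are passed through.
clean_env=$(FOO=bar env - PATH=/bin:/usr/bin env)
echo "$clean_env"   # PATH=/bin:/usr/bin
```

Every guardian daemon inherits its environment from runsvdir, so starting from an empty environment means any further settings have to come from the run scripts and the files they source.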
Upstart processes are controlled with the initctl utility. This can be used to start, stop, and check the status of guardian-runsvdir. Any user can query the status of an upstart-supervised daemon, but you must be root to start or stop the process:
controls@h1guardian0:~ 0$ initctl status guardian-runsvdir
guardian-runsvdir start/running, process 23769
controls@h1guardian0:~ 0$
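For scripted health checks, the status line can be split into its parts. The sketch below assumes the output format shown in the transcript above; `parse_upstart_status` is a hypothetical helper, and it is fed a sample line rather than calling initctl.

```shell
# parse_upstart_status LINE
# Prints "<goal> <state> <pid>" for a line such as
#   guardian-runsvdir start/running, process 23769
# and prints nothing if the line reports no running process.
parse_upstart_status() {
    printf '%s\n' "$1" |
        sed -n 's|^[^ ]* \([a-z]*\)/\([a-z-]*\), process \([0-9]*\)$|\1 \2 \3|p'
}

status_line='guardian-runsvdir start/running, process 23769'
parsed=$(parse_upstart_status "$status_line")
echo "$parsed"   # start running 23769
```

On the guardian machine, the same function could be applied to live output with `parse_upstart_status "$(initctl status guardian-runsvdir)"`.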