Site Administration
===================

Node Supervision
----------------

Guardian daemon processes run on dedicated machines in the CDS
front-end networks: {h,l}1guardian0.  They are managed by a process
supervision system called `runit <http://smarden.org/runit/>`_
(`Ubuntu package <http://packages.ubuntu.com/precise/admin/runit>`_).
Runit handles stopping, starting, and logging of the guardian daemons.

Runit manages individual daemon processes with a program called
'runsv' and its sister 'svlogd', which handles logging.  Each runsv
process supervises an individual guardian daemon.  The overall pool of
runsv-supervised guardian daemons is itself supervised by the
`runsvdir` program.  At the very top is the Ubuntu Upstart init daemon
(process 1).

Here's an example process tree, with a single active guardian process
('SUS_SR2')::

  init                                           # Upstart init daemon (process 1)
   +-runsvdir -P /etc/guardian/service           # runit main guardian runsvdir process
      +-runsv SUS_SR2                            # runit individual daemon supervisor
         +-guardian SUS_SR2                      # guardian main process
         |  +-guardian SUS_SR2 (worker)          # guardian worker subprocess
         |  |  +-{python}                        # guardian auxiliary threads
         |  |  +-{python}
         |  |  +-{python}
         |  |  +-{python}
         |  +-{python}
         |  +-{python}
         |  +-{python}
         +-svlogd -tt /var/log/guardian/SUS_SR2  # svlogd logging daemon

Interaction with this infrastructure is done via the :ref:`guardctrl`
command line utility.

configuration directories and files
-----------------------------------

The main configuration directory on the guardian machine is
``/etc/guardian``::

  controls@h1guardian0:~ 0$ ls -al /etc/guardian
  total 24
  drwxr-xr-x   4 root     root     4096 Feb 28 18:28 .
  drwxr-xr-x 104 root     root     4096 Feb 22 15:57 ..
  -rw-r--r--   1 cdsadmin cdsadmin  441 Feb 28 18:27 local-env
  lrwxrwxrwx   1 root     root       18 Feb 20 15:34 logs -> /var/log/guardian/
  drwxr-xr-x  41 controls controls 4096 Mar 28 16:18 nodes
  -rwxr-xr-x   1 cdsadmin cdsadmin  141 Feb 18 15:37 runsvdir-start
  drwxr-xr-x   2 controls controls 4096 Mar 28 16:18 service
  lrwxrwxrwx   1 root     root       13 Feb 20 15:33 supervise -> /run/guardian

Each node is given a unique configuration directory in
``/etc/guardian/nodes``.  Each node directory describes how the
guardian daemon should be started and logged::

  controls@h1guardian0:~ 0$ find /etc/guardian/nodes/SUS_SR2
  /etc/guardian/nodes/SUS_SR2/
  /etc/guardian/nodes/SUS_SR2/env
  /etc/guardian/nodes/SUS_SR2/supervise
  /etc/guardian/nodes/SUS_SR2/log
  /etc/guardian/nodes/SUS_SR2/log/supervise
  /etc/guardian/nodes/SUS_SR2/log/config
  /etc/guardian/nodes/SUS_SR2/log/main
  /etc/guardian/nodes/SUS_SR2/log/run
  /etc/guardian/nodes/SUS_SR2/run
  /etc/guardian/nodes/SUS_SR2/guardian
  /etc/guardian/nodes/SUS_SR2/finish

The existence of this directory does not by itself start the service.
To start the service, the node directory has to be linked into the
"service" directory, ``/etc/guardian/service``, so that it can be
instantiated by the main runsvdir supervisor.

The supervise directory, ``/etc/guardian/supervise``, holds
runit-specific supervision files for each node (fifos, sockets, etc.).
It should be a link to a tmpfs, in this case mounted at
``/run/guardian``.

The logs are stored in ``/var/log/guardian``.

When first creating the infrastructure, it's important to create the
needed extra directories (``/run/guardian``, ``/var/log/guardian``)
with the correct permissions, and then link them into
``/etc/guardian``.  The :ref:`guardctrl` utility should then create
the necessary subdirectories as needed.

guardian run environment
------------------------

The `run` script in the node directory is what actually starts the
guardian process.
The guardian run scripts source the main environment file,
``/etc/guardian/local-env``, before execution.  This file should be
configured with any local environmental settings that are necessary
for guardian to function properly::

  export PATH=/bin:/usr/bin
  export LD_LIBRARY_PATH=
  export PYTHONPATH=
  . /ligo/apps/linux-x86_64/epics/etc/epics-user-env.sh
  . /ligo/apps/linux-x86_64/nds2-client/etc/nds2-client-user-env.sh || true
  . /ligo/apps/linux-x86_64/cdsutils/etc/cdsutils-user-env.sh
  . /ligo/apps/linux-x86_64/guardian/etc/guardian-user-env.sh
  ifo=`ls /ligo/cdscfg/ifo`
  site=`ls /ligo/cdscfg/site`
  export IFO=${ifo^^*}
  export NDSSERVER=${ifo}nds0:8088,${ifo}nds1:8088

upstart init
------------

The main service pool manager (runsvdir) is managed by upstart
(Ubuntu's init daemon).  The service is called `guardian-runsvdir`,
and uses the following upstart 'init' file that specifies when it
should be started and stopped, and any pre/post run conditions:

``/etc/init/guardian-runsvdir.conf``::

  start on runlevel [2345]
  stop on shutdown
  respawn
  pre-start script
      mkdir -p /run/guardian
      mountpoint -q /run/guardian || mount -t tmpfs -o size=10M,user=controls tmpfs /run/guardian
  end script
  exec /etc/guardian/runsvdir-start
  kill signal HUP

Note that the service depends on there being a tmpfs mounted at
``/run/guardian``.  The init file executes the
``/etc/guardian/runsvdir-start`` script, which sets up the environment
for all guardian processes and then execs the runsvdir process:

``/etc/guardian/runsvdir-start``::

  #!/bin/bash
  PATH=/bin:/usr/bin
  srvdir=/etc/guardian/service
  exec env - \
      PATH=$PATH \
      chpst -u controls /usr/bin/runsvdir -P $srvdir

Upstart processes are controlled with the `initctl` utility.  This can
be used to start, stop, and check the status of guardian-runsvdir.
Any user can check the status of an upstart-supervised daemon, but you
must be root to start or stop the process::

  controls@h1guardian0:~ 0$ initctl status guardian-runsvdir
  guardian-runsvdir start/running, process 23769
  controls@h1guardian0:~ 0$
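
For scripted health checks, the status line shown above can be parsed
rather than read by eye.  The following is only a minimal sketch: the
helper names ``job_goal_state`` and ``job_is_running`` are hypothetical
and are not part of guardian, runit, or upstart, and the sketch assumes
the status line has the shape ``<job> <goal>/<state>[, process <pid>]``
seen in the session above.  It operates on a sample string so that it
runs anywhere; on the guardian machine the line would instead come from
``initctl status guardian-runsvdir``.

```shell
#!/bin/bash
# Hypothetical helpers (not part of guardian or upstart).

# Print the "<goal>/<state>" field from one line of `initctl status` output.
# Assumed line shape: "<job> <goal>/<state>[, process <pid>]"
job_goal_state() {
    local line=$1
    local rest=${line#* }          # drop the job name and the space after it
    printf '%s\n' "${rest%%,*}"    # drop ", process <pid>" if present
}

# Succeed only when the reported goal/state is start/running.
job_is_running() {
    [ "$(job_goal_state "$1")" = "start/running" ]
}

# Sample line copied from the session above.
sample='guardian-runsvdir start/running, process 23769'
if job_is_running "$sample"; then
    echo "guardian-runsvdir is running"
else
    echo "guardian-runsvdir is not running"
fi
```

Because checking status does not require root, a check along these
lines could be run as the ``controls`` user; starting or stopping the
job would still need root.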