Background & Motivation
kubecf the processes of the jobs in an instance group were
This allowed a human operator to suspend (kill) and later restart these processes, so that they would not interfere with low-level operations such as restoring a cluster from raw database backups. Because the pod and container kept running, such suspensions were not visible at the Kubernetes level, except through liveness and readiness probes.
The process control features added to the containerrun
application of the operator serve the same purpose.
The process control features of
containerrun are accessible through
a Unix domain datagram socket at location
/var/vcap/data/JOB/PROCESS_containerrun.sock in the
container. Because of this placement, the feature is not accessible from
outside the cluster. An operator (or a script written by one) has to log
into the relevant container(s) to use the feature.
Suspending the monitored child processes is done by sending the command
STOP to this socket.
Conversely, restarting the child processes is done by sending the command
START to this socket.
Sending a START command when the child processes are already running has no effect. The same is true for sending a
STOP command when the child processes are already suspended.
Any other command sent to the socket is ignored.
Any tool able to send a datagram packet to a Unix domain socket of that type should work. For example:
echo START | nc -w 1 --unixsock --udp /var/vcap/data/JOB/PROCESS_containerrun.sock
echo STOP | nc -w 1 --unixsock --udp /var/vcap/data/JOB/PROCESS_containerrun.sock
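Because unrecognized commands are silently ignored, a typo in the command gives no feedback. The following is a minimal sketch of a wrapper that rejects anything other than START or STOP and checks that the socket exists before sending. The function name `containerrun_send`, its argument order, and the `DATA_DIR` override are hypothetical and not part of containerrun; the `nc` invocation is the one shown above.

```shell
# Hypothetical helper; name, argument order and DATA_DIR are illustrative.
# Usage: containerrun_send JOB PROCESS START|STOP
containerrun_send() {
    job=$1
    process=$2
    cmd=$3
    sock="${DATA_DIR:-/var/vcap/data}/${job}/${process}_containerrun.sock"

    # Unknown commands are ignored by the socket, so reject them up front.
    case "$cmd" in
        START|STOP) ;;
        *) echo "unknown command: $cmd" >&2; return 2 ;;
    esac

    # -S is true only if the path exists and is a socket.
    if [ ! -S "$sock" ]; then
        echo "no socket at $sock" >&2
        return 1
    fi

    echo "$cmd" | nc -w 1 --unixsock --udp "$sock"
}
```

With this loaded in a shell inside the container, `containerrun_send JOB PROCESS STOP` would suspend the child processes of that one process wrapper, where JOB and PROCESS stand for the actual job and process names.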
Note that all of these sockets are placed in the volume shared by all containers of all jobs of the instance group. It is therefore enough to ssh into one of the containers to send commands to all sockets, and thus to all jobs.
for sock in $(find /var/vcap/data -name '*_containerrun.sock'); do
    echo STOP | nc -w 1 --unixsock --udp "$sock"
done
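The loop over all sockets can also be wrapped in a small function that reports which sockets could not be reached. This is only a sketch: the names `send_datagram` and `containerrun_all` and the `DATA_DIR` override are hypothetical, and it assumes the same `nc` variant used in the commands above.

```shell
# Hypothetical helper; function names and DATA_DIR are illustrative.
DATA_DIR="${DATA_DIR:-/var/vcap/data}"

# Send one datagram ($1) to one socket ($2), using the nc invocation
# from the examples above.
send_datagram() {
    echo "$1" | nc -w 1 --unixsock --udp "$2"
}

# Send the given command (START or STOP) to every socket under DATA_DIR.
# Reading find's output line by line keeps paths with spaces intact.
containerrun_all() {
    find "$DATA_DIR" -name '*_containerrun.sock' | while IFS= read -r sock; do
        send_datagram "$1" "$sock" || echo "failed: $sock" >&2
    done
}
```

Calling `containerrun_all STOP` from any one container of the instance group would then suspend the child processes of every job, and `containerrun_all START` would restart them.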