Incident: Can't connect to BerkeleyDB
Introduction
On 23 August 2016, the main system partition on marvin ran out of space. This is normally catastrophic for all running services. However, the system did not go down; only one service started behaving anomalously: the queue manager, gridengine (version GE2011.11p1).
The effect was that the normal gridengine commands, such as qsub, qstat and qconf, would fail, reporting that they could not connect to the Berkeley database. Hence the name of this entry.
First investigations
The root cause was easy to find: quite clearly, there was no space left on the hard disk. Space was quickly freed, but the problems continued. Perhaps gridengine needed to be restarted? It consists of two services:
* sgeexecd.marvin
* sgemaster.marvin
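As a side note on the disk-space finding above, the following is a minimal sketch of how free space on the root partition could be checked from a script. It is not part of the original investigation. The function name `free_space_report` and the 5% threshold are assumptions chosen for illustration; only the standard-library call `shutil.disk_usage` is relied on.

```python
import shutil


def free_space_report(path="/", min_free_fraction=0.05):
    """Return (free_bytes, free_fraction, is_low) for the filesystem holding path.

    min_free_fraction is an illustrative threshold, not a value from the incident.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    free_fraction = usage.free / usage.total
    return usage.free, free_fraction, free_fraction < min_free_fraction


if __name__ == "__main__":
    free, fraction, is_low = free_space_report("/")
    print(f"/ has {free / 2**30:.1f} GiB free ({fraction:.1%})")
    if is_low:
        print("WARNING: root partition is nearly full")
```

Run periodically (for example from cron), a check of this kind could flag a filling partition before services start to misbehave.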