Incident: Can't connect to BerkeleyDB
Revision as of 12:29, 25 August 2016
Introduction
On 23 August 2016, the main system partition on marvin ran out of space. This is normally catastrophic for all running services. However, the system did not go down; only one service started behaving anomalously: the queue manager, gridengine (version GE2011.11p1).
The effect was that the usual gridengine commands, such as qsub, qstat and qconf, would fail. The error reported was that they could not connect to the Berkeley database, hence the name of this entry.
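As an illustration of the symptom, the following is a minimal Python sketch that runs a set of commands and reports which ones fail. The helper names (command_succeeds, failing_commands) and the arguments passed to each tool are assumptions made for this example; they are not taken from the incident itself.

```python
import subprocess


def command_succeeds(argv, timeout=30):
    """Return True if the command can be started and exits with status 0."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not found, not executable, or it hung past the timeout.
        return False
    return result.returncode == 0


def failing_commands(commands):
    """Return the labels of all commands in the mapping that do not succeed."""
    return [label for label, argv in commands.items() if not command_succeeds(argv)]
```

On the affected machine one might call, for example, failing_commands({"qstat": ["qstat"], "qconf": ["qconf", "-sql"]}). A non-empty result only says that a command failed, not why, so the error text still has to be read by hand.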
First investigations
The root cause was easy to find: quite clearly, there was no space left on the hard disk. Space was quickly freed, but the problems continued. Perhaps gridengine needed to be restarted? It consists of two services:
* sgeexecd.marvin
* sgemaster.marvin
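The disk-space check that revealed the root cause can be sketched in Python as follows. This is only an illustration: the function names and the 5% threshold are assumptions chosen for the example, not values used on marvin.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_nearly_full(path="/", min_free_fraction=0.05):
    """Return True if less than `min_free_fraction` of the space is free."""
    return free_fraction(path) < min_free_fraction
```

Running is_nearly_full("/") periodically, for instance from a cron job, would give a warning before the partition fills up completely. As this incident shows, freeing space afterwards does not necessarily bring every service back to a working state.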