Home directories max-out incident 28.11.2016
Revision as of 16:35, 29 November 2016
Introduction
Since the 6TB expansion at the end of June 2016, there has been substantial free hard disk space. For several months it stood at 15TB, which seemed like rather too much unused space.
However, some bioinformatics workloads can make short work of that sort of capacity. At the beginning of the year, a user used up 9TB without knowing it. This is usually due to big alignments, which throw out very big SAM and BAM files that really need to be supervised and
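Supervising these files can start with simply finding them. The script below is a minimal sketch, not an existing tool on this cluster: it assumes the home directories live under /home and uses a hypothetical 10 GiB threshold, listing every .sam or .bam file at or above that size, largest first.

```python
"""Sketch: list large SAM/BAM files under a directory tree.

Assumptions (not from this incident report): home directories live
under /home, and 10 GiB is a hypothetical size worth flagging.
"""
import os
import sys

# Hypothetical threshold; tune it to the cluster's own quota policy.
DEFAULT_THRESHOLD = 10 * 1024**3  # 10 GiB

ALIGNMENT_SUFFIXES = (".sam", ".bam")


def find_large_alignments(root, threshold=DEFAULT_THRESHOLD):
    """Return (size_in_bytes, path) for SAM/BAM files >= threshold, largest first."""
    hits = []
    # os.walk does not follow symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(ALIGNMENT_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            try:
                # lstat: a symlink to a BAM file reports the link's own size,
                # so the same data is not counted twice.
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size >= threshold:
                hits.append((size, path))
    hits.sort(reverse=True)
    return hits


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/home"
    for size, path in find_large_alignments(root):
        print(f"{size / 1024**3:8.1f} GiB  {path}")
```

Run periodically (for example from cron), this would show which users are accumulating alignment output before the disks fill up.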
Errors
Not reading this wiki
Despite the opportunity for a secret frontend update, a major effort was made to avoid restarting the frontend. Only when all other options had been exhausted was the decision taken to restart. However, this left no time to read the [Frontend Restart] page on this wiki, so its key advice was ignored.
The price of this was a lot of debugging afterwards, because the NFS issues that arise are not uniform. On some nodes there was no problem, while on others it was hard to work out why Gridengine wasn't working.
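Because the NFS problems differed from node to node, a quick per-node check can help narrow things down. The script below is a hedged sketch, not part of the original procedure: the path list is hypothetical, and it detects a hung mount only by timing out a stat() call run in a child process, so the checking script itself does not get stuck.

```python
"""Sketch: check whether NFS-mounted paths respond on this node.

Assumptions: SHARED_PATHS is a hypothetical list; a hung mount is
detected only by a stat() in a child process exceeding the timeout.
"""
import subprocess
import sys

# Hypothetical list of shared paths to probe on each compute node.
SHARED_PATHS = ["/home"]


def path_responds(path, timeout=5.0):
    """Return True if stat(path) succeeds within `timeout` seconds."""
    # Run stat() in a separate Python process: if the mount is hung,
    # only the child blocks, and we can give up on it after the timeout.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # A process stuck on a hard NFS mount may ignore the kill until
        # the server responds again, so we deliberately do not wait for it.
        proc.kill()
        return False


if __name__ == "__main__":
    failed = [p for p in SHARED_PATHS if not path_responds(p)]
    for p in failed:
        print(f"NOT RESPONDING: {p}")
    sys.exit(1 if failed else 0)
```

Running it on each node (for example over ssh) and collecting the exit codes would show which nodes still have a stale mount, which is where Gridengine is likely to misbehave.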
The old Gridengine wasn't cleared out properly
Specifically, the old start-up scripts were still present in /etc/init.d. The correct ones were:
sgeexecd.p6444
sgeqmaster.p6444
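Leftover scripts like these can be spotted automatically. The sketch below takes the two correct names from this page and flags every other file in /etc/init.d whose name starts with "sge". Treating all such files as leftovers is an assumption, so review the output before deleting anything.

```python
"""Sketch: flag Gridengine start-up scripts in /etc/init.d that are not
the expected ones.

The expected names come from this incident report; treating every
other file starting with "sge" as a leftover is an assumption.
"""
import os
import sys

EXPECTED = {"sgeexecd.p6444", "sgeqmaster.p6444"}


def stale_sge_scripts(initd="/etc/init.d", expected=EXPECTED):
    """Return sorted names of sge* scripts in `initd` that are not expected."""
    try:
        names = os.listdir(initd)
    except FileNotFoundError:
        return []  # no init.d directory on this host
    return sorted(n for n in names if n.startswith("sge") and n not in expected)


if __name__ == "__main__":
    stale = stale_sge_scripts()
    for name in stale:
        print(f"leftover Gridengine init script: /etc/init.d/{name}")
    sys.exit(1 if stale else 0)
```

A non-zero exit status marks a node that still needs cleaning up. Nodes normally only need the execd script, so the absence of sgeqmaster.p6444 on a node is not reported as an error.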