Frontend Restart

Revision as of 16:12, 2 August 2016 by Rf (talk | contribs)

Introduction

This page contains notes on how to restart marvin, the frontend.

Measures

Bring all nodes down before restart

This is possibly the most useful measure. The nodes rely on marvin to keep various filesystems mounted, and they suffer havoc when marvin stops serving them: NFS4 stale filehandles appear and are hard to get rid of. The measure is not immediately obvious, because the nodes are normally updated on a rolling basis and seldom need to be switched off.

Once marvin is back up and its filesystems have been verified, the nodes may be brought back up. This is quite a lot of extra work, but it is worth it for the debugging time it saves later.
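As a rough aid for the verification step, the sketch below checks a list of mount points for stale filehandles by calling stat on each one. It is only an illustration: the paths /home and /shared are hypothetical placeholders, not marvin's actual exports, and the function names are made up for this example. Note also that a stat on a hard-mounted NFS filesystem whose server is unreachable can block rather than fail, so this is best run once marvin is known to be up.

```python
import errno
import os


def mount_status(path):
    """Classify a path as 'ok', 'stale' or 'missing' using a single stat call."""
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            # The kernel reports ESTALE for a stale NFS filehandle.
            return "stale"
        if exc.errno == errno.ENOENT:
            return "missing"
        # Anything else (permissions, I/O errors) is left to the caller.
        raise
    return "ok"


def check_mounts(paths):
    """Return a dict mapping each path to its status."""
    return {path: mount_status(path) for path in paths}


if __name__ == "__main__":
    # Hypothetical mount points; replace with the filesystems served by marvin.
    for path, status in check_mounts(["/home", "/shared"]).items():
        print(f"{path}: {status}")
```

Any path reported as stale would usually need to be unmounted and remounted on that node before jobs are allowed back on it.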

Try to get console access to the frontend

Console access can be obtained through IPMI, for which there are several options:

  • via the ipmiconfig tool (command line only)
  • via the IPMIView tool (GUI)
  • via the IPMI device's webserver
  • via SOL (part of ipmiconfig)

SOL is the closest to sitting at the terminal, with the added advantage that the history capability of the Linux screen program can be used to record a session. Unfortunately, it seldom works. The webserver and the IPMIView tool offer an alternative, Java-based console program termed "KVM". This runs in the IcedTea JNLP environment, which can be fussy about keys, so it may not work either.


Provisos

Restarting marvin is a major operation, as all running jobs are lost.

It is therefore necessary to tell all users well in advance when the restart is likely to happen.