Marvin and IPMI (remote hardware control)

Revision as of 12:20, 2 February 2018

Introduction

When networking goes down on marvin while the machine itself is still up, ssh can no longer be used, and the administrator is stuck.

However, all modern industrial-grade (i.e. non-consumer) servers now include a separate subsystem that is independent of the main machine. It goes by various names: the underlying standard is called IPMI, which is also what Supermicro calls it, while Dell calls it DRAC and HP calls it iLO (Integrated Lights-Out).

It can be seen as a hardware remote control system. The module has its own network connection, so one can log in to the IPMI module and send hardware commands (principally power-on and power-cycle).
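As a rough illustration of what sending such a hardware command can look like, here is a minimal sketch that assumes the widely used ipmitool client is installed. The helper name ipmi_power, the host name node1-ipmi and the default user admin are hypothetical placeholders, not marvin's actual settings. The password is read from the IPMI_PASSWORD environment variable via ipmitool's -E option.

```shell
#!/bin/sh
# Hypothetical helper: host, user and function name are illustrative assumptions.
ipmi_power() {
    host="$1"      # address of the IPMI module, not of the main machine
    action="$2"    # one of: status, on, cycle
    case "$action" in
        status|on|cycle) ;;
        *) echo "usage: ipmi_power HOST status|on|cycle" >&2; return 2 ;;
    esac
    # -I lanplus: IPMI v2.0 over LAN; -E: take the password from $IPMI_PASSWORD
    set -- ipmitool -I lanplus -H "$host" -U "${IPMI_USER:-admin}" -E chassis power "$action"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"      # dry run: print the command instead of executing it
    else
        "$@"
    fi
}
```

With DRY_RUN=1 set, calling ipmi_power node1-ipmi status only prints the ipmitool command line, which is a safe way to check it before power-cycling anything.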

Despite its isolation from the main system, IPMI is not immune to faults of its own, in which case the only cure is to ring Ian at Services and go to the datacentre. Unfortunately, IPMI tends to stop cooperating exactly when the main machine has problems of its own, which is disappointing because that is exactly when it is needed. Nevertheless, IPMI is better than nothing and has proved useful on many occasions.

Details

Marvin's nodes can all be remotely controlled by running