Marvin and IPMI (remote hardware control)
When networking goes down on marvin while the machine itself is still up, ssh can no longer be used and the administrator is stuck.
However, all modern industrial-grade (i.e. non-consumer) servers include a separate subsystem that runs independently of the main machine. It goes by various names: the underlying standard is IPMI, which is also the name Supermicro uses; Dell calls it DRAC and HP calls it iLO (Integrated Lights-Out).
It can be seen as a hardware remote-control system: a device with its own network connection, so that one can log in to the IPMI module and send hardware commands (principally power-on and power-cycle) to the main machine.
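As a rough illustration of the kind of hardware command meant here, the sketch below wraps ipmitool's chassis power subcommands. The function name bmc_power and the address 192.0.2.10 are placeholders and not taken from marvin's configuration; the ADMIN user follows the commands further down this page, and the -E option makes ipmitool read the password from the IPMI_PASSWORD environment variable instead of the command line.

```shell
#!/bin/sh
# Hypothetical helper: send a chassis power command to a remote IPMI module.
# Usage: bmc_power <ipmi-address> <status|on|off|cycle|reset|soft>
bmc_power() {
    host=$1
    action=$2
    # Only pass through the power actions we expect to use
    case "$action" in
        status|on|off|cycle|reset|soft) ;;
        *) echo "unknown power action: $action" >&2; return 2 ;;
    esac
    # -E: take the password from the IPMI_PASSWORD environment variable
    ipmitool -H "$host" -U ADMIN -E chassis power "$action"
}
```

For example, with IPMI_PASSWORD exported, "bmc_power 192.0.2.10 status" would report whether the main machine is powered on, and "bmc_power 192.0.2.10 cycle" would power-cycle it.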
Despite IPMI's isolation from the main system, it is not immune to faults of its own, in which case the only cure is to ring Ian at IT Services and physically go to the datacentre. Unfortunately, IPMI tends to stop cooperating just when the main machine has problems of its own, which is disappointing because that is exactly when it is needed. Nevertheless, IPMI is better than nothing and has proved useful on many occasions.
Marvin's nodes can all be remotely controlled, but only from marvin itself. So the usual exercise is to run firefox on marvin and connect to the nodes' IPMI IPs from there.
When marvin's own IPMI needs to be used, this can be done from another computer within the University campus.
There is a standalone GUI application called IPMIconfig which does the same things as the IPMI web interface but, because it doesn't need a browser, can be faster.
The virtual console on IPMI's web interface uses a JNLP (javaws) program and is the best implementation, but it can be patchy. It also allows a local Live Linux ISO file to be loaded so that the machine may be booted from it, though this can be a bit tortuous. Supermicro's IPMI interface is clearly inferior to Dell's DRAC interface used on the biotime machine. Nevertheless, it is possible to boot the recommended Linux Live ISO, sysrescuecd, on Supermicro's IPMI.
It bears repeating that the virtual console is patchy. An even less dependable alternative is SOL (Serial over LAN). It may seem silly to mention SOL when it is even less reliable than the virtual console, but it has several crucial advantages which make it the holy grail of remote hardware control:
- SOL is a raw terminal connection to the login screen of the main machine.
- It does not operate via buggy GUIs and web interfaces.
- One can connect via the command line and record all input and output with the local Linux computer's "script" program (see "man script").
- When it works, it is much faster than the alternatives.
- It behaves as if one really was sitting down locally at the machine, looking at the login screen.
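The points above can be sketched as a small wrapper that opens a SOL session and records it with script. This is only a sketch: the function name sol_session, the address 192.0.2.13 and the log file name are hypothetical, and it assumes the util-linux version of script (which accepts -c) and that the password is supplied through the IPMI_PASSWORD environment variable. ipmitool only offers SOL through its lanplus interface, hence the -I lanplus.

```shell
#!/bin/sh
# Hypothetical helper: open a Serial-over-LAN console and log the whole session.
# Usage: sol_session <ipmi-address> [logfile]
sol_session() {
    host=$1
    log=${2:-sol-$host.log}
    # script -c runs the given command and records all terminal
    # input and output in $log
    script -c "ipmitool -I lanplus -H $host -U ADMIN -E sol activate" "$log"
}
```

Inside the session, typing ~. at the start of a line ends the SOL connection. If a previous session was left hanging, running ipmitool with "sol deactivate" against the same address normally clears it.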
IPMI Is Down
In January 2019 the IPMI for node1 and node3 was not accessible via either the command line or the web interface. This meant we couldn't reboot the nodes remotely. Node3 was rebooted manually by Ally pushing a button on the box.
Trying to query the MC (management controller, sometimes called the baseboard management controller or BMC) locally gave:
ipmitool mc info
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
The error means the IPMI device node does not exist, so two kernel modules need to be loaded:
modprobe ipmi_devintf
modprobe ipmi_si
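The check-then-load step can be wrapped up as follows. This is a minimal sketch, to be run as root; the function names are hypothetical, and the default device paths are simply the three that ipmitool reported trying in the error above.

```shell
#!/bin/sh
# Succeeds if any of the given device nodes exists.
ipmi_dev_present() {
    for dev in "$@"; do
        [ -e "$dev" ] && return 0
    done
    return 1
}

# Hypothetical helper: load the IPMI drivers only if no device node is present.
# Usage: ensure_ipmi_dev [device-path...]
ensure_ipmi_dev() {
    # Default to the three paths named in ipmitool's error message
    [ $# -gt 0 ] || set -- /dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0
    ipmi_dev_present "$@" && return 0
    modprobe ipmi_devintf || return 1
    modprobe ipmi_si || return 1
    ipmi_dev_present "$@" && return 0
    echo "IPMI modules loaded but no device node appeared" >&2
    return 1
}
```

Running ensure_ipmi_dev before any local ipmitool command avoids the "Could not open device" error when the modules were simply not loaded.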
Which then gives us
ipmitool mc info
Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 2.06
IPMI Version              : 2.0
Manufacturer ID           : 47488
Manufacturer Name         : Unknown (0xB980)
Product ID                : 43707 (0xaabb)
Product Name              : Unknown (0xAABB)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device
Aux Firmware Rev Info     :
    0x01
    0x00
    0x00
    0x00
To warm-reset the IPMI controller, run:
ipmitool mc reset warm
This will either tell you it sent the warm reset command ("Sent warm reset command to MC") or return
MC reset command failed: Invalid command
If this happens, send the cold reset command:
ipmitool mc reset cold
Sent cold reset command to MC
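The warm-then-cold sequence can be scripted as below. This is a sketch only: the function name mc_reset is hypothetical, and it assumes that ipmitool exits with a non-zero status when the controller rejects the warm reset.

```shell
#!/bin/sh
# Hypothetical helper: try a warm reset of the local management controller
# and fall back to a cold reset if the warm reset is rejected.
mc_reset() {
    if ipmitool mc reset warm; then
        return 0
    fi
    echo "warm reset failed, trying cold reset" >&2
    ipmitool mc reset cold
}
```

After either kind of reset the controller can take a little while to come back, so a following "ipmitool mc info" may fail at first.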
This didn't work, however: node3's IPMI still refused connections over the network,
ipmitool -H node3IP -U ADMIN -P ***** mc info
Error: Unable to establish LAN session
whereas the same query against node2 succeeded:
ipmitool -H node2IP -U ADMIN -P **** mc info
Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 2.59
IPMI Version              : 2.0
Manufacturer ID           : 47488
Manufacturer Name         : Unknown (0xB980)
Product ID                : 43537 (0xaa11)
Product Name              : Unknown (0xAA11)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device
No resolution had been found as of 10:54 on 22 January 2019.
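To see at a glance which IPMI modules answer over the network, the remote "mc info" query used above can be run in a loop. This is a sketch under assumptions: the function name probe_ipmi and the 192.0.2.x addresses are placeholders, and the password is expected in the IPMI_PASSWORD environment variable (the -E option). A failure can mean a dead controller, a network problem or bad credentials, so the result only says whether a session could be established.

```shell
#!/bin/sh
# Hypothetical helper: report which IPMI modules accept a LAN session.
# Usage: probe_ipmi <ipmi-address>...
probe_ipmi() {
    for host in "$@"; do
        if ipmitool -H "$host" -U ADMIN -E mc info >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: no LAN session"
        fi
    done
}
```

For example, "probe_ipmi 192.0.2.11 192.0.2.12 192.0.2.13" prints one line per address.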
To help find a server in the rack, its chassis identify LED can be switched on:
ipmitool -U ADMIN -P PASSWORD chassis identify force
This makes the identify LED on the server flash indefinitely.
To turn it off again use
ipmitool -U ADMIN -P PASSWORD chassis identify 0
or, to make it flash for a set number of seconds (the interval is a single byte, so the maximum is 255):
ipmitool -U ADMIN -P PASSWORD chassis identify 240   # four minutes