Marvin and IPMI (remote hardware control)
Latest revision as of 08:27, 25 January 2019
Introduction
For example, when networking goes down on marvin and yet the machine is still up, ssh can no longer be used, so the administrator is stuck.
However, all modern, industrial-grade (i.e. non-consumer) servers now include a separate subsystem which is independent of the main machine. This is known by various names: the standard is called IPMI, which is also what Supermicro calls it, while Dell calls it DRAC and HP calls it iLO (Integrated Lights-Out).
It can be seen as a hardware remote control system: a device with its own network connection, so one can log in to the IPMI module and send hardware commands (principally power-on and power-cycle) to the main machine.
Despite IPMI's isolation from the main system, it is not immune to faults of its own, in which case the only cure is to ring Ian at IT Services and physically go to the datacentre. Unfortunately, IPMI tends not to cooperate exactly when the main machine has problems of its own, which is disappointing because that's exactly when it's needed. Nevertheless, IPMI is better than nothing and has proved useful on many occasions.
Details
Marvin's nodes can all be remotely controlled, but only from within marvin itself. So the usual exercise is to run firefox on marvin and connect to the nodes' IPMI IPs from there.
When marvin's IPMI itself needs to be used, then this can be done from another computer within the University campus.
There is a standalone GUI application called IPMIconfig which does the same things as the IPMI web interface but, because it doesn't need a browser, can be faster.
The virtual console on IPMI's web interface uses a JNLP (javaws) program and is the best implementation, but it can be patchy. It also allows the loading of a local Live Linux ISO file so that the machine may be booted from it, though this can be a bit tortuous. Certainly, it is very clear that Supermicro's IPMI interface is considerably inferior to Dell's DRAC interface used on the biotime machine. Nevertheless, it is possible to boot the recommended Linux Live ISO, sysrescuecd (http://www.system-rescue-cd.org), on Supermicro's IPMI.
Again it must be repeated that the virtual console's functioning is patchy. An even less dependable version of the virtual console is SOL (Serial over LAN). It may seem silly to mention SOL when it is even worse than the virtual console; however, it has several crucial advantages which make it the holy grail of remote hardware control:
- SOL is a raw terminal connection to the login screen of the main machine.
- it does not operate via buggy GUIs and web interfaces.
- one can connect via the command line and record all input and output via your local linux computer's "script" program (see "man script").
- When it works, it is much faster than the alternatives.
- It behaves as if one really was sitting down locally at the machine, looking at the login screen.
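As a rough illustration of the SOL points above, the following sketch records a SOL session with "script". It assumes the IPMI module supports the lanplus interface (which ipmitool requires for SOL), that the password is supplied in the IPMI_PASSWORD environment variable (read by ipmitool's -E option), and that node2IP, ADMIN and the log file name are placeholders rather than real values. By default it only prints the command it would run.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with the real IPMI address and user.
BMC_HOST="${BMC_HOST:-node2IP}"
BMC_USER="${BMC_USER:-ADMIN}"
LOGFILE="${LOGFILE:-sol-session.log}"

# Build the SOL command line. SOL needs the lanplus (IPMI v2.0) interface.
# -E tells ipmitool to read the password from the IPMI_PASSWORD variable,
# which keeps it out of the process list and the shell history.
sol_command() {
    printf 'ipmitool -I lanplus -H %s -U %s -E sol activate' \
        "$BMC_HOST" "$BMC_USER"
}

# With DRY_RUN=1 (the default in this sketch) only show what would be run.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "script -c \"$(sol_command)\" $LOGFILE"
else
    # script(1) records everything typed and displayed into $LOGFILE.
    script -c "$(sol_command)" "$LOGFILE"
fi
```

To use it for real, export IPMI_PASSWORD and run with DRY_RUN=0. An active SOL session is normally ended with the escape sequence "~." typed at the start of a line.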
IPMI Is Down
In January 2019 the IPMI for node1 and node3 was not accessible via the command line or the web interface. This meant we couldn't reboot the nodes remotely when they were up. Node3 was rebooted manually by Ally pushing a button on the box.
When trying to interact with the MC (management controller, sometimes called the baseboard management controller or BMC) locally on the node, ipmitool could not find the device:
ipmitool mc info
Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
The error means the IPMI kernel modules are not loaded, so load the two that are needed:
modprobe ipmi_devintf
modprobe ipmi_si
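The check and the module loading can be combined, as in the sketch below. The device paths are the ones named in the error message; the function names and the IPMI_DEVICES override are hypothetical and only there for illustration. Loading modules needs root.

```shell
#!/bin/sh
# Device nodes that ipmitool looks for, taken from its error message.
# IPMI_DEVICES is a hypothetical override, useful for testing.
IPMI_DEVICES="${IPMI_DEVICES:-/dev/ipmi0 /dev/ipmi/0 /dev/ipmidev/0}"

# Succeeds if at least one of the IPMI device nodes exists.
ipmi_device_present() {
    for dev in $IPMI_DEVICES; do
        [ -e "$dev" ] && return 0
    done
    return 1
}

# Load the two kernel modules only when no device node is present.
# Must be run as root; not called automatically in this sketch.
ensure_ipmi_modules() {
    if ipmi_device_present; then
        echo "IPMI device already present"
    else
        modprobe ipmi_devintf && modprobe ipmi_si
    fi
}
```

Calling ensure_ipmi_modules as root before any local ipmitool command avoids the "Could not open device" error on a freshly booted node where the modules are not loaded automatically.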
Once the modules are loaded, the same command succeeds:
ipmitool mc info
Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 2.06
IPMI Version              : 2.0
Manufacturer ID           : 47488
Manufacturer Name         : Unknown (0xB980)
Product ID                : 43707 (0xaabb)
Product Name              : Unknown (0xAABB)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device
Aux Firmware Rev Info     :
    0x01
    0x00
    0x00
    0x00
To warm-reset the management controller, run:
ipmitool mc reset warm
This will either tell you it sent the warm reset command ("Sent warm reset command to MC") or return
MC reset command failed: Invalid command
If this happens, send the cold reset command
ipmitool mc reset cold
Sent cold reset command to MC
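The warm-then-cold sequence can be wrapped in a small function. This is only a sketch: it assumes ipmitool exits with a non-zero status when the MC rejects the reset command, and the function name reset_mc and the IPMITOOL override are hypothetical.

```shell
#!/bin/sh
# IPMITOOL can be overridden (e.g. with a full path); hypothetical knob.
IPMITOOL="${IPMITOOL:-ipmitool}"

# Try a warm reset first; fall back to a cold reset if it is rejected.
# Assumption: ipmitool returns non-zero when the reset command fails.
reset_mc() {
    if "$IPMITOOL" mc reset warm; then
        echo "warm reset accepted"
    elif "$IPMITOOL" mc reset cold; then
        echo "warm reset rejected, cold reset accepted"
    else
        echo "both resets failed" >&2
        return 1
    fi
}
```

Run reset_mc locally on the affected node as root. The MC usually needs a short while after a reset before it answers again.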
The cold reset did not fix the problem, however, as node3's IPMI was still unreachable over the network:
ipmitool -H node3IP -U ADMIN -P ***** mc info
Error: Unable to establish LAN session
whereas node2 answered normally:
ipmitool -H node2IP -U ADMIN -P **** mc info
Device ID                 : 32
Device Revision           : 1
Firmware Revision         : 2.59
IPMI Version              : 2.0
Manufacturer ID           : 47488
Manufacturer Name         : Unknown (0xB980)
Product ID                : 43537 (0xaa11)
Product Name              : Unknown (0xAA11)
Device Available          : yes
Provides Device SDRs      : no
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
    IPMB Event Generator
    Chassis Device
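To see at a glance which nodes' IPMI modules answer over the LAN, the same "mc info" query can be looped over all of them. In this sketch the node addresses are the placeholders used in the text, the password is assumed to be in the IPMI_PASSWORD environment variable (ipmitool's -E option), and the function names are hypothetical.

```shell
#!/bin/sh
IPMITOOL="${IPMITOOL:-ipmitool}"
# Placeholder addresses (assumption): substitute the real IPMI IPs.
NODES="${NODES:-node1IP node2IP node3IP}"

# Query one IPMI module over the LAN and report whether it answered.
# $1 = IPMI address; the password is read from IPMI_PASSWORD via -E.
check_bmc() {
    if "$IPMITOOL" -H "$1" -U ADMIN -E mc info >/dev/null 2>&1; then
        echo "$1: reachable"
    else
        echo "$1: UNREACHABLE"
    fi
}

# Check every node in the list, one line of output per node.
check_all() {
    for host in $NODES; do
        check_bmc "$host"
    done
}
```

Running check_all on marvin after exporting IPMI_PASSWORD prints one line per node, which makes it easy to spot the ones in the "Unable to establish LAN session" state.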
No resolution had been found as of 10:54 on 22 January 2019.
Blinking LEDs
ipmitool -U ADMIN -P PASSWORD chassis identify force
This turns on the flashing LED on the server indefinitely.
To turn it off again use
ipmitool -U ADMIN -P PASSWORD chassis identify 0
or, to make it flash for a set number of seconds (the interval is a single byte in the IPMI specification, so the maximum is 255):
ipmitool -U ADMIN -P PASSWORD chassis identify 240 # four minutes
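A small wrapper can guard against bad arguments before calling ipmitool. This sketch assumes it is run locally on the node (so no -H, -U or -P is needed) and that the identify interval is limited to 0 to 255 seconds, as in the IPMI specification; the function name identify_led is hypothetical.

```shell
#!/bin/sh
IPMITOOL="${IPMITOOL:-ipmitool}"

# Usage: identify_led force      (flash until turned off)
#        identify_led 0          (turn the LED off)
#        identify_led SECONDS    (flash for 1 to 255 seconds)
identify_led() {
    case "$1" in
        force)
            ;;
        ''|*[!0-9]*)
            echo "usage: identify_led force|SECONDS" >&2
            return 2
            ;;
        *)
            # Assumption: the interval is one byte, so 255 is the maximum.
            if [ "$1" -gt 255 ]; then
                echo "interval must be between 0 and 255 seconds" >&2
                return 2
            fi
            ;;
    esac
    "$IPMITOOL" chassis identify "$1"
}
```

For example, "identify_led 120" flashes the LED for two minutes while someone locates the box in the rack, and "identify_led 0" turns it off early.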