Disk management after shelf disk failure



The management software for your RAID controller is LSI (now Broadcom) MegaCli. I've not been able to find any evidence that there's a Red Hat package, but I have found a download. The URL below should get you a zip file that contains an RPM along with installers for other operating systems.

https://docs.broadcom.com/docs/12351587
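Once you have the zip, installing should just be a case of unpacking it and installing the RPM inside. The file names below are placeholders, as I don't know exactly what's in this version of the archive, so adjust to whatever is actually there:

unzip megacli_download.zip -d megacli    # use the real zip name
rpm -ivh megacli/Linux/MegaCli-*.rpm     # the RPM may sit at the top level rather than in a Linux/ subfolder
ls /opt/MegaRAID/MegaCli/                # the MegaCli64 binary should appear here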

If you can install the RPM, you should be able to display the current status of the array to confirm it's safe to remove the failed or failing disk and, if necessary, configure the new disks. The following commands should show the status of any RAID volumes and physical disks; if you can pipe the output into a couple of files and send it over, I can advise on what needs to be done.

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aall
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aall
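For example, to capture both outputs into files to send over (the file names are just suggestions):

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aall > ldinfo.txt 2>&1
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aall > pdlist.txt 2>&1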

The first command should produce something like this; from it we'll be able to tell whether the array is degraded or not.

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-3, RAID Level Qualifier-0
Size                : 203.25 GB
Sector Size         : 512
Mirror Data         : 203.25 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives per span: 2
Span Depth          : 3
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Is VD Cached: No
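As a quick check without reading the whole thing, the State lines can be pulled out with grep (assuming the output looks like the example above):

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aall | grep '^State'

Anything other than 'Optimal' (e.g. 'Degraded' or 'Partially Degraded') means at least one volume has lost a disk.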

The second command will produce a lot of output for each disk. The 'Firmware State' line should show how each disk is configured; most should be 'Online' or 'Hotspare'. Once the new disks are added we'll need to re-run this, and they'll probably be listed as 'Unconfigured (Good)'. With some information about the disk positions, the command to configure the disks will be something like the following; I'll confirm once we know what the variables are.

/opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv\[$ENCLOSURE:$SLOT\] -a$ARRAY
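To get a quick per-disk summary out of PDList, something like this should work; the exact field labels ('Enclosure Device ID', 'Slot Number', 'Firmware state') may differ slightly between MegaCli versions, so check them against the full output first:

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aall | grep -E 'Enclosure Device ID|Slot Number|Firmware state'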

Then, if the array is degraded, it should automatically start to rebuild using one or more of the new disks.

If it's still rebuilding, this should show the progress:

/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv\[8:13\] -a0
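To keep an eye on it, the same command can be wrapped in watch (checking every couple of minutes is plenty):

watch -n 120 '/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv\[8:13\] -a0'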

I added one additional disk to each tray; they should be in Enclosure 8 Slot 14 and Enclosure 9 Slot 13. Confirm that with the output from PDList. If that's correct, the following commands should mark them as hotspares.

/opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv\[8:14\] -a0
/opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv\[9:13\] -a0

Confirm by running PDList again; their Firmware State should have updated from 'Unconfigured (Good)' to 'Hotspare' or something similar.
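If it's easier than wading through the full PDList output again, I think the state of just those two disks can be queried directly with -PDInfo; if your MegaCli build doesn't accept that, fall back to PDList and grep as above:

/opt/MegaRAID/MegaCli/MegaCli64 -PDInfo -PhysDrv\[8:14\] -a0 | grep 'Firmware state'
/opt/MegaRAID/MegaCli/MegaCli64 -PDInfo -PhysDrv\[9:13\] -a0 | grep 'Firmware state'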


For reference, the full syntax for -PDHSP from the verbose MegaCli -h output is:

MegaCli -PDHSP {-Set [-Dedicated [-ArrayN|-Array0,1,2...]] [-EnclAffinity] [-nonRevertible]}
    |-Rmv -PhysDrv[E0:S0,E1:S1,...] -aN|-a0,1,2|-aALL