Node9 network failure incident 16-20.03.2017
Latest revision as of 18:05, 20 March 2017
Introduction
Node9, usually a perfectly behaving machine, suddenly lost its network connections.
Troubleshooting
Going through all the procedures to see which component was to blame took an entire day, unfortunately during quite a busy period for projects.
It turned out that neither the machine nor its network interfaces were to blame. The time was wasted, because the cause was a Network Services issue: they were changing around network hubs, and it would seem some cabling did not get re-seated properly.
Here are the details from Network Services:
The Butts Wynd Data Centre Core Switches' (BWDC Cisco Nexus 5672s) port-channel linking your High Performance Computing System (Viglen HX425Ca - Asset 03267, rack 17 position 35) is now up and running (see output included below), having re-seated the copper cabling connecting up both of its Ethernet interfaces. Please do let us know if this problem re-occurs.
bwdc-n5672-0# show port-channel sum | inc Po146
146 Po146(SU) Eth LACP Eth108/1/28(P)
bwdc-n5672-0# show logging | inc Ethernet108/1/28
...
2017 Mar 20 16:15:03 bwdc-n5672-0 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel146: first operational port changed from none to Ethernet108/1/28
2017 Mar 20 16:15:03 bwdc-n5672-0 %ETHPORT-5-IF_UP: Interface Ethernet108/1/28 is up in mode access
bwdc-n5672-1# show port-channel sum | inc Po146
146 Po146(SU) Eth LACP Eth109/1/28(P)
bwdc-n5672-1# ...
2017 Mar 20 16:15:00 bwdc-n5672-1 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel146: first operational port changed from none to Ethernet109/1/28
2017 Mar 20 16:15:00 bwdc-n5672-1 %ETHPORT-5-IF_UP: Interface Ethernet109/1/28 is up in mode access
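For future incidents it may help to pull the link-up times out of switch logs like the ones quoted above, so they can be compared with when the node recovered. The following is only a minimal sketch, assuming the log lines keep the format seen in this message; the function name `parse_if_up` and the `SAMPLE` text are illustrative and not part of any Cisco tooling.

```python
import re
from datetime import datetime

# Matches IF_UP lines in the format quoted in the Network Services message, e.g.
# 2017 Mar 20 16:15:03 bwdc-n5672-0 %ETHPORT-5-IF_UP: Interface Ethernet108/1/28 is up in mode access
IF_UP = re.compile(
    r"(?P<ts>\d{4} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<switch>\S+) %ETHPORT-5-IF_UP: "
    r"Interface (?P<iface>\S+) is up"
)


def parse_if_up(text):
    """Return (timestamp, switch, interface) tuples for IF_UP events, oldest first.

    Hypothetical helper: it assumes English month abbreviations and the
    timestamp layout shown in the log excerpt.
    """
    events = []
    for match in IF_UP.finditer(text):
        ts = datetime.strptime(match.group("ts"), "%Y %b %d %H:%M:%S")
        events.append((ts, match.group("switch"), match.group("iface")))
    return sorted(events)


# Lines copied from the message above (other event types are ignored).
SAMPLE = """\
2017 Mar 20 16:15:03 bwdc-n5672-0 %ETH_PORT_CHANNEL-5-FOP_CHANGED: port-channel146: first operational port changed from none to Ethernet108/1/28
2017 Mar 20 16:15:03 bwdc-n5672-0 %ETHPORT-5-IF_UP: Interface Ethernet108/1/28 is up in mode access
2017 Mar 20 16:15:00 bwdc-n5672-1 %ETHPORT-5-IF_UP: Interface Ethernet109/1/28 is up in mode access
"""

if __name__ == "__main__":
    for ts, switch, iface in parse_if_up(SAMPLE):
        print(ts.isoformat(), switch, iface)
```

Run against the excerpt, this lists the link on bwdc-n5672-1 coming up at 16:15:00 and the one on bwdc-n5672-0 at 16:15:03, which matches both cables being re-seated within a few seconds of each other.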