First, you don’t know what device you are on in the Cisco exam. It’s just a simulator that tests your knowledge of the command line, not the specific hardware. I recently had a module failure in a Cisco 4500 chassis, and part of the troubleshooting reminded me of doing a Cisco exam.
Let me explain.
The year was 2020, the month was January and work had just started back for the year. I had logged in and was checking emails and the incident queue when a message appeared in my Google Chat window. In the real world, people don’t log tickets; they just remember you helped them long ago and reach out directly. It doesn’t matter how many times you say Service Desk, they just come straight to you. I won’t say no or ‘log a ticket’ because someone is in need, and this person was in the IT department, so if you do him or her a favour then you will get one back. That is the way it should work.
The messages said wireless connectivity was impacted and some desk ports were not working. We also got some messages from our monitoring that some ports had gone offline.
I still had connectivity to the site and after logging in I found the following in the logs –
Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 20 on slot 7.
Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 21 on slot 7.
Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 32 on slot 7.
Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 33 on slot 7.
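When a burst of messages like these arrives, tallying them by slot makes it obvious that a single module is involved. Here is a minimal Python sketch of that idea. The function name and the regular expression are assumptions for illustration, inferred only from the four lines above and not from any Cisco-documented message format.

```python
import re
from collections import defaultdict

# Pattern inferred from the four log lines above (an assumption, not an
# official Cisco message specification).
SUPERPORT_DOWN = re.compile(
    r"%C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: "
    r"Superport Mac link down on Superport (\d+) on slot (\d+)"
)


def superports_down_by_slot(lines):
    """Return {slot: sorted superport numbers} for every matching log line."""
    by_slot = defaultdict(set)
    for line in lines:
        match = SUPERPORT_DOWN.search(line)
        if match:
            superport = int(match.group(1))
            slot = int(match.group(2))
            by_slot[slot].add(superport)
    return {slot: sorted(ports) for slot, ports in by_slot.items()}


# The four messages from the switch log, copied as they appeared.
LOG_LINES = [
    "Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 20 on slot 7.",
    "Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 21 on slot 7.",
    "Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 32 on slot 7.",
    "Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 33 on slot 7.",
]

print(superports_down_by_slot(LOG_LINES))  # {7: [20, 21, 32, 33]}
```

Every message lands under slot 7, which is what pointed at the module rather than at individual ports.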
I took a ‘show tech’ and then I power cycled the module, which you can do on a 4500 chassis.
The module came back up and connectivity was restored, for about 12 hours. I was on call that night and got a message about midnight that ports had failed again. I reset the module and then went back to sleep.
By this time, I had logged a case with Cisco to check for bugs. That morning it happened again, although this time when I reset the module it never came back.
So far, this is all real world. TAC case logged, new module now shipped and then we do a swap out (under change control of course).
The new module did not work.
Not the best message to see –
Feb 5 14:55:03 AEST: %C4K_CHASSIS-3-LINECARDSEEPROMREADFAILED: Failed to read module 7’s serial eeprom, try reinserting module
It was now day two and the site had a workaround for people connected to this module. The bugs and cases online pointed to a possible chassis fault. A new module and a new chassis were sent to site.
Replacing the chassis is not an easy task, but we had remote hands to do this for us as this site was about 1,000 km away from me.
To ensure nothing was wrong with the second module, we first moved a known working module from slot 1 to slot 7. We still got the same message: it failed to read module 7’s serial EEPROM.
It was chassis replacement time.
It took them about an hour to replace the chassis and plug all cables in (the cabling was very neat) and this is where it started to feel more like a Cisco exam.
The site had the 4500 connected to two 3750 switches, with EIGRP neighbors for connectivity. Two /30 Layer 3 interconnects were provisioned between the devices for load balancing, and I had connectivity to the 3750 via BGP.
I was on the 3750 switch, and it had been enough time for the chassis to power up and EIGRP neighbors to form, but I saw nothing.
So, check Layer 1 and it’s all good, ports are up. Check Layer 2 with CDP neighbors and I can see the device, its hostname and IP addressing. The configuration was loaded, but I had no EIGRP?
I had connectivity from the local site to the 4500, but I could not access the 4500 from the WAN. The 4500 had no default route: it normally came from BGP, redistributed into EIGRP, and no floating static was configured at the local site.
Time to SSH directly to the Layer 3 interconnect IP and see what was going on. I was able to log in with local credentials and found that the EIGRP config was not even present in the configuration. Before I continued troubleshooting, I needed to get the site operational, so I deployed a floating static (a static route with a high administrative distance, or AD). The AD had to be higher than the external EIGRP AD of 170, so that the EIGRP-learned route would take over again once EIGRP was back.
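To show why the floating static needs an AD above 170, here is a minimal Python sketch of how a router chooses between two sources offering the same prefix: the lowest administrative distance wins. The function name and the value 250 are assumptions for illustration (any AD from 171 to 254 would behave the same way). The sketch only models the AD comparison, not longest-prefix matching or metrics.

```python
# External EIGRP AD as stated in the text above.
EXTERNAL_EIGRP_AD = 170
# Hypothetical AD chosen for the floating static; it only needs to be
# higher than 170 and lower than 255 (255 means the route is never installed).
FLOATING_STATIC_AD = 250


def preferred_route(candidates):
    """Return the (source, ad) pair with the lowest administrative distance.

    candidates: list of (source_name, administrative_distance) tuples that
    all describe the same prefix. Returns None when there are no candidates.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda route: route[1])


# While EIGRP is missing, the floating static is the only candidate.
during_outage = [("floating static", FLOATING_STATIC_AD)]
print(preferred_route(during_outage))  # ('floating static', 250)

# Once EIGRP is back, the external EIGRP route has the lower AD and wins.
after_fix = [
    ("floating static", FLOATING_STATIC_AD),
    ("external EIGRP", EXTERNAL_EIGRP_AD),
]
print(preferred_route(after_fix))  # ('external EIGRP', 170)
```

Had the static been left at its default AD of 1, it would have stayed in the routing table even after EIGRP returned, which is exactly what a floating static is meant to avoid.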
On the 3750 I deployed a static route to the site summary address; this was then redistributed into BGP so the WAN could get there. So, for now, connectivity was restored and all modules were online.
So, why didn’t EIGRP show up in the configuration?
I found that, because of the chassis replacement, the change of serial number caused the licence to fall back to the default licence. The default licence did not support Layer 3 routing protocols, so the switch didn’t load the EIGRP config. The licence file is linked to the chassis serial number and I needed a new one!
Once I had the new licence, I applied it and rebooted during an outage window. I then applied the EIGRP config and the external EIGRP default route was placed into the routing table.
So, in summary, the real world is very physical: you must be aware of the hardware and how it operates. The exam world will teach you the commands and theory, but it is real world experience that really makes you a network engineer and can set you apart from others.
In this example, not only did I apply some commands, I also had to deal with onsite engineers, troubleshoot hardware, perform diagnostic testing, liaise with third parties for equipment, organise change windows, log changes, seek approvals, copy current config and status, check and test after replacing equipment, apply licences and document.
There is not enough time in an exam to do all that 🙂
~Brad.