Excuse the latency of my reply.

I have been very slack with my blogs this year; I have been busy trying to learn SD-WAN as fast as possible and even going in depth with Wireshark and TCP. I am on both sides of the network spectrum, working with cutting-edge technology and then rediscovering a protocol that has existed since the decade of my birth and is still in use today.

I have had many challenging operational issues lately, and I am still searching for root causes in some of them. I hope to share some great stories once I get to the bottom of them.

To try and kick myself into gear in the new year, I have also decided to apply for the Cisco Champion program in 2021. I was last in this program in 2019 and won the Best Newcomer for my blog. I am hoping to learn and share what I can if I am selected.

If I am not, then I just need to set some time each week to blog and stick to it.

On a non-technical note, my current manager in my region has got me onto a good book, which I am also being a bit slack on reading. It is called The 7 Habits of Highly Effective People. I am about three quarters of the way through it but, like most informational and self-help books, it needs a few reads to become embedded in the mind. The last self-help book I obsessed over, and still think about, especially this year, is Who Moved My Cheese? It is very simple but very effective in dealing with unforeseen changes and preparing for change.

I will try and post before the end of the year.

Until then, peace.

~Brad.

Dreams & Nightmares

It’s been a few months since I have done a proper blog entry. Many things have changed in my career, my life and in all our lives. The nightmare we all live at the moment is one of a pandemic.

Many, many years ago, my dream was to live and work in New York City as a Network Engineer. This dream is still partially alive, but it is now also partially fulfilled.

As you can see and so can the world, we can’t travel anywhere due to Covid-19.

But virtually, we can travel anywhere. I can connect to devices in countries all over the world, scientists can control rovers on the surface of Mars and we can even stay in touch with Voyager 1 & 2 beyond our Solar System. My boss is 15,000 km from me, but I talk to him across the internet like he is two suburbs away. The world is more connected than ever. How lucky we are to live in a time of such great connectivity while being so segmented physically.

I was lucky enough to be offered a role in a Global Team, my managers are now in the US. So, technically although I am not in the US my managers are and my team is all across the world. This is as good as it gets for now, but this may also be the future. 

I have now been given a chance to do what I have wanted to do: specialise in a technology. There are thousands of Network Engineers across the world being the glue to all other departments, the fabric that intertwines and connects people’s knowledge together to solve a problem. There will always be a need for a network person to join an Incident Management call, help with explaining a traffic flow or find the location of an IP address. Every now and then we become overwhelmed with the amount we need to know, or we may have a spark that makes us want to specialise in a field.

My field of interest in this new role is SD-WAN. 

For the uninitiated, it is a new WAN technology which allows greater control and is more of an application-based network than a traditional one. Instead of configuring, managing and troubleshooting each router in the network, we can now separate the planes of the router and stretch them across a fabric.

The control plane is now segmented and centralized. The data plane stays on the router and passes traffic based on control plane policies, and the management plane is a GUI and suite of protocols used to bring this all together.

Another way to think about this is a Cisco 6500 switch. It’s a modular chassis with hardware modules that slide into it. The backplane of the chassis can be seen as the management plane (allowing modules to speak to each other and share information), the supervisors can be seen as the control plane (routing protocols) and the modules that do hardware forwarding (switching of traffic) can be seen as the data plane.

Obviously you can’t run a 6500 with no supervisors at all, but I was hoping this might give you an idea of how it has been separated.

SD-WAN routers can keep operating without a connection to the control plane; they retain the last known configuration and routing information.
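To make the split a little more concrete, here is a toy model in Python. It is only a sketch of the idea, not how a vEdge is actually implemented: the class, method names and next-hop labels are all made up for illustration. The point it shows is that the data plane forwards from a cached copy of what the control plane last sent, so losing the controller does not empty the table.

```python
class EdgeRouter:
    """Toy model of an edge router's data plane with a cached route table."""

    def __init__(self):
        self.routes = {}             # prefix -> next hop, as last pushed by the controller
        self.controller_up = False

    def receive_policy(self, routes):
        """Control plane pushes a fresh set of routes; the router caches a copy."""
        self.routes = dict(routes)
        self.controller_up = True

    def lose_controller(self):
        """Control connection drops; the cached routes are deliberately left alone."""
        self.controller_up = False

    def forward(self, prefix):
        """Data plane lookup uses the cache whether or not the controller is up."""
        return self.routes.get(prefix)


router = EdgeRouter()
# Hypothetical prefixes and next-hop labels, purely for illustration.
router.receive_policy({"10.1.0.0/16": "tunnel-a", "10.2.0.0/16": "tunnel-b"})
router.lose_controller()
print(router.forward("10.1.0.0/16"))  # still answered from the cached table
```

A real router would also age this information out eventually; the sketch leaves that out on purpose.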

For now, I am still using the CLI for SD-WAN, although the possibilities of APIs, automation and using the centralized vManage are very exciting.

The goal of CCIE is still front of mind, but by the time I actually sit it I am not even sure what the topics will be on it. Luckily for me, I am at the forefront of new technologies right now and working on them daily. This is also a nightmare scenario, so much to learn and not enough time in the day at work and at home with kids. 

So, I have my latest Cisco Press SD-WAN book and a new motivation and purpose.

For now, I just need to catch up. 

~Brad. 

vEdge Tools

I am new to the vEdge and, having spent most of my working life on the command line, I find it hard not to use it when troubleshooting. SD-WAN and other technologies seem to be pushing for the self-healing network, the network that can fix everything itself with little or no network knowledge needed from administrators.

Although this is great for routing protocols to re-converge or traffic engineering that automatically re-routes traffic, sometimes you still need to send a telnet packet on port 5568 for that Dev guy in Finance.

On the vEdge, you do have access to some installed tools. In one recent troubleshooting event I had to test whether TACACS packets were getting to the TACACS server. I can’t use the telnet command, but on the vEdge if you type the following some nice little tools are presented.

# tools ?

Possible completions:

ike-debug

ip-route

iperf

minicom

netstat

nping

ss

stun-client

vtysh

So, for today’s blog we will look into nping. These tools have been around for a long time in the Linux world, so they are not new, but usually you have to install them on laptops or machines to use them. Here on the vEdge you have direct access to the tools.

To craft a TACACS packet, using nping, I did the following –

(Don’t forget the quotation marks!)

tools nping vpn <xxx> options "--tcp -p 49" 10.x.x.x

xxx = VPN number of source interface being used

--tcp = use TCP

-p = use Port 49

With this command I was able to send a ping to the application port and see the TCP packets sent and received. The output includes the TTL, IP length, sequence numbers and window size, as well as the MSS.
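When reading the result of a TCP probe like this, what matters is which flags come back. As a rough sketch (the function name and the flag-letter string are my own choices for illustration, not something the vEdge or nping provides), the usual interpretation can be written down in a few lines of Python:

```python
def classify_tcp_probe(reply_flags):
    """Interpret the reply to a TCP SYN probe.

    reply_flags: string of TCP flag letters seen in the reply
                 (for example "SA" for SYN+ACK), or None if nothing came back.
    """
    if reply_flags is None:
        return "no reply: dropped or filtered somewhere on the path"
    flags = set(reply_flags.upper())
    if "R" in flags:
        # A reset means something answered, but the port is not accepting connections.
        return "closed: reachable, but nothing is listening (or a device reset it)"
    if {"S", "A"} <= flags:
        # SYN+ACK is the second step of the three-way handshake.
        return "open: the service answered the handshake"
    return "unexpected reply"


print(classify_tcp_probe("SA"))
print(classify_tcp_probe("RA"))
print(classify_tcp_probe(None))
```

For the TACACS test, a SYN+ACK back from port 49 means the packets are reaching the server, and the problem sits further up the stack.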

Check out the list of all options for nping here –

https://www.cisco.com/c/en/us/td/docs/routers/sdwan/command/sdwan-cr-book/operational-cmd.html#wp5736741660

~Brad

The NeverEnding Default Route….Part 2.

So, after the IOS upgrade the route is now stable! I have taken a new packet capture and the default route is still being sent to us, but now our router doesn’t recalculate it and re-enter it into the routing table.

I have informed Cisco and am waiting to see if they have an explanation. My theory now is that the ISP sending the default is caused by some type of soft reconfiguration or route refresh issue. On our side it must have been something similar, where the router was constantly rechecking and re-entering the route due to a bug.

Until Cisco get back to me, I can’t do any more testing as I don’t have a lab like that.

~Brad.

The NeverEnding Default Route….

The NeverEnding Story….whoa…whoa…whoa!

Loved that movie as a kid.

Moving on, and speaking about moving on, I have been made redundant at my current workplace. I have one week remaining and ever since we installed a new Internet link, I have been looking at a very peculiar default route issue.

The ISP, as proven by Wireshark, is sending me an update packet for the default route every two seconds. This causes our router to re-install the route into our routing table, repeatedly. The initial troubleshooting began with changing the advertisement interval of BGP to 60 seconds. So now, the default route update packet comes in every 60 seconds.
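Wireshark does the decoding for you, but it helps to know what it is looking at. Every BGP message starts with the same 19-byte header: a 16-byte marker of all ones, a 2-byte length and a 1-byte type, where type 2 is an UPDATE and type 4 is a KEEPALIVE. Below is a minimal Python sketch that walks a raw BGP byte stream and counts message types. The function names and the sample bytes are made up for illustration; the sample is not taken from my capture.

```python
import struct
from collections import Counter

BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE", 5: "ROUTE-REFRESH"}
MARKER = b"\xff" * 16
HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type


def count_bgp_messages(stream):
    """Walk a reassembled BGP byte stream and count messages by type."""
    counts = Counter()
    offset = 0
    while offset + HEADER_LEN <= len(stream):
        marker, length, msg_type = struct.unpack_from("!16sHB", stream, offset)
        if marker != MARKER or length < HEADER_LEN:
            raise ValueError(f"not a BGP header at offset {offset}")
        counts[BGP_TYPES.get(msg_type, f"type {msg_type}")] += 1
        offset += length  # the length field covers the header as well as the body
    return counts


def fake_message(msg_type, body=b""):
    """Build a synthetic BGP message for the demo (hypothetical helper)."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


# Two empty UPDATEs with a KEEPALIVE in between, as stand-in data.
sample = fake_message(2, b"\x00" * 4) + fake_message(4) + fake_message(2, b"\x00" * 4)
print(count_bgp_messages(sample))
```

A session that shows a steady stream of UPDATEs and hardly any KEEPALIVEs is the kind of pattern that made this capture look wrong.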

Pretty obvious it’s a carrier issue.

I spoke with colleagues, Cisco TAC, posted on message boards and everyone is of the opinion that nothing on our router could ask our BGP neighbor to send us the default every 60 seconds.

Last night, the carrier showed me a lab they had built with the exact hardware in use. It doesn’t occur in the lab. I found out the IOS they are using for our device is newer, and the config is a little light. They are only advertising a default, where the real link is advertising a full BGP table and we are filtering.

Still, got me very puzzled. I did ask for a packet capture on their side, to see if they send an update packet to other customers but they decided the lab was a better avenue to explore.

So, tonight a colleague of mine is updating the memory and upgrading the IOS.

Will the IOS fix the issue?

Will the IOS upgrade stop the BGP peer from sending default route updates?

Will Batman and Robin escape from the Joker trap?

Stay tuned.

~Brad.

Cisco Community Post –

https://community.cisco.com/t5/routing/bgp-no-keepalives-and-two-second-updates/td-p/4048584

 

PowerShell, the new Telnet!

As a network engineer, you usually need to prove that ports are open across the network. One troubleshooting step when working with TCP is using Telnet and accessing the remote host with the destination port. This is a quick way to ensure end to end connectivity is allowed, and you can safely update the ticket and send it back to the user for further application layer testing.

The issue is that a lot of servers and hosts don’t run Telnet anymore, as it is an insecure, clear-text protocol and not best practice. Usually the application owner will then request Telnet be installed so they can do testing and then probably leave it installed. Eventually a security scan will detect it and it will be removed.

I have found that you don’t need to install Telnet on your servers. If you have access to the PowerShell terminal, you can do the exact same test and it also gives you a better response. The Telnet test usually gives you a blank command line window, which isn’t very user friendly. The PowerShell test gives you better feedback and can be customized to ensure you can test your TCP protocols.

Once you have access to the PowerShell terminal, you can use the following command –

Test-NetConnection x.x.x.x -Port y

x.x.x.x = IP address or domain name

y = Port Number

The response is as follows –

PS C:\Users\ciscoworkerbee> Test-NetConnection google.com -Port 443

ComputerName : google.com
RemoteAddress : 216.58.196.142
RemotePort : 443
InterfaceAlias : Ethernet 2
SourceAddress : 10.10.10.10
PingSucceeded : True
PingReplyDetails (RTT) : 1 ms
TcpTestSucceeded : True
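If you are on a host without PowerShell but with Python, the TCP part of this test is easy to approximate. This is only a sketch of the same idea, not a replacement for Test-NetConnection: the function name is my own, it does not do the ping, and the demo connects to a throwaway listener on the local machine so it runs anywhere.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP handshake to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Demo against a temporary local listener; swap in your real host and port.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

result_open = tcp_port_open("127.0.0.1", port)
print(f"TcpTestSucceeded : {result_open}")
listener.close()
```

A True here tells you the same thing as TcpTestSucceeded: the three-way handshake completed, so the path and the port are open.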

Here is the Microsoft page for more information.

https://docs.microsoft.com/en-us/powershell/module/nettcpip/test-netconnection?view=win10-ps

Happy troubleshooting.

~Brad.

 

 

Real World vs Exams – 4500 Chassis

First, you don’t know what device you are on in the Cisco exam. It’s just a simulator testing your knowledge of the command line, not the specific hardware. I recently had a module failure in a Cisco 4500 chassis, and part of the troubleshooting reminded me of doing a Cisco exam.

Let me explain.

The year was 2020, the month was January and work had just started back for the year. I had logged in and was checking emails and the incident queue when a message appeared in my Google Chat window. In the real world, people don’t log tickets; they just remember you helped them long ago and reach out directly. It doesn’t matter how many times you say Service Desk, they just come straight to you. I won’t say no or ‘log a ticket’ because someone is in need and this person was in the IT department, so if you do him or her a favour then you will get one back. That is the way it should work.

The messages said wireless connectivity was impacted and some desk ports were not working. We also got some messages from our monitoring that some ports had gone offline.

I still had connectivity to the site and after logging in I found the following in the logs –

Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 20 on slot 7.

Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 21 on slot 7.

Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 32 on slot 7.

Feb 3 09:31:06 AEST: %C4K_HWPORTMAN-3-SUPERPORTMACLINKDOWN: Superport Mac link down on Superport 33 on slot 7.

I took a ‘show tech’ and then I power cycled the module, which you can do on a 4500 chassis.

The module came back up and connectivity was restored, for about 12 hours. I was on call that night and got a message about midnight that ports had failed again. I reset the module and then went back to sleep.

By this time, I had logged a case to Cisco to check for bugs. That morning it happened again, although this time when I reset the module it never came back.

So far, this is all real world. TAC case logged, new module now shipped and then we do a swap out (under change control of course).

The new module did not work.

Not the best message to see –

Feb 5 14:55:03 AEST: %C4K_CHASSIS-3-LINECARDSEEPROMREADFAILED: Failed to read module 7’s serial eeprom, try reinserting module

It is now day two and site has a workaround for people connected to this module. The bugs and cases online point to a possible chassis fault. A new module and new chassis were sent to site.

Replacing the chassis is not an easy task, but we had remote hands to do this for us as this site was about 1000 kms away from me.

To ensure nothing was wrong with the second module, we first moved a known working module from slot 1 to slot 7. We still got the same message that it failed to read module 7.

It was chassis replacement time.

It took them about an hour to replace the chassis and plug all cables in (the cabling was very neat) and this is where it started to feel more like a Cisco exam.

The site had the 4500 connected to two 3750 switches, with EIGRP neighbors for connectivity. Two /30 Layer 3 interconnects were provisioned between the devices for load balancing and I had connectivity to the 3750 via BGP.

I was on the 3750 switch, and it had been enough time for the chassis to power up and EIGRP neighbors to form, but I saw nothing.

So, check Layer 1 and it’s all good, ports are up. Check layer 2, CDP neighbors and I can see the device, its hostname and IP addressing. Configuration was loaded but I had no EIGRP?

I had connectivity from local site to the 4500, but I could not access the 4500 from the WAN. The default route was not in the 4500, as it was coming from BGP, redistributed to EIGRP and no floating static was used on the local site.

Time to SSH directly to the Layer 3 interconnect IP and see what is going on. I was able to log in with local credentials and found that the EIGRP config was not even present in the configuration. Before I continued troubleshooting, I needed to get the site operational, so I deployed a floating static (a static route with a high AD), and it had to be higher than the external EIGRP AD of 170.
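The reason for picking an AD above 170 is that, for the same prefix, the router installs the route with the lowest administrative distance. A small Python sketch of that selection follows. It is a simplification that ignores prefix length and metrics, and the value 200 for the floating static is just an example I picked, not necessarily what was configured.

```python
AD_EXTERNAL_EIGRP = 170    # Cisco default for external EIGRP routes
AD_FLOATING_STATIC = 200   # example value; anything above 170 behaves the same here


def best_route(candidates):
    """Pick the candidate with the lowest administrative distance, or None."""
    return min(candidates, key=lambda route: route["ad"], default=None)


eigrp_default = {"prefix": "0.0.0.0/0", "source": "EIGRP external", "ad": AD_EXTERNAL_EIGRP}
floating_static = {"prefix": "0.0.0.0/0", "source": "floating static", "ad": AD_FLOATING_STATIC}

# While EIGRP is missing, the static is the only candidate and gets installed.
print(best_route([floating_static])["source"])

# Once EIGRP is back, its lower AD wins and the static drops out of the table.
print(best_route([eigrp_default, floating_static])["source"])
```

That is what makes the static "float": it sits in the config as a backup and only shows up in the routing table when the preferred source is gone.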

On the 3750 I deployed a static route to the site summary address; this was then redistributed into BGP so the WAN could get there. So, for now connectivity restored and all modules were online.

So, why didn’t EIGRP show up in the configuration?

I found that due to the chassis replacement, the serial change causes the licence to fall back to the default licence. It had no layer 3 routing protocols supported so it didn’t load the EIGRP config. The licence file is linked to chassis serial and I needed a new one!

Once I had the new licence, I applied it and rebooted during an outage window. I applied the EIGRP config and the external EIGRP default route was placed into the routing table.

So, in summary the real world is very physical, you must be aware of the hardware and how it operates.  The exam world will teach you the commands and theory, but it is real world experience that really makes you a network engineer and can set you apart from others.

In this example, not only did I apply some commands, I also had to deal with onsite engineers, troubleshoot hardware, perform diagnostic testing, liaise with third parties for equipment, organise change windows, log changes, seek approvals, copy current config and status, check and test after replacing equipment, apply licences and document.

There is not enough time in an exam to do all that 🙂

~Brad.

New Study Begins – Interested?

The first new study of the year will be Python. I have found this course below via Reddit which is free and directed at network engineers, which is handy for me.

This should really start the ball rolling for the CCIE Lab exam, as you are expected to automate during the lab.

If you are looking for something free and in IT why not give it a go?

~Brad.

Cisco Certification Updates

So, if you are on my LinkedIn you may have seen a lot of new certifications issued to me via the Acclaim website.

Cisco have now retired two certifications and modified my existing ones, so no more CCDA or CCDP.

I have now been given the following –

CCNA
CCNP Enterprise (CCNP-Enterprise)
Cisco Certified Specialist – Enterprise Advanced Infrastructure Implementation (CCS-EAII)
Cisco Certified Specialist – Enterprise Core (CCS-ECore)
Cisco Certified Specialist – Enterprise Design (CCS-ED)

The best news is that the CCIE Written is no more. I failed that exam three times and now I don’t have to do it anymore. I can go direct to the Cisco CCIE Lab!

This is exciting and also scary at the same time. The CCIE Lab will be updated in April and I think it’s going to take me a year to prepare for this exam. I want to master every subject, but some of the new exam modules might be difficult, like Viptela and SDN.

Very difficult to lab at home.

Stay tuned as I ponder how to attack this challenge.

 

~Brad.