The US Trip

Dreams do come true, if you spend enough time and energy on them. I know it’s always the people whose dreams came true that pop up on Facebook and say “I did it and you can too” or “Never give up!”, but you don’t hear the stories of the people who didn’t make it. You don’t hear “I lost all my money at the track”; you only hear “I put a mystery bet on and won a fortune!”

This story could have gone either way. It isn’t about me, but more about the people who made this dream come true. Behind every dream and conquest is a legion of people who pushed you to excel. When you couldn’t go any further, something or someone reached out and lifted you up.

I recall a story or poem I read once. It was about someone who was walking with the Lord on the beach. As this person walked, parts of their life, good and bad, appeared, and in each scene where something was bad there was only one set of footprints.

The person was upset, and as they looked to the Lord they said, “You promised that if I followed you, you would walk with me always”.

The Lord replied, “That is when I carried you”.

This occurs in our lives every day, whether it be the Lord if you are religious, or friends and family.

So, what has this got to do with my trip to the US? Well, for as long as I can remember I always wanted to visit the US. I accomplished that dream when I was 21, travelling by myself on a Contiki tour. Many trips followed and I have been to many states and cities in my time. One goal had eluded me, and that was to work there. I didn’t care if it was a conference, a meeting or an onsite rack and stack of a router; I wanted to complete this goal.

Last year, I was close to doing something completely out of character. I was close to attending a conference and actually speaking to an audience. This is absolutely crazy to me, as I have had an issue with speaking in public since my first job in the IT industry. Everyone has this issue, but sometimes people’s minds can become hyperactive and fixated on this.

So, that trip did not come to fruition due to various reasons outside of my control, but a last minute call up to a Global Meeting in Tampa took its place.

It was a whirlwind trip of one week, with 24 hours of flying and layovers each way.

Initially I decided on a hard no. My mind was racing, imagining an aircraft lost at sea or going to the meeting in my underwear, and my fight or flight mode had decided on flight. I spent about 2 hours processing all this, probably for a few days as well.

It was exhausting.

Here I was, the dream of a lifetime and due to a fear of flying and some ancient flight or fight response I was going to say no. I was going to give up my dreams for things that had not happened, that were not real and regret this decision for the rest of my life.

This is when my family and friends came in and pushed me, pushed me as hard as anyone ever has.

In Australian culture, going to the US has always been a big deal. In our news we are bombarded with Aussie actors, actresses and musicians that have gone and made it in the ‘US’.

Don’t worry, there are many that haven’t and are still trying.

But someone like me, who has worked from home for the last two years (thanks Covid) and hasn’t even really been in a meeting room for some time, was about to be sent to the US to attend an in-person meeting with people I have worked with for two years but don’t really know personally.

To me, it was going to feel like my first day.

As the thoughts of disasters and dismay entered my head, my wife came to me and said “If you do not do this, you will regret it for the rest of your life”. “Other people would kill for this opportunity, and you’re just going to say no because of thoughts inside your own head?”

As I continued to ponder, I reached out to colleagues, friends and family, and all the answers and advice were the same. “This is a once in a lifetime opportunity, why wouldn’t you go? This is everything you have ever dreamed of, and the positives of this trip far outweigh the negatives”.

As my own thoughts were reflected back to me, it became clear and just common sense that I have something overactive inside my mind and it has held me back my entire life. Mental health is a serious threat and it was about to take my greatest dream and shatter it to pieces.

The only way to beat it, is to go through it.

So, I said yes.

It was about 10:00 pm in Melbourne on October 22nd (my Birthday) as the new Boeing Dreamliner QF93 climbed into the darkness. This flight number is the same exact flight number I took on my first ever flight and overseas trip when I was 21.

Destination was Los Angeles in 14 hours and 20 mins. There were bumps, and I probably snored a little and disturbed my neighbours, but I arrived at 6pm on the same day, extending my birthday by quite a few hours. I was even surprised by a Qantas host with a glass of champagne about 4 hours from touchdown for my birthday, which I thought was a secret to everyone on the aircraft.

I gathered my bags, checked in for the next flight and left 4 hours later on a United Airlines flight to Tampa, flight time just over 4 hours 20 mins. This flight was in total darkness; everyone’s window shades were down and I couldn’t see outside. The aircraft rocketed up to 38,000ft as fast as it could due to turbulence at lower altitudes, as explained by the captain.

I think I got about 1 to 2 hours of sleep before the aircraft started to descend into Tampa. We landed about 30 minutes early, and my body was starting to show signs of sleep deprivation and confusion, as it thought it was nighttime when it was actually 5am.

The sky was dark, a crescent moon hung low on the horizon. The uber was smooth, the highway was wide and the cars were as big as I remember them.

The worst part about arriving at 5am is your hotel check in is 3pm.

Not great for someone who needs some type of sleep to make it to night time and merge into the timezone.

I left my bags and walked to the office as the sun rose, its windows glistened and the signage of PwC was lit up for all to see. The roads were quiet, it was a Sunday and I was all alone on the other side of the planet. I had travelled the furthest I ever had in one day, or should I say an extended day.

I tried to stay up as long as I could, and luckily at 12pm a room opened up. I got about 4 hours sleep, just enough to feel comfortable but not jeopardize the night sleep I needed. I was refreshed and I was ready to meet my boss. I have been speaking with him for over two years, but never met him in person.

It was by chance that when I decided I would take another walk to freshen up I met him in the lobby checking in. He bought me a gift, some strings and a tuner for my guitar as we both enjoyed music and discussed it regularly. It was very nice, I should have bought something from Australia for him, but I was too in my head before I left.

That night I met two other colleagues and had dinner, I was here and adjusting well to the timezone, the food and the drink.

The week was full of meetings and interactions with vendors. I can’t talk too much about what we discussed, but I strengthened relationships and formed new ones with colleagues on the other side of the world. I learnt to understand these people as human beings, not as a name in a chat room or a picture in a Google Meet. I was finally able to know them as people. I was even invited to a colleague’s house on the Friday night to meet his family and have dinner with them.

I made special bonds with some of them, the type of bonds that you could call them anytime for help and they would be there, and I hope that is what they felt too.

Everything that I had thought about in my mind, didn’t happen. It was ten times better than I even imagined. I was living the dream and completing my goal and I had my family, friends and colleagues to thank for this.

This is why it is so important to speak up. Because, not everyone can do it by themselves, sometimes people need a push, a word or a new point of view.

The trip was over as quickly as it was thrust upon me, but I had changed.

I departed on Saturday, once again for a 24 hour marathon. I said goodbye to Tampa and flew direct to Miami for a connecting flight. While I was in the air, on what was only a 37 min flight, I lost my grandfather after a long battle with dementia and the limits of the human body. I was able to see him a few days before this trip, although I wasn’t sure if he knew I was there.

From Miami, I had a five hour plus flight to Los Angeles. I had a lot of time to think and process the trip, my grandfather and my life in general. I landed and quickly called my Mum to see how she was after losing her father. I was then once again on another aircraft.

Los Angeles to Brisbane, flight time 14 hours and 20 mins as well. I will never forget this flight, although for part of it I was snoring so loudly the lady next to me moved seats. But I had never seen so many stars in the sky. The sky was dark, the Milky Way was visible and the constellation Orion was still upside down as I had not passed the equator just yet.

I thought about my grandfather, I wondered if I would see a meteor and then in the corner of my eye I saw a green flash.

It was a meteor.

There were several others throughout the flight, some in succession. Late October and early November is the right time of year for meteor showers, but I like to think it was my Grandfather that had something to do with it. I don’t know where he is now, but I hope he didn’t have any regrets, especially the one I could have had if I didn’t go on this trip.

With a little help I was able to get on that plane and make a dream come true. For some, this is the norm, this is easy and child’s play. But for others, it can be a big deal.

Don’t only look at your point of view, think about it from others as well.

For me, this trip, like my first overseas trip by myself, has changed my life for the better. I know I can fall back on my family, my friends and also the evidence that I went to America and worked for a week.

~Brad.

Automation

It’s the buzzword that sends shivers down my spine. Not because of what it does and how it can replace me, but because programming just isn’t my strong suit.

Its power is undeniable. More than ever it is needed in the network world, and it is already here and in full flight.

See, I was originally going to become a 3D Animation artist. I learnt more graphical apps like Macromedia Director and Flash. I delved into 3D Max, Bryce 3D and used to spend my time applying skins to Star Wars 3D models. I spent time making websites, video editing 21st Birthday party videos and photoshopping.

I gave this up when I read a book on CCNA and found something so technical and so fascinating. The only downside, is it isn’t that creative. Design, can be but you are limited by business drivers, business goals and technical roadblocks. When you build a network, it moves packets but doesn’t give the same sense of adrenaline when watching a movie with special effects, lightning and a soundtrack. It is very difficult to create something out of nothing, but maybe with automation and programming I can?

Now the time has come to try and learn Python. I have started, but it is a very slow process. I feel like my brain is not wired correctly, because it’s all jargon to me. It might be because I decided to do Maths C instead of Maths A in High School, I’m not sure.

In TAFE, I did basic programming in Microsoft Visual C (I think?). I still have dreams to this day that I am at TAFE and I keep skipping this class and at the end of the year I don’t get my diploma. I need to return the next year for 6 months to do the programming course. This, of course, is just a nightmare. I did pass and complete the courses, but it’s this doubt that fuels my anxiety.

So, now at age 40 when I am still trying to perfect my craft of networking and become an expert I am thrown into learning something that will assist my job, but requires a whole new way of thinking, that of someone I never thought I was. It takes me a long time to learn something like this, and that is what I am afraid of.

Wish me luck.

~Brad.

Time Travel

When troubleshooting an issue, without data and clues we have nowhere to start. I was watching the show ‘Emergency’ last night, which showed doctors working in the Emergency Department of the Royal Melbourne Hospital. The emergency doctors are like operations people for the human body. They are fully trained in multiple parts of the body. They spent years learning the basics of how the human body works and how to resolve issues. When the patient comes in though, it is an emergency, and it means that they need to quickly gather data: what happened, where the patient was, what they were doing and what the current status of their injuries is.

They also need to know, what has happened to them in the past that may impact treatment, or cause any complications when trying to either bring them back from death or stabilize the body.

These people to me are amazing, brave, calm, well trained and well supported by nurses and other doctors around them. If they require help, these ED doctors can escalate to specialists for additional troubleshooting.

It reminds me of a network team, where the body is a network, the nurses and doctors are engineers, the ambulance drivers are the on call engineers and the triage and information gathering is handled by the Service Desk. The specialists may be a vendor or an in-house expert on certain aspects like security, routing, switching and wireless.

I don’t want to compare a doctor to a network engineer, as they are two completely different professions; life or death is in the doctor’s hands. The qualities, the compassion and the desire to help and save a human life cannot be compared to Brad on the laptop running some pings and helping that YouTube video appear again. But the similarities of having to quickly troubleshoot the issue and remain calm can be very useful for network engineers. You may work on a network that means life or death to people. It could be a security system with 24 hour monitoring and response, or it could be the telecommunications for the ambulance and first responders. It could even be at the blood bank, where blood is so crucial to the survival of people in an emergency. No matter what the industry, the ability to gather data to find out what happened is crucial. It’s equally important to find out what the state was before the issue happened.

Let’s step back a bit now, and look at going to the doctor. They will take an xray, they will capture your vitals, general health & wellbeing. They will compare it over time and eventually use this data when something goes wrong inside your body or they will use it to prevent something going wrong. This is what is important in the network. The ability to see what has happened and what has changed, the ability to prevent issues in the future based on captured vitals of the network.

It’s no secret that lately I have been learning the depths of packet capture analysis. It is a very technical task and I am only a quarter or half way there, but in many scenarios in the last year or so a packet capture has been crucial to finding the root cause of a network outage or application issue. The downside is that I have always had to recreate the issue and take a packet capture while it was happening. But what if we could see what was on the network yesterday, or last week? This type of technology I didn’t even know existed until about two years ago.

Instead of plugging in a laptop, mirroring a port for data or doing a capture on a firewall or router interface, these devices sit and capture traffic non stop. They allow you to go back in time, to a time before the issue started, to see what the state of the network was and what its vital signs were.

The amount of data is huge, so filtering and customization are possible and recommended. You can remove traffic that is not needed, capture certain VLANs or even physically tap only certain wires and put yourself in the middle of the traffic flow.
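To make the "go back in time" idea a bit more concrete, here is a minimal Python sketch of a rolling capture. It is a toy model under my own assumptions, not how any vendor's appliance works: the `PacketRecord` and `RollingCapture` names, the VLAN number and the timestamps are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PacketRecord:
    timestamp: float   # when the packet was seen (seconds)
    vlan: int          # VLAN the packet arrived on
    summary: str       # short description of the packet


class RollingCapture:
    """Toy continuous capture that keeps only the newest max_packets records."""

    def __init__(self, max_packets, vlans_of_interest=None):
        # A deque with maxlen silently discards the oldest record when full
        self._buffer = deque(maxlen=max_packets)
        self._vlans = set(vlans_of_interest) if vlans_of_interest else None

    def record(self, packet):
        # Skip traffic we were told not to keep, to save space
        if self._vlans is not None and packet.vlan not in self._vlans:
            return
        self._buffer.append(packet)

    def between(self, start, end):
        """Return every stored packet seen in the window [start, end]."""
        return [p for p in self._buffer if start <= p.timestamp <= end]


# Hypothetical traffic: only VLAN 152 is kept, and only the newest 3 packets
capture = RollingCapture(max_packets=3, vlans_of_interest={152})
capture.record(PacketRecord(100.0, 152, "syn"))
capture.record(PacketRecord(101.0, 999, "ignored vlan"))
capture.record(PacketRecord(102.0, 152, "syn-ack"))
capture.record(PacketRecord(103.0, 152, "ack"))
capture.record(PacketRecord(104.0, 152, "data"))   # pushes out the oldest record

# "What did the network look like before the issue started?"
before_issue = capture.between(101.0, 103.0)
print([p.summary for p in before_issue])
```

The trade-off is the same one described above: the smaller the buffer or the tighter the filter, the less history you have to look back on.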

Can you imagine, the power and the time saved if you can see what was happening on your network yesterday, last week or even last month? This is powerful, this saves us time, to revive the network, or find the root cause and treat it.

The ability to check the health of a network is also not just from packet captures but other advanced monitoring such as synthetic testing. Synthetic testing allows you to initiate traffic from agents, test all day, every day and compare the before and after of an issue. This level of an agent is like having an engineer constantly testing the network. It is the future of what they call the self healing network: the network that can compare, mitigate and then modify itself. It all starts with data and the ability to look back in time.
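The "compare the before and after" step can be sketched in a few lines of Python. This is only an illustration: the sample response times, the `compare_to_baseline` name and the 1.5x tolerance are assumptions of mine, not values from any real synthetic testing product.

```python
from statistics import median


def compare_to_baseline(baseline_ms, current_ms, tolerance=1.5):
    """Flag a regression when the current median exceeds baseline * tolerance."""
    base = median(baseline_ms)
    now = median(current_ms)
    return {
        "baseline_ms": base,
        "current_ms": now,
        "regressed": now > base * tolerance,
    }


# Hypothetical response times (ms) collected by an agent before and after an issue
last_week = [20.0, 22.0, 21.0, 19.0, 23.0]
today = [48.0, 51.0, 47.0, 50.0, 49.0]

report = compare_to_baseline(last_week, today)
print(report)
```

The median is used here rather than the average so that one slow test does not trigger an alert on its own.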

Now, can we look forward? Can we anticipate? I think yes, and I have heard of software that is doing that now. It is predicting, analysing and alerting network engineers. This type of technology is beyond my understanding. I myself am trying to get to the level within my own intelligence though, trying to anticipate issues, using experience, comparing data before and after to detect issues quicker and find the root cause. For a computer system to have this machine learning or artificial intelligence, well it needs to be told everything that has ever happened, and everything that could happen.

This blog, for today is about everything that has ever happened. So, reach out to your vendors, ask if they have synthetic testing or packet capture technology.

You won’t be disappointed, and most of all you won’t be frustrated the next time you ask, what was this traffic doing last week? What has changed?

~Brad.

Gimme the cache!

As time goes on, my name may have to change. Ciscoworkerbee is becoming more Networkworkerbee, because anything that has the name network or uses an IP address these days seems to require a network engineer to troubleshoot.

Naturally, when anything is slow the first thing that comes to mind is latency. Everyone else calls it the ‘Network’. Latency is the measurement of delay, and delay is the time it takes for something to get from point A to point B. For instance, a spaceship travelling at the speed of light would take a little over 8 minutes to get from Earth to the Sun. Although the speed of light is extremely fast, it still takes time to reach a destination that is 149 million km away.

The Moon is a lot closer, at about 384,000 km. A spaceship travelling at 3,280 km/hr would take nearly 5 days to cover that distance. If you were travelling at the speed of light, it would take about 1.3 seconds.
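The travel times are just distance divided by speed, which is easy to check in Python. The distances and the spaceship speed are the figures used in this post; the speed of light is the standard defined value, and the function names are my own.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # defined value, km per second


def light_travel_seconds(distance_km):
    """Seconds for light to cover distance_km in a vacuum."""
    return distance_km / SPEED_OF_LIGHT_KM_S


def travel_days(distance_km, speed_km_h):
    """Days needed to cover distance_km at a constant speed_km_h."""
    return distance_km / speed_km_h / 24


sun_minutes = light_travel_seconds(149_000_000) / 60   # Earth to the Sun
moon_seconds = light_travel_seconds(384_000)           # Earth to the Moon
ship_days = travel_days(384_000, 3_280)                # spaceship to the Moon

print(f"Light to the Sun:  {sun_minutes:.1f} minutes")
print(f"Light to the Moon: {moon_seconds:.2f} seconds")
print(f"Ship to the Moon:  {ship_days:.1f} days")
```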

When we start to look at things closer to home, it takes time for the light to bounce off an object and enter our eyes and then be processed by our brain, it takes time to send your voice over a fibre optic cable underneath the ocean to a listener at the other end.

To measure this in the network world, we like to use the ping tool. The ping tool will send a single packet to a destination and then the destination will reply. It is measured in milliseconds, and once it replies we get what is known as the round trip time. Round trip time is important when dealing with a Global Network because resources located in different geographical regions will take longer to request and be delivered to the end user. It is therefore more appropriate to have resources local to a user, and recently I was exposed to the wonderful world of CDNs, a technology designed to bring resources from across the world to a local point of presence. CDN stands for Content Delivery Network and all the big providers like Microsoft, Google etc use them.
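As a rough illustration of why distance matters for round trip time, the sketch below estimates the best possible RTT over fibre. It assumes light in fibre travels at roughly two thirds of its speed in a vacuum (about 200,000 km/s) and ignores routers, queuing and the fact that cables never run in a straight line, so real numbers will be higher. The 12,000 km and 50 km distances are hypothetical examples.

```python
FIBRE_SPEED_KM_S = 200_000   # rough approximation: about 2/3 of the speed of light


def min_rtt_ms(distance_km):
    """Best-case round trip time in milliseconds over distance_km of fibre."""
    one_way_seconds = distance_km / FIBRE_SPEED_KM_S
    return 2 * one_way_seconds * 1000


far_away = min_rtt_ms(12_000)   # hypothetical resource on another continent
local_pop = min_rtt_ms(50)      # hypothetical CDN point of presence nearby

print(f"Distant resource: at least {far_away:.0f} ms")
print(f"Local POP:        at least {local_pop:.1f} ms")
```

No amount of bandwidth removes that floor, which is the argument for bringing the content closer to the user.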

Now, this wouldn’t be a networking blog without explaining the symptom, and extreme weirdness of the incident I worked on.

It was an ordinary day in the networking world, although a site in a small geographical location started to experience some application issues. The last time this worked for this region was 3:00pm the previous day. The application was external and the authentication portal was external as well, protected by a WAF (Web Application Firewall), and these also use CDNs.

The configuration of applications and CDNs is beyond my knowledge, but what I do know is CDNs like to cache information as much as possible. This is usually done for static content and speeds up page loading times. Some CDNs offer advanced caching, so they attempt to cache content that usually isn’t cached.

So, the previous day it was all working fine, and now it is not. The usual questions are asked, has anything changed, can we test on a different machine, test a different user, use incognito in the browser etc. We had no luck in any testing, all we knew was in their region it did not work.

Some colleagues joined, giving a different perspective on the issue. They found the day before, the WAF provider had a regional outage and had to re-route traffic. It was resolved later that day and traffic was restored to the original location.

Was this the cause? How can we test? In the background a colleague of mine decided to force the user to another location and it began to work. So, it proved only the region was affected, not the user or machine. Traffic flowing to a local POP (Point of Presence), where the CDN nodes are, was the issue, but how do we bypass the CDN, or how do we not use it?

I decided to login to the WAF portal and I noticed a tab that said cache. I clicked on it and noticed that it was caching for this portal and also it had advanced caching enabled. The next step I thought was, can we disable the cache? In effect can we go incognito on the CDN? It was allowed, so it was done.

The user tested.

Application responded, authentication completed and issue resolved.

We decided, let’s clear the cache before we re-enable.

So, in the end something in the CDN cache was causing the issue, and it was related to a CDN outage and traffic re-route. Unfortunately, now that we cleared the cache there is no way to know what was in there that was affecting this region. It sounded like something stale was at fault in the cache, and this happens on browsers all the time. You clear cookies or browser history when you experience issues with websites, this was no different.
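We never found out what was actually in the cache, so the following Python sketch is purely hypothetical. It is a toy model of per-region caching (the `RegionalCache` class, the POP names and the cached values are all invented) that shows how one POP can keep serving something stale after an outage while another region works, and why bypassing and then purging the cache clears it.

```python
class RegionalCache:
    """Toy model of per-POP caching; not how any particular CDN is implemented."""

    def __init__(self, origin):
        self._origin = origin      # callable: url -> fresh content
        self._store = {}           # (pop, url) -> cached content
        self.enabled = True

    def get(self, pop, url):
        key = (pop, url)
        if self.enabled and key in self._store:
            return self._store[key]          # cache hit, possibly stale
        content = self._origin(url)          # miss or bypass: ask the origin
        if self.enabled:
            self._store[key] = content
        return content

    def purge(self, pop=None):
        """Clear everything, or only the entries held at one POP."""
        if pop is None:
            self._store.clear()
        else:
            self._store = {k: v for k, v in self._store.items() if k[0] != pop}


# Hypothetical scenario: something odd gets cached at one POP during an outage
origin_state = {"/login": "outage-time response"}
cache = RegionalCache(lambda url: origin_state[url])
cache.get("pop-a", "/login")                 # POP A caches the outage-time response

origin_state["/login"] = "login page"        # service is restored
stale = cache.get("pop-a", "/login")         # POP A still serves the old entry
other = cache.get("pop-b", "/login")         # another region fetches fresh content

cache.enabled = False                        # "go incognito on the CDN"
bypassed = cache.get("pop-a", "/login")

cache.purge("pop-a")                         # clear the cache before re-enabling
cache.enabled = True
fresh = cache.get("pop-a", "/login")

print(stale, "|", other, "|", bypassed, "|", fresh)
```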

Once again, issue resolved with the minds of many. It is so important to not try and fix something yourself; getting many eyes and experience into the issue can help find the root cause and resolve it.

If you have worked on CDNs and have ever seen this before, I would like to hear about it.

~Brad.

Note: Extra points if you can name the movie the title of my blog is from 🙂

Route Aggregation

When dealing with OMP, the rules are very clear in Cisco documentation. It says and I quote –

OMP automatically redistributes the following types of routes:

  • Connected
  • Static
  • OSPF Intra-Area
  • OSPF Inter-Area

To avoid loops and sub-optimal routing, other types require explicit configuration.

Now, what does this have to do with aggregation? For a router to aggregate a route, it must have a candidate route. The candidate can come from many sources, and in the case of OMP if it is not one of the above it won’t be aggregated. Aggregation can do two things: it can advertise only a summary route, or it can advertise the summary & specific routes. This can be used for manipulation and traffic engineering.

Recently, during an upgrade I was able to see what happens when code doesn’t play by the rules.

A long time ago in a network far far away, a configuration was deployed on SDWAN code 16.12.x and placed into production.

During the build of this site the following subnet was allocated to the site –

10.240.0.0/16.

It was divided and used the best way it could be, and an SDWAN router became a neighbor of this site. They shared a great relationship and despised the Empire (Huawei? lol). The neighborship was OSPF and the site advertised inter/intra and external routes to the SDWAN router.

The SDWAN router was configured as so –

address-family ipv4 vrf 1
   advertise connected
   advertise static
   advertise aggregate 10.240.0.0/17
   advertise aggregate 10.240.128.0/17

The SDWAN router with its power of forcing networks into the OMP routing table took the following routes, aggregated and then redistributed them all –

O 10.240.20.0/24
O 10.240.21.0/24
O E1 10.240.122.0/24
O E1 10.240.0.0/17
O E1 10.240.128.0/17

All across the galaxy (that is the SDWAN fabric), the routes were known. The aggregate command advertised both the specific prefixes and the aggregates. There the routes stayed, until called for.

Time passed and the SDWAN software became old and past its use-by date. It was time for the code to be upgraded.

Little did the SDWAN router know a great disturbance in the fabric was being felt, the code was upgraded and the device rebooted. Once converged the great router once again forced its routes into the great fabric.

All was good…or so was thought.

The number 1 appeared on an instant message app. It was a customer who had felt the great disturbance in the fabric. The route 10.240.128.0/17 was gone….

Darkness appeared in the OMP table, the 10.240.128.0/17 route was now missing and the source machines had nowhere to go. How could this have happened? How could a route go missing when configuration before and after was the same?

The only explanation: what if the code wasn’t right in the first place? It took many days and many network engineers to discover what happened, and here is how it happened.

If you look at the configuration, and follow what Cisco documentation says, it is allowing OSPF inter/intra routes by default but it is not allowing external routes.

So, it should have only advertised the following routes and aggregate –

O 10.240.20.0/24
O 10.240.21.0/24

The inter/intra routes satisfied the candidate requirement and advertised the aggregate route 10.240.0.0/17 to OMP. Its origin was ‘aggregate’, but the code also made a mistake. It decided to satisfy the candidate requirement for the other aggregate, even though no inter or intra routes existed inside 10.240.128.0/17.

When the router code was upgraded and the configuration was loaded, the mistake was removed. OSPF external routes cannot be redistributed and there was no candidate route in OMP for 10.240.128.0/17 so the route was removed.

The outage started at that very moment.

Under pressure and not knowing any of this during the outage, a static route was added to the configuration to satisfy the candidate requirement. In hindsight, the following command could and should have been used –

‘advertise ospf external’
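The candidate rule can be modelled in a few lines of Python with the standard `ipaddress` module. This is only my simplified reading of the behaviour described in this post, not Cisco's implementation: the route type labels and the `has_candidate` function are my own names, and the prefixes are the ones from the example above.

```python
from ipaddress import ip_network

# Route types OMP redistributes by default, per the documentation quoted earlier
DEFAULT_TYPES = {"connected", "static", "ospf-intra", "ospf-inter"}

# Routes learned from the site; the type labels are this sketch's own naming
routes = [
    ("ospf-intra", ip_network("10.240.20.0/24")),
    ("ospf-intra", ip_network("10.240.21.0/24")),
    ("ospf-external", ip_network("10.240.122.0/24")),
    ("ospf-external", ip_network("10.240.0.0/17")),
    ("ospf-external", ip_network("10.240.128.0/17")),
]

aggregates = [ip_network("10.240.0.0/17"), ip_network("10.240.128.0/17")]


def has_candidate(aggregate, routes, allowed_types):
    """True if at least one route of an allowed type falls inside the aggregate."""
    return any(
        rtype in allowed_types and prefix.subnet_of(aggregate)
        for rtype, prefix in routes
    )


# Expected behaviour with the default types only
default_result = {str(a): has_candidate(a, routes, DEFAULT_TYPES) for a in aggregates}

# Behaviour once OSPF external routes are also allowed
external_types = DEFAULT_TYPES | {"ospf-external"}
external_result = {str(a): has_candidate(a, routes, external_types) for a in aggregates}

print("default types: ", default_result)
print("with external: ", external_result)
```

With the default types only the first aggregate has a candidate, which is what the upgraded code enforced. Allowing external routes gives the second aggregate a candidate as well.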

Many other theories were created and tested, but only when testing the same exact config with two aggregates on the same exact code did the issue appear, and we were able to validate.

So, although the documentation says one thing, maybe not every single design or situation has been tested. Make sure you understand what the configuration is doing and ensure you clean up configurations so as not to create confusion for others.

May the fabric be with you,

~Brad.

The great re-route.

I enjoy writing blogs that explain an issue, detail the troubleshooting, and find a resolution. It doesn’t just showcase the technical side but also the mindset. You don’t have to know every command or every port of every protocol but knowing how to find that information and investigate an issue is critical. Even the great ‘hunch’ or gut feeling can be a great tool, it’s our brain trying to let another part of our body give some input, especially if the brain is overloaded with racing thoughts. Why isn’t this working?

The gut feeling or hunch also comes from many years of experience, so you know it is a valid input that deserves to be checked.

When something doesn’t work as expected, the outputs and symptoms make no sense, my gut says it’s a bug.

I will set the scene, as best I can without a diagram for an issue I worked on recently.

Server A is in Data Centre A. It traverses a firewall and then an SDWAN Cloud (2 x ASRs) to a Server B located in Data Centre B (2 x ASRs). Data Centre B also contains a firewall. The traffic is TCP port 445 (SMB), the mapping of a drive via a script. It had been working until one weekend, when a firewall at Data Centre B started to drop traffic and was rebooted.

Shortly after the reboot an incident was logged, as the users were trying to map this drive in a change window at the same time the firewall was playing up. I assumed the firewall issue was the cause, updated the ticket and asked the user to test again when possible.

The issue was still there.

The next step was to first check logs of firewalls to see if this traffic is traversing the correct path. I searched for source, destination & port across all firewalls and I did see traffic at Data Centre A, B but also Data Centre C? Data Centre C is another DC that shouldn’t even be in this path? Only one or two packets were seen at Data Centre C, the rest were at DC A & B.

I decided to install psping on the source server, this tool is excellent. Now that Telnet is all but disabled these days, psping allows you to send a TCP ping on a certain port testing the entire OSI stack.

I ran the psping and observed. I made note of the source ports generated, so I was sure what packets were missing.
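For anyone without psping handy, the same idea can be sketched in Python with the standard `socket` module: attempt a TCP handshake, time it, and note the source port so the probe can be matched against a packet capture later. This is not psping itself, just an approximation of the test I ran; `tcp_ping` and `lost_source_ports` are names I made up.

```python
import socket
import time


def tcp_ping(host, port, timeout=2.0):
    """Attempt one TCP handshake; return (source_port, rtt_ms or None)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    sock.bind(("", 0))                      # let the OS pick an ephemeral source port
    source_port = sock.getsockname()[1]     # note it before sending anything
    start = time.perf_counter()
    try:
        sock.connect((host, port))
        rtt_ms = (time.perf_counter() - start) * 1000
    except OSError:                         # timeout, refused or unreachable
        rtt_ms = None
    finally:
        sock.close()
    return source_port, rtt_ms


def lost_source_ports(results):
    """Source ports of the probes that never completed a handshake."""
    return [port for port, rtt in results if rtt is None]
```

Calling `tcp_ping` in a loop against the destination server on port 445 and passing the results to `lost_source_ports` gives a list of source ports to search for in a capture taken elsewhere in the network.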

Every eighth or ninth ping would time out, repeatedly. I took a packet capture at DC C and sure enough the missing packets, with the source ports I had noted, were there.

How could this be I wondered? How could the network path send one or two packets to an entirely different DC? At first, I looked for some type of policy-based routing issue but found nothing. I also checked the SDWAN at Data Centre A and saw the routing table was correct. So how could two packets be routed the wrong way? I did a packet capture on the SDWAN routers at Data Centre A and found an interesting clue. The packets that were sent the wrong way always went through router #2. So, I decided to concentrate on this device.

Before I go further into the details, I must explain the topology a bit further. Data Centre A is a spoke site of the SDWAN, it has any to any connectivity across the entire fabric. It can connect to any hub or spoke.

Data Centre B is a Hub site. It has any to any connectivity to its various spokes but must use a backbone VPN to reach sites outside of its own hub.

Data Centre C is also a Hub site and operates the same as DC B.

So far, we are seeing one or two packets arriving at DC C, and we cannot work out why they arrive at this site. When they arrive at this site, they enter the network and are lost. Remember DC C is a hub, so to get to DC B (the destination) from DC C it must send it via the backbone, which it tries to and ages out.

So, as the user tries to set up a TCP connection and perform the drive mapping, packets go missing. TCP notices the gap and the missing packets are sent again, but once again certain packets are lost. The process continues over and over until the connection fails.

So, I return to the routing table of Data Centre A, as this is the only decision point that could possibly send traffic somewhere else. The only possible way I thought this could occur, is if the route for the destination was missing because there is a default route from DC A to DC C effectively sending any unknown traffic to hub site DC C.

I check the routing table repeatedly as my psping is running. It never changed. The route was always present, I checked the CEF table, it doesn’t change so how does the traffic end up at DC C? I decided to check the SDWAN tunnel command which shows you what tunnel is being selected –

show sdwan policy tunnel-path vpn XX interface Te0/0/0.152 source-ip 10.x.x.x dest-ip 10.x.x.x protocol 6

Notice, protocol 6 which is TCP and when you enter this command it shows you the tunnel selected. Every time I ran this command it showed me the correct next hop of DC B.

It was at this point I decided I needed more commands, more detail of the operation that occurs inside the router itself. I opened a TAC case for ASR #2 at DC A.

I explained the issue to TAC, sent a diagram and my findings and awaited a response. We did a WebEx and worked on the issue. They first acknowledged that what I was seeing was correct. Two packets were being routed the wrong way, and they confirmed this by doing a special packet capture on the ASR router. It is known as a Datapath Packet Trace and it shows you exactly how the router decides what to do with a packet.

https://www.cisco.com/c/en/us/support/docs/content-networking/adaptive-session-redundancy-asr/117858-technote-asr-00.html

When SDWAN places traffic onto the fabric, it makes a decision regarding the tunnels that are built between sites. This SDWAN router at DC A has two internet transports and two colours. It actually uses both links active/active at both DC A and DC B, so it has eight possible paths to choose from. The command I used above only showed me the current path selected, not all eight. Every time I ran it, it never changed, but if you add the ‘all’ keyword at the end it will show you all available paths –

show sdwan policy tunnel-path vpn XX interface Te0/0/0.152 source-ip 10.x.x.x dest-ip 10.x.x.x protocol 6 all

So, I ran this command and the information I had been looking for appeared. Out of the eight paths available, six were pointing to DC B, and two were pointing to a spoke site. Those two had been programmed incorrectly. This spoke site didn’t even advertise or have a similar IP range to the destination, so it looked like it had been randomly selected and added to this table.

So how did the traffic end up at DC C if it was sent to a Spoke site? 

The spoke site that had been incorrectly programmed just had a default route to DC C. As the router uses all paths due to ECMP, it cycles through the eight paths from top to bottom. Two packets are sent to a random spoke site, where they follow the default route to DC C and get lost. The remaining packets are sent to the correct destination. This next hop is known in the SDWAN world as a TLOC; it is a System ID of an SDWAN router and is used as the next hop for traffic.
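The behaviour above can be sketched in a few lines. This is only an illustration that assumes a simple per-packet round robin over the eight paths, as I described it; the actual selection logic inside the router may differ, and the position of the two bad entries in the list is made up.

```python
from collections import Counter
from itertools import cycle

# Hypothetical tunnel table: eight paths, two wrongly pointing at a spoke site.
PATHS = ["DC B", "DC B", "DC B", "spoke", "DC B", "DC B", "spoke", "DC B"]


def forward(packet_count, paths):
    """Assign packets to paths from top to bottom, wrapping around."""
    chooser = cycle(paths)
    return [next(chooser) for _ in range(packet_count)]


def final_site(next_hop):
    """The mis-programmed spoke only has a default route to DC C."""
    return "DC C" if next_hop == "spoke" else next_hop


# Send one packet down each of the eight paths and see where they land.
delivered = Counter(final_site(hop) for hop in forward(8, PATHS))
print(delivered)
```

In this sketch six of every eight packets reach DC B and two end up at DC C, which is enough to break a TCP session even though every routing table check looks clean.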

So how did this happen? And how do you fix it?

The how was a software bug, which is documented here –

https://quickview.cloudapps.cisco.com/quickview/bug/CSCvw61731

While you wait to apply a permanent fix you may need to apply a workaround, and for me it was to reset all SDWAN tunnels. This would cause a small outage and the tunnels to be reprogrammed. Luckily for me, over another weekend this prefix and its associated tunnels were reset by the unreliable internet, so it was resolved.

The permanent fix was, you guessed it, an upgrade of code.

This was a tough issue to locate, but I trusted my gut feeling even though my brain was saying there is no way it would route two packets out of 10 the wrong way every time.

Although this is SDWAN, the tunnel programming is equivalent to a CEF table, where prefixes are programmed into the table so that a packet can be routed once and then hardware can do the forwarding. I am sure there may even be a CEF bug out there as well in some Cisco device.

As networks become more advanced, the possibility of software bugs grows as well. So, anytime you are troubleshooting, always go to the bug toolkit of your vendor, enter your code version, and type some keywords of your issue to see if anything matches up. I am not sure if I would have located this myself, because I needed TAC to show me the hardware level packet tracing command and confirm my theory, but from now on I will be adding the Datapath Packet Trace to my list of tools.

Happy New Year.

~Brad.

Please Note: Opinions expressed are my own.

About 25 years to go….

Having recently turned forty, not only do my primary school years feel like they never happened, now my early twenties are going the same way. When you start to talk about your early twenties like you can’t believe they happened, you know you are getting old.

I started learning about networks in my mid twenties, and now I am working with appliances and software that can do my job. For example, in the last year I have been really into Wireshark and packet captures. I have been watching a lot of training videos and doing a lot of troubleshooting. Recently I have been exposed to appliances that can actually do all this for you. They take raw packet captures and present TCP analytics in an easy form for the user. They can decipher a Wireshark capture within a few seconds and extract the data needed, like MSS and delta time between packets, and identify both packet loss and application issues.

Granted, it still takes someone to interpret the data, but it does save a lot of time.

So, I will continue to study Wireshark and packet captures, but as the title suggests I have (if I am lucky) about 25 years of work left in my life. That is a long time, because I have actually been working since I was 16, so I have already been working for 24 years! Of those 24 years, I worked 10 in a fruit shop and the rest in IT as a Network Engineer.

When I entered the industry, I was deploying Cisco IP Phones on desks and the ‘Cloud’ didn’t even exist. Gigabit interfaces were only just being released and I even worked on some old CatOS operating systems. Now, a lot of the network functions inside the ‘Cloud’ have been handed over to software or are out of my control. It is almost as if I am becoming a caretaker of the network, making sure all the systems are working together, but not actually configuring anything myself.

The biggest gap I still see is network knowledge and the ability to see the entire landscape. Over and over again we rely on Network Teams to connect the dots and ensure the application gets from point A to B with all its ‘little’ supporting services like, say, DNS! I would be very surprised if you didn’t have a small group of people, maybe even one person, who is the go-to person for all things network in your organization. I can guarantee that person has a networking background, and no matter how much cost cutting you do, you will never get the support, service and dedication of that individual or team when you outsource. But, in saying that, you can get that service from individuals within the outsourced company, no doubt about that.

We are the fabric, the caretakers, and we actually care about the network and its health. We become owners, we become stakeholders and it becomes our baby. We want it to be healthy, we want it to succeed and we sure as hell don’t want people messing with it!

So, twenty five years… I can’t even imagine what will be here in twenty five years. Networks will truly be automated, they will truly be self healing and there will still be a small group of people relied on to understand them. The key is that even though networks will do all this automatically, the low level protocols are still from the 1980s. TCP & UDP won’t be going anywhere, so studying Wireshark is still a great option.

I actually had a small breakthrough the other day. I have watched videos over and over but I still wasn’t getting it until it finally clicked. Wireshark itself is interpreting the data before we do, so you must understand what it is trying to tell you. In my case, I was interpreting something very simple, very wrong.

When analyzing a Wireshark capture I had a black and red packet highlighted to me, which means it is a TCP error. It said, “TCP Previous Segment not captured”. I thought this meant the source in this message was reporting that it hadn’t seen a segment, but it is actually Wireshark saying it sees a gap in the sequence numbers. So packets are missing from the capture. Now this could be because Wireshark didn’t capture them due to being overwhelmed, or the SPAN port was overwhelmed, or the packets really were lost on the network. To see if it was real packet loss, look further down the capture: you should see a DUP ACK and then a retransmission. That retransmission is the sender resending the missing packets or sequence numbers. So it pays to know who is reporting this, and I feel stupid for not even understanding that it is Wireshark reporting, not the sender or receiver.
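As a rough illustration of what “a gap in the sequence numbers” means, here is a small sketch that walks one direction of a TCP flow and flags any segment that starts beyond the next expected sequence number. It is not how Wireshark is implemented; the function name and the sample data are made up, and it ignores sequence number wraparound.

```python
def find_gaps(segments):
    """Find sequence gaps in one direction of a TCP flow.

    segments is a list of (seq, payload_len) tuples in capture order.
    Returns a list of (expected_seq, actual_seq) for every gap seen.
    """
    gaps = []
    expected = None
    for seq, length in segments:
        # A segment starting past the next expected byte means data
        # before it was never seen by the capture.
        if expected is not None and seq > expected:
            gaps.append((expected, seq))
        next_seq = seq + length
        if expected is None or next_seq > expected:
            expected = next_seq
    return gaps


# Hypothetical capture: bytes 201-300 are missing when seq 301 arrives,
# and the segment with seq 201 shows up later as a retransmission.
capture = [(1, 100), (101, 100), (301, 100), (401, 100), (201, 100)]
gaps = find_gaps(capture)
print(gaps)
```

The gap is detected by whoever is reading the capture, not by either endpoint. Whether it was real loss is answered by what follows: the late segment with sequence number 201 is the retransmission you would look for after the DUP ACKs.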

So, at least I got that sorted before I spend the next 25 years or so looking at packet capture files.

~Brad.

Turn the radio off.

I am in no way trained in mental health and the complexities of the mind, but I am an expert in my own brain and my own thoughts. I mean, I have been listening to my brain for almost forty years.

I recently had a discussion with my wife about public speaking. We spoke about a certain moment that can happen, and it revealed to me a key moment you do not want to happen when speaking. This moment is when you become aware you are public speaking. This is quite possibly the worst thing that can happen, as you break your flow and your mind starts to wander and lose track of where it is.

I reflected on this, firstly on my past. I used to play in a band as a guitarist, and it was one of the greatest times of my life. I am terrified of public speaking and being the centre of attention. I don’t even like watching myself on video, especially videos taken many years ago. But when in the band, I was part of a group. I was not alone like we are when we speak in public.

When I was playing a gig once, the crowd was really getting into this song so I decided to step forward and lean out from the stage over the crowd. I suddenly had that thought of ‘I am playing guitar’ instead of being in the flow and groove of the music. My heart rate immediately increased and I felt my hand start to tremble. I was aware that I was doing something that so easily scares me when I think about it before doing it. I quickly retreated (flight response) and took my position once again as a member of the band and rejoined the groove.

That thought, if I had it before the gig, triggers anxiety. The radio turns on in my mind and starts sending me doubts, what if’s and other concoctions that probably, actually most probably, will not happen. The radio gets louder as you feed into it, your body starts to change and the fight or flight response takes over.

I now reflect on this earlier in my career. It basically destroyed me. I was so aware I was public speaking when I went to meetings that I could hardly go. If I did go and I started to speak, the ‘awareness thought’, as I call it now, entered my mind and broke my flow. It was a terrible time and I left my career in networking for a year because of it.

I now reflect on where I have been recently, and I have been so invested in the topics I am presenting and talking about in meetings that the radio has almost been turned off completely. It took a long time to get to this stage, but a great way to start is:

  1. Don’t avoid anything – it breeds anxiety;
  2. Know your topic, practice and know it backwards;
  3. Don’t push yourself too far every time, take small steps to get to the front of that stage;
  4. If it is too much, seek professional help. Sometimes you can be so deep in anxiety that you cannot escape and you need someone to lead you out.

I still do have the pre-thoughts, the what if’s before the meetings or presentations but I can quickly turn the radio down, because I can identify what they are. Listen to your thoughts, challenge your thoughts and you can change your reality as well.

In closing, and as this is a technical blog, I wanted to mention Wireshark. I have been watching some great courses on Pluralsight recently by Christopher Greer. He also has some content on YouTube. I have been deep diving and really wanted to recommend his courses. Analysis of TCP/IP is critical to a network engineer’s job, and although there are many systems out there that can give great analytics and information, you need to understand the fundamentals of this protocol to understand how data is moved from one part of the network to another.

Have a great day and Happy Father’s Day to all the Dads out there in Australia this weekend 🙂

~Brad.

This is not the longest ticket in the world, this is just a tribute.

I can’t give too many details, as this world is full of people that are just looking for a backdoor or a hint of information to try and compromise your digital security.

So, behold. The longest ticket in the world, tribute.

It should have started at Layer 1. It should have moved to Layer 2, but I got cocky. I have been training on Wireshark and I went straight to Layer 3. I was decoding packets, checking MSS, analysing sequence numbers and looking for packet loss. Why is this upload so slow, but download so quick? No one else has this issue, the WAN is good?

I used both Windows and Linux machines, running iPerf. I had so many windows opened I almost needed a third screen. I felt like a hacker in an American movie, like ‘Firewall’ when Harrison Ford’s character wrote an ACL blocking an internet attacker with a private 10.x address…..? Hey, maybe they used source NAT, I don’t know.

No matter what happened, transferring of traffic between a Data Centre and a Cloud environment was painfully slow. SMB is not the best protocol to judge a network by, so I decided iPerf was the best. Sure, questions were asked about the internal environment. Do you have QoS? Do you have packet loss within the network? There was a unified no, but a lack of evidence. I continued, I wasn’t going to let this one go.

It’s always the network, until proven otherwise.

First, it was software upgrades in the cloud. Followed by support cases with two vendors. It came back clean.

iPerf running overtime, parallel streams and UDP packets with sequence numbers showing me out of order packets. I should have started at Layer 1.

TCP retransmissions, throughput low and DUP ACKs. It was packet loss, I knew it, but where?

I continued, moving iPerf servers to locations around the world, lowering my latency and pushing my window size. I had 50Mbps upload, 500Mbps download.

Packet captures opened, source mac addresses changing, TCP out of order and I should have started at Layer 1.

The customer, they were on the other side of the world. When I woke up I would troubleshoot while they were asleep. I connected, probed and initiated packets between environments. It’s just so slow….why.

I moved to my local area network, and recreated the issue. I had 800Mbits Up/Down……I should have started at Layer 1.

I returned to the scene of this crime, the slowness of the network. It doesn’t matter if the data moves at light speed across the planet, what matters is that spinning wheel, we want it now, we need it now.

Months had passed, slowly I captured the data and put together my case. Network detective at work, I wasn’t going to give up. I analysed and built a lab to check my MTU theories, I adjusted MSS and confirmed traceroutes in and out. I saw no out of order packets, this was the fix. This was the solution.

I ordered a change, I demanded a change. The change was made.

No effect.

Back to the drawing board they say, back to the beginning I go. I should have started at Layer 1.

Months passed again, and it was time to start moving servers within the network. Start to rule out the local LAN, start to see progress.

Small adjustments made with no effect, but alas, I see a new source mac address in my packet capture. Why, when this is HSRP and only one host should be forwarding…it’s active/passive. But, this is Nexus. HSRP is active/active for the Data Plane, and I think the other DC is forwarding traffic.

I ask for configs, I ask for show commands. This is not even part of my network now, but I want to know why, why is this so slow. I need to know, if I can’t fix this I can’t fix anything.

I wait patiently, I request we connect my iPerf machine direct to router. I want to know once and for all why is this slow.

We need new firewall rules for iPerf, we need a machine built and someone to connect it.

We need time.

Days pass.

Then, the weekend comes and they move the routers to a new access switch without my knowledge, it’s ok it’s their network. They move to a new access switch and core network and they test.

They decide let’s see what a transfer does now we are on a new access switch……

It’s……It’s….fixed.

It wasn’t Layer 3, it wasn’t Layer 2…..it was Layer 1. It was the switch itself, the router was connected to a switch that had no right to be connected to this router. It couldn’t handle the bursts, it couldn’t handle the iPerf, or the larger transfers. It wasn’t fit for purpose.

It was resolved.

I connect with my client, I access the router, I run my iPerf.

It is resolved, it is over.

The longest ticket in the world is complete.

I should have checked Layer 1.

~Brad.

https://learningnetwork.cisco.com/s/article/osi-model-reference-chart

Rise of the Introverts

Many people out in the world have the potential to be professional guitarists, professional actors, professional dancers, the list goes on. Unfortunately for some people the natural talent still is subject to the crushing power and internal dialogue of the mind. The mind is our greatest asset and our greatest enemy. It sometimes acts like a magician, creating illusions and scenarios that may never happen in the real world. Sometimes it’s like a personal trainer, pushing you harder than you can ever go.

If you were in a band, would you be the front man, or the rhythm guitarist in the back? Some people just don’t want to use their talents and prefer to keep them as hobbies and just work in a job they fall into. Some people avoid uncomfortable situations and find an easy way out, they take jobs they are overqualified for and live in the world of comfort and happiness. Some people want success, they want to use their talents and fulfill their potential but they are held back by their own thoughts.

If you continue to avoid the scenarios that bring you distress and anxiety, it only gets worse as you worry every day about something that hasn’t even happened.

Some people in the corporate world are impacted by this as well. Afraid of public speaking, meetings or social interaction. Some people are even afraid to talk on the phone.

The work from home scenario has changed the dynamic for me and maybe many other introverts. Some of the triggers have not been avoided but they have been shifted. If you don’t like speaking in public, you still have to, but now it is in the comfort of your own home. Will this lead introverts’ hidden talents, such as leadership skills, conflict management and negotiation skills, to start to flourish now that they are in the comfortable cocoon of a home office? No longer are they controlled by the physical fight or flight response when standing in front of management and colleagues, exposing themselves to gazing eyes and their own internal thoughts of “Am I good enough?”, “Am I speaking too fast?” or “Do I look like an idiot?”.

Will you or your fellow introverted co-workers start to reveal your full potential during the new work from home paradigm? Have you already noticed some usually quiet people starting to speak up?

With this newfound confidence, and the practice of Zoom calls and Google Meets, will the introverts now rise when we return to the office? Will they become what they always were so afraid of becoming? I think yes, because practice makes perfect. When you have a presentation or a meeting, you get prepared, you practice, so what better way to practice than from home and live on the internet!

Technology has opened up new ways to communicate. I spend most of my days either on a video chat or in a chat room talking, troubleshooting and fixing issues all across the world. I speak with senior leadership, vendors, colleagues and customers, all from a simple chat room or video call. Five years ago, I avoided going into a meeting room with so many people. I would look for ways out, or just pray they didn’t ask me to speak out loud.

The people I speak with daily now have no idea how tall I am, they have no idea how wide I am (LOL). They don’t judge me by what I look like, what I wear or how I stand, they judge me by what I say and do. They judge me by my content and that is all an introvert wants. They don’t want to be put up in front of everyone like a statue to admire, they just want to contribute and be the quiet achiever.

Granted, some people like the sink or swim approach, but with mental health at the forefront we need to find the triggers and ask each of our employees, if they are willing to share, what they can and cannot do. We then guide them, teach them and help nurture their talents in a safe environment. Not everyone has the stomach for this manufactured and downright unnatural environment we call the corporate world.

In this work from home world, the introverts will rise. Now they can achieve their potential, in a safe environment equipped with the confidence of experience when we finally return face to face.

Last but not least, this is usually a technical blog so if you are working on SDWAN and using Internet transports, make sure your ISP is giving you the best possible path internationally to avoid latency and packet loss. Australia to Japan via the US ain’t the best path :-).

~Brad.

This disclaimer informs readers that the views, thoughts, and opinions expressed in the text belong solely to the author, and not necessarily to the author’s employer, organization, committee or other group or individual.