2019

This will be my last post as a Cisco Champion for 2019. I enjoyed being part of the program, although due to the time difference with the US I missed out on a lot of stuff. Most of the presentations were in the middle of the night and as a father of two young kids, I couldn’t make it work.

Hopefully in a couple more years I can re-join, or maybe one day I could fulfil my dream of working at Cisco itself! My original goal when I started in the Networking field was to work as a Network Engineer in New York. I have been to the United States a few times on holidays and always wanted to return for work someday.

At this stage, I would be happy to even attend a conference in the US. My family commitments have changed now that I have a wife and two children, and moving them all to the US isn't in the best interests of the family unit.

This brings me to next year’s goals.

Currently I am pondering the following two fields –

I have been a 'Network Engineer' for many years, and I really want to achieve the role of Senior Network Engineer. I think this would be a great stepping stone towards my long-term goal of becoming a Network Architect. Even working within a group of architects would be enough for me now. Unfortunately, there isn't one where I work. We have Enterprise Architects and Cloud Solutions Architects who work mostly at Layers 7 and 8 (People).

In every network role I have had, I have managed firewalls, mostly firewall rules and VPNs. I have always enjoyed security and what it stands for. It is very black and white: either it's secure or it's not. Either you trust it, or you don't. I like these boundaries; they help create processes and procedures, which create accountability and define what is right and wrong.

Security is exploding. It's the next field we don't have enough people in, and I think it would be very welcoming to a person with a networking background.

I will continue to blog as well, hopefully with more teaching topics. I really do enjoy learning something new and then telling others about it. I like to share my information and want others to succeed.

I haven’t switched off just yet for 2019, I am on call a little over the break. But when I do get a chance, I will ponder next year’s goals and resolutions.

Interestingly, in Feb 2020 the certifications I just renewed will change, as Cisco have revamped their entire certification stream. The CCDA & CCDP are being absorbed, so I won't have as many. I will gain the new CCNA & CCNP certifications though.

I don’t think it is quantity though, it’s all about quality!

Merry Xmas & Happy New Year!

~Brad.


CCNP/DP Recertified – now what?

Ok, another recertification out of the way. It will be another 3 years before I need to do it again, but things are changing.

The new CCIE is very different from the one I wanted to get when I first started in Networking. I was pretty keen on attempting the CCIE written again next year, as I have failed it twice so far. Now I have no idea what to study or where to begin.

The CCIE Routing & Switching track seems to have disappeared, so I cannot re-sit that written exam and hope for third time lucky.

The streams are now –

  • CCIE Enterprise Infrastructure
  • CCIE Enterprise Wireless
  • CCIE Data Center
  • CCIE Security
  • CCIE Service Provider
  • CCIE Collaboration

I have already ruled out Collaboration; I don't like Voice and Video deployments. I like working on the networking side, the QoS, and the Multicast if needed, but I don't like administering the backend and working on phone systems.

I love Service Provider, but I don't work for one anymore and it's a limited space in Brisbane. It is my favourite of all the CCIE tracks, but I don't have enough experience in the core of a Service Provider network, and SPs are heavily automated now as well. Usually when you go to work for an SP, you get pimped out as a consultant selling stuff, and sales is not me either. Data Centre is awesome as well, but most companies are moving to the cloud…which is someone else's Data Center! I don't see many cloud companies in Brisbane either!

I have to be really careful in what I select; I want to do something I have the most experience in, although my experience has been very widespread lately. One minute I am deploying Multicast, the next I have to start learning the Viptela SD-WAN CLI to help cut over a site, or lab a Palo Alto IPSec tunnel.

The two topics that have been constant in every workplace are Wireless & Security. I have deployed Wireless in Mesh, Standalone and Controller-based setups. I have never had formal training on firewalls, but I have touched ASA, FWSM, Palo Alto, Juniper and Check Point.

Security and Wireless are not going anywhere. That I am certain of. As we advance our digital footprint, there is someone following those footprints…waiting for the right moment to exploit us or take our information.

We are all mobile, so wireless is here for good. It has been around since the early days of radio and television, and when we talk to our distant spacecraft that land on Mars or to our automated vehicles, we can't have a cable attached.

The same goes for Routing & Switching. We must route IPv6 and existing IPv4, and send traffic along the quickest path within the carrier networks. Switching is changing and spanning tree is disappearing, but things are still much the same at the enterprise edge for now.

The biggest change for me is automation and programmability. I don't know about you, but I find coding incredibly boring. I can't be creative with it the way I can with other things I enjoy. The typing, the logic: it's just one giant maths equation to me.

But we must change to grow, so I will begin with this. It is something new, and maybe once I get the basics I could be creative. The problem is I have to start at the beginning again. I don't even know how I passed the programming I did in school. To this day I still have dreams that I haven't finished my programming assignments and have failed the class and my course.

It seems that Python is the way to go for networking, so that’s where I will start. It also seems to be all through the new CCIE exams, so that’s a step in the right direction.

And when I say start from the beginning, I mean the beginning.

sys.exit(~Brad.)


Cisco CLI Analyzer

Have you used this? If not…download it now.

Before you log a TAC case and get told to update the software (90% of the time, lol), log onto your Cisco equipment with the CLI Analyzer.

It gives you a powerful set of Cisco TAC tools right at your fingertips. It also helps with understanding commands, best practices, bugs and saving configurations.

https://cway.cisco.com/docs/cisco-cli-analyzer/2.0/New_Features.htm

In the words of Maury Finkle, founder of Finkle Fixtures, Biggest Lighting Fixture Chain in the Southland, do it.

Do it.

~Brad.


Pass on what you have learned…

The one problem we have in the IT world is documentation. No one seems to like doing it. Unless you specifically pay for a design document and as-built, you are going to get a document that usually has a lot of cut & paste from official vendor documentation.

Some of the design decisions are not captured, the implementation may have been rushed or not scoped properly, and this leads to a 'just make it work' scenario.

But at what cost? The poor support team? The customer that paid top dollar and didn’t get what they paid for?

I guarantee that when you get a tradesperson in to build you a kitchen or a house, you check every corner and every wall to make sure it is what you paid for. Why does this not happen as much as it should in the IT world? Or maybe upper management believes it is happening, but at the coal face it is not.

For me, this isn’t right.

Until we have a system that will scan the network and automatically build an as-built document (once I learn Python, I will try!), we are stuck with people retaining information in their heads instead of documenting designs.

This seems to occur more in the enterprise space as well, because external vendors are used and their scopes don't include such rigorous documentation.

Some believe this slows down the process and the build, as technology must move at the speed of business these days. But it is all achievable if you just take a few extra steps at the start of the project.

Steps like a clear scope of works and measurable, provable outcomes for each technology, and then allowing time in the budget and project for these to occur.

So, my philosophy is as follows, and really should be common sense and the basics of an operational network. It's based on a document I encountered a few years ago. I have made it high level, for easy reading.

1. Customer provides a clear and concise document outlining what they require.

2. Customer provides a document that highlights each technology, what they expect and the expected performance of this technology.

3. Project team prepares a POC or high-level design document. POCs can be dangerous: by the time one works, the customer expects it in production the next day, and suddenly you have a POC GONE LIVE.

4. Project team stages the design, tests it and documents it.

5. Customer checks design and documentation.

6. Build begins, with testing documentation. Each section built, tested to specifications and then signed off.

7. Build completed, and the as-built is created from the design, highlighting any design changes made during the deployment. The real world may have introduced these, depending on how closely your testing environment matched production.

8. Diagrams and support documentation added to as built.

9. Project completed; any future changes are added to the as-built and the version is updated.

This along with the following –

1. Transparency to customer and management

2. Handover session with support

Is a great start in building a new or upgrading an existing network.

It's not the end of the project, or its completion, that is the most critical part of 'getting it done'; it's the beginning.

This is where the problems that would otherwise come later can be removed.

~Brad.

Subnet Help

I had a colleague recently come to me to ask if he could shadow me at work, as he is studying a Master's in Cyber Security and needed to get his network skills up.

At first, I was kind of nervous. Then I was kind of honoured that someone saw me as a possible expert to assist. I have always wanted to be a teacher in the later stages of my career so this was good practice for me.

I stewed on it for a bit, as he told me that he was having trouble with subnetting.

Subnetting is not my strongest subject. I have never been fast with numbers in my head. I am more of a visual/creative type person, so doing mathematics in my head has never been easy. I usually use a subnet calculator, especially with IPv6.

When I enter a Cisco Exam I always write the following on the paper they give me before I start –

128 64 32 16 8 4 2 1 (For Binary to Decimal Conversion)

/30 – 255.255.255.252

/29 – 255.255.255.248

/28 – 255.255.255.240

/27 – 255.255.255.224

/26 – 255.255.255.192

/25 – 255.255.255.128

I use the above for wildcard mask checks. For a /28, 256 – 240 (last octet) = 16, then 16 – 1 = 15, so the wildcard mask is 0.0.0.15.
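The same mask and wildcard shortcuts can be checked with a short Python sketch. It builds the dotted-decimal mask from a prefix length and then derives the wildcard by subtracting each mask octet from 255, which is the same as the "256 – 240, then – 1" step above. The function names are my own illustration, not from any library.

```python
def prefix_to_mask(prefix: int) -> str:
    """Return the dotted-decimal subnet mask for a prefix length (0-32)."""
    # Shift 32 one-bits left so only `prefix` of them remain, then trim to 32 bits.
    bits = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return ".".join(str((bits >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def mask_to_wildcard(mask: str) -> str:
    """Invert each mask octet: 255 - octet, i.e. 256 - octet - 1."""
    return ".".join(str(255 - int(octet)) for octet in mask.split("."))


# Rebuild the exam cheat sheet from /30 down to /25.
for prefix in range(30, 24, -1):
    mask = prefix_to_mask(prefix)
    print(f"/{prefix} - {mask} - wildcard {mask_to_wildcard(mask)}")
```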

A = 10, B = 11, C = 12, D = 13, E = 14, F = 15 (For Hex to Decimal Conversion)

He explained that he already knew the IP classes (Class A, B, etc.) and also the basics of binary when it comes to IP addresses and 32-bit dotted-decimal numbers.

I gave him the following small one on one introduction and I am hoping this might help someone else.

I started by making it relevant for him and asked for his machine's IP address –

IP – 10.168.138.64

Mask – 255.255.252.0

GW – 10.168.136.1

I then converted to binary – 

IP – 00001010.10101000.10001010.01000000

Mask – 11111111.11111111.11111100.00000000

GW – 00001010.10101000.10001000.00000001
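If you want to double-check a conversion like this, Python's built-in `format(value, "08b")` turns a decimal octet into eight binary digits. A minimal sketch using the addresses above; `to_binary` is just an illustrative helper name.

```python
def to_binary(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 address or mask to dotted binary, 8 bits per octet."""
    return ".".join(format(int(octet), "08b") for octet in dotted.split("."))


for label, value in (("IP", "10.168.138.64"), ("Mask", "255.255.252.0"), ("GW", "10.168.136.1")):
    print(f"{label:<4} - {to_binary(value)}")
```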

I then showed him that each bit in an octet represents a decimal value –

128  64  32  16   8   4   2   1

  0   0   0   0   1   0   1   0   = Add the 1's = 8 + 2 = 10

  1   0   1   0   1   0   0   0   = Add the 1's = 128 + 32 + 8 = 168

  1   0   0   0   1   0   1   0   = Add the 1's = 128 + 8 + 2 = 138

  0   1   0   0   0   0   0   0   = Add the 1's = 64

Hence, his IP is 10.168.138.64.
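Going the other way is just the "add the 1's" rule from the table. A small sketch, with a hypothetical `octet_value` helper that adds the place value of every bit set to 1:

```python
WEIGHTS = (128, 64, 32, 16, 8, 4, 2, 1)  # place value of each bit, left to right


def octet_value(bits: str) -> int:
    """Add up the place values of every bit that is set to 1."""
    return sum(weight for weight, bit in zip(WEIGHTS, bits) if bit == "1")


octets = ["00001010", "10101000", "10001010", "01000000"]
print(".".join(str(octet_value(b)) for b in octets))  # 10.168.138.64
```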

I then showed him how the subnet mask is used – 

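The 1's in the subnet mask reveal the network portion of the address, and the 0's reveal the host portion. Below, the | marks the boundary after the first 22 bits.

Network portion (left of the |) = 10.168.136.0

Host portion (right of the |) = the last two bits of the third octet plus the whole fourth octet

IP – 00001010.10101000.100010|10.01000000

Mask – 11111111.11111111.111111|00.00000000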

Adding it all together, his host IP is 10.168.138.64 with mask 255.255.252.0

Remember that the network portion will not change, so there are two host bits remaining in the third octet here (after the |) –

100010|10

As you may have guessed, if they were both 1's they would equal 3 in decimal. Add 136 + 3 and you get 139, and you can go no higher in that octet.

So, the possible addresses run from* –

00001010.10101000.10001000.00000000 = 10.168.136.0

to

00001010.10101000.10001011.11111111 = 10.168.139.255

* You can't use the first (network) and last (broadcast) addresses, so the usable range is 10.168.136.1 to 10.168.139.254.

Add that all up and you have 1024 addresses (10 host bits, so 2^10 = 1024); minus the two unusable ones, you get 1022 hosts.
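Python's standard `ipaddress` module can confirm the whole range. This sketch passes the host address with `strict=False`, so the module zeroes the host bits and hands back the network; `hosts()` leaves out the network and broadcast addresses, which matches the 1022 figure.

```python
import ipaddress

# strict=False accepts a host address; the host bits are zeroed to give the network.
net = ipaddress.ip_network("10.168.138.64/255.255.252.0", strict=False)
hosts = list(net.hosts())  # hosts() skips the network and broadcast addresses

print(net)                                         # 10.168.136.0/22
print(net.network_address, net.broadcast_address)  # 10.168.136.0 10.168.139.255
print(hosts[0], hosts[-1])                         # 10.168.136.1 10.168.139.254
print(net.num_addresses, len(hosts))               # 1024 1022
```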

I also calculated the prefix length (the "slash" notation) –

Count how many 1's are in the mask when it is written in binary

11111111.11111111.11111100.00000000 = 22 1’s

So, that becomes your /22

10.168.136.0/22
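Counting the 1's can be scripted too. A sketch that assumes a valid (contiguous) mask; `mask_to_prefix` is an illustrative name, not a library function.

```python
def mask_to_prefix(mask: str) -> int:
    """Count the 1 bits across all four octets of a dotted-decimal mask."""
    # bin(252) is '0b11111100', so counting '1' characters counts the set bits.
    return sum(bin(int(octet)).count("1") for octet in mask.split("."))


print(f"/{mask_to_prefix('255.255.252.0')}")  # /22
```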

I did this to show him how a network device works out which part of the IP is the host section and which is the network section. The network section doesn't change; it is usually assigned to a VLAN or a Layer 3 interface. A VLAN is a Virtual LAN, which contains a subnet. Inside this subnet are hosts. Hosts can be machines, cameras or phones: anything that wants to talk on the network.

I also explained, that the network 10.0.0.0/8 is a private class A address. It is designed to be used inside a network and it cannot be routed on the public Internet. During the network design phase, the network engineers divided this 10.0.0.0/8 network into multiple subnets to be deployed throughout the organization.

They possibly assigned a 10.168.136.0/24 network at first, and then realised they needed more hosts. The only way to get more hosts is to start taking bits from the network portion of the address.

The /24 mask is –

255.255.255.0

11111111.11111111.11111111.00000000

For a /22, we take two of the network bits and make them host bits –

11111111.11111111.11111100.00000000

This is the essence of VLSM, Variable Length Subnet Masking: taking what was once a Class A network, 10.0.0.0/8, dividing it into a Class C-sized subnet, 10.168.136.0/24, and then borrowing some bits (for keeps) to create 10.168.136.0/22.
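The borrowing step can be seen with `ipaddress` as well. This sketch assumes the original /24 allocation guessed at above. `supernet(new_prefix=22)` hands two network bits back to the host portion, and listing the /24s inside the result shows that the /22 swallows 10.168.137.0/24 through 10.168.139.0/24, so those ranges can't be used anywhere else.

```python
import ipaddress

block = ipaddress.ip_network("10.0.0.0/8")          # private Class A range
original = ipaddress.ip_network("10.168.136.0/24")  # assumed first allocation
expanded = original.supernet(new_prefix=22)         # two network bits become host bits

print(original.subnet_of(block))  # True
print(expanded)                   # 10.168.136.0/22
for subnet in expanded.subnets(new_prefix=24):
    print(subnet)                 # the four /24s now covered by the /22
```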

I have now asked him to go forth and read up on subnetting, using this as an introduction. I am sure he will have many questions, but this is the only way I thought of trying to explain it initially.

I finished the lesson by explaining that when a host sends traffic, it uses its own IP address and subnet mask to work out whether the destination is on the same network or not.

So his machine, 10.168.138.64, wants to talk to 10.168.136.75. His machine looks at its IP and subnet mask and, through binary calculation, determines that the destination is indeed on its local network, i.e. in the same subnet (10.168.136.0/22).

His machine sends an ARP request to get the physical hardware (MAC) address of this host, then encapsulates the IP packet in a frame with that destination MAC address, converts it to bits and deposits it onto the wire.

What if he wants to go to 8.8.8.8? His machine does the same calculation, but this time finds that 8.8.8.8 is not on its subnet, so it sends the packet to its default gateway (10.168.136.1) instead, using the gateway's MAC address as the frame's destination. The default gateway then takes care of sending the packet on to its destination.
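The local-or-gateway decision boils down to a single membership test. This is a simplified sketch using the example machine's settings: a real operating system consults its full routing table (longest prefix match), not just one subnet and a gateway, and `next_hop` is a hypothetical helper.

```python
import ipaddress

# Example host settings from the lesson above.
INTERFACE = ipaddress.ip_interface("10.168.138.64/22")
GATEWAY = ipaddress.ip_address("10.168.136.1")


def next_hop(destination: str):
    """Return the address this host would ARP for to reach `destination`."""
    dest = ipaddress.ip_address(destination)
    if dest in INTERFACE.network:  # same subnet: deliver directly
        return dest
    return GATEWAY                 # different subnet: hand it to the default gateway


print(next_hop("10.168.136.75"))  # 10.168.136.75
print(next_hop("8.8.8.8"))        # 10.168.136.1
```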

But that, is another blog…and so is IPv6 Subnetting.

IPv4 address – 10.168.138.64/22 = 1022 hosts
IPv6 address – 2000:1234:5678:9ABC:1DF:5678:9ABC:1111/64 = 18446744073709551616 hosts
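Both host counts can be reproduced with `ipaddress`. This is a quick sketch: the IPv4 figure subtracts the network and broadcast addresses, while the IPv6 figure is simply 2^64, as IPv6 has no broadcast address.

```python
import ipaddress

v4 = ipaddress.ip_network("10.168.138.64/22", strict=False)
v6 = ipaddress.ip_network("2000:1234:5678:9ABC:1DF:5678:9ABC:1111/64", strict=False)

print(v4.num_addresses - 2)  # 1022 (network and broadcast removed)
print(v6.num_addresses)      # 18446744073709551616, which is 2**64
```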

~Brad.


Technical Leadership

I have been reading opinion posts on networking forums lately. Questions like: How can we fix the IT world? How do we convince our management of the importance of listening to the tech guys? Does management know what I do day to day? Do they really understand when I say that if this device fails and you have no support, it's going to cost the business a lot of money?

I don't know much about managing people; I am not a manager and I have had no formal management training. I understand that management has more than technical goals to achieve. They have business goals, and usually it is all restricted by the almighty dollar.

I am currently a tech guy.

What I do know, is one of the best bosses I have had was technical.

When tasks and projects came in, he didn't just assign them to us in our resourcing tool. He would let us all know about new projects and offer them to anyone interested first. He also encouraged us to take on new projects to further our skills, even if we had never touched the technology.

If we had no experience in this new project, he would make sure we had all the requirements and would even prepare a small training meeting to give us the basics of the technology to get ourselves started.

He was hands on when things got tough, to help us all complete an important task or troubleshoot a critical issue.

When review time came, he detailed what we did well, what we didn't quite meet, and how to improve and succeed the next chance we got.

I guess in a nutshell, he 'communicated', and he made sure we trusted him and he trusted us. There was respect, discipline and also accountability. When things went wrong, he stood up to his managers; he didn't pass the buck. When a failure occurred, even if someone had messed up, it wasn't our name served up on a platter, it was his name as the Team Leader. Just as in the armed forces, the leader is responsible for his men/women.

He was also honest. In the end, deceiving or withholding information from your employees will only push them away. We are adults and we are all here to do a job, collect our paycheck and further our careers, but first and foremost there is a job to be done, so let's do it the best way we can.

He also let us do our thing, with respect and trust he did not need to micromanage. He believed in us and our abilities and was there as a mentor and manager when we needed him.

As my career continues and younger people enter the industry and move up, I can't stay a tech guy at the frontline forever. I would love to, as I love helping people and fixing issues.

If I decide to move into design or a specialisation, I have a good technical background from the frontlines. If I decide to move into management, I will be exactly like that manager I had.

It’s a well-known saying, people don’t leave companies, they leave bosses.

And I wasn’t going anywhere…

~Brad.

Where did you come from and where are you going?

I'm currently studying to recertify my Cisco exams, and it got me thinking. If everyone has these certifications, what sets me apart from other people? Anyone can read the content, memorise the required information (UDP port 123 is NTP, the default redistribution metric for OSPF is 20 & an E2 route), do some labs and then attempt the exam.

It takes a lot of work to pass these exams, don’t get me wrong. But when going for a new role what sets you apart?

For me, it’s the unique way you got to where you are. The way you learnt how to be a worker, the way you studied, your foot in the door and how you are growing.

I can explain how I got here and how I work. This is what sets me apart from other people when I go to an interview. This is not about who is better, just like when I tell people I play guitar. I have a lot of guitarists saying “Oh, you’re probably better than me” but that’s not what it is about. It’s the interpretation of the instrument, with all my experience that makes me different to everyone else. Everyone uses the same six strings, but what people create from those six strings is very different.

I started my working life at a fruit shop. It was a real physical job and I was subjected to a very strict working environment. You could not stand around and do nothing; any time a job was completed and you thought there was nothing to do, you grabbed a broom and started sweeping.

The first skill to learn when you start work is initiative. Don't wait to be asked to get the bin or pick up a broom. Look around and find something to do. When you are at work, you are paid to do a job, usually by the hour.

The second skill is being punctual. When you start at 8:00am, that doesn’t mean you start at 8:03, it means 8:00am. So, be on time and come back from your lunch break on time as well.

The third skill is working with a team. When team members are ill, or need a hand then use initiative and help them out. Don’t go out on your own and be a hero. You have a team that supports you and you must support them as well.

I worked at this fruit shop for 10 years as a casual employee. I worked my way up to basically second in charge: I was unloading trucks at 4am, opening the store and deciding what we needed to do during the day. This was an excellent way to start my working life and prepare me for the future.

Now, when you finish your education be prepared to do anything! I went to TAFE at Box Hill and in my final semester we were offered work experience. This was my foot in the door, and I knew it. I was really hoping to land it at Telstra, but I was offered one week at a company called Netstar (now known as Logicalis).

The week before, one of my colleagues at TAFE was supposed to go to Netstar, but he couldn't make it. Netstar called me up and asked if I wanted to do two weeks, as they had an IP Telephony rollout they were doing at various TAFE institutions throughout Victoria and needed the help. I jumped at the chance and was able to complete two weeks of work experience.

On the very last day, the manager walked up to me with a Netstar T-shirt and asked if I wanted a part time job. I said yes!

The role was mostly deploying switch configurations and installing phones and UPSs. It was my foot in the door, and with my work experience at the fruit shop I was able to be on time, always working and keen to help.

This is what sets you apart, and it is not a skill but a mindset. It is your attitude.

I worked at Netstar for some time, moving from part time to a full-time casual role but I suffered a moment of panic and performance anxiety in my second year that really affected me. It really set me back and I eventually left the role and joined one of our customers instead. I was looking to not work in the client facing arena anymore. I wanted to work on one network and learn it backwards, I didn’t want to be thrown in front of customers over and over. I needed some experience from the customer point of view.

My next job was at the Australian Red Cross Blood Service and this was my first real Network Team job. I prefer to work in a team, I like the altogether mentality, I like to share the workload and the responsibility. If one of us fails, we all fail and pull ourselves up again. I spent four years in this team, learnt a lot about networking and people in general. Unfortunately, the ghosts of the past returned in my fourth year and I decided to leave the job and the industry.

I then moved to Brisbane and spent a year living in my own mind. It's not the place you want to be. It's the course everyone is now being certified in, it's the new celebrity confession that makes them 'one of us'; it's called a mental health issue.

After one year of this, and some help I was able to crawl out and face my fears slowly. I started with a 6-week contract at a local IT company in Brisbane. I was installing Access Points at schools across South East Queensland.

I had my foot in the door again. After 3 weeks, I realised that I needed more than this: I needed to get back to where I was so I could push forward in my career. I needed to go back to a Network Team.

I joined Rio Tinto as a contractor. It is the best place I worked so far. I would have stayed, but outsourcing was in, so I was out.

I half-heartedly looked for a new job, but the demons were starting to tap me on the shoulder, so I listened and took Christmas off.

I was still determined though, and found another job, this time in one of the best technical teams I have ever worked with, at Brennan IT. They had a division known as Brennan Voice & Data, and this place made me realise that Enterprise Networking is not Networking. This place was its own Internet Service Provider and I was thrown into a world of MPLS, VRFs and BGP galore.

I loved it and I learnt a lot here. I was again in a Network Team.

After two years, the industry changed, and I found myself being sent to consulting jobs more than working in the Network Team. They started sending me places to be customer facing, and I just wasn't having any of that.

I left.

I found myself going backwards now. To protect myself, I went to a Level 2 Network Team at a company called RCST. This company supported Mining Operations in Queensland, and I was doing up/down monitoring and realising that all my dreams were slowly fading away.

Within 6 months, I was bored. I had to do something; I had to try and rise. With some help I pushed myself and got promoted to the Network Engineering Team. I was going to be client facing, but that was ok. I had learnt some skills with that help, and I was prepared.

I remember the first time I was sent to site as the 'Network Engineer'. I walked in with my head held high (partially due to an issue with my vision) and looked out the 21st floor window, knowing that this was my chance to either succeed or crumble.

When the meeting was over, I walked out having conquered my fear. I was client facing, we discussed QoS designs and other networking topics. Even when I left RCST, the architect sent me an email saying thanks for your work and thanks for caring about our network. That made a difference to me, for the first time a customer looked at me, a consultant as one of them. Not a salesperson, or a vendor just looking for more work but someone that cared about the environment.

I left RCST, but for the right reasons this time. This time I wanted to go to a big company, like Rio Tinto but be a Full-Time employee.

I have ended up at PwC as a Network Engineer in the AU Network Team. As I am a current employee anything I say can and will be used against me in a court of law, so sorry no details 🙂

Now, the next 10 years of my career are going to be the most important. Just as a doctor or a dentist wants to specialise in a certain type of medicine or procedure, I want to specialise in a certain part of the IT industry.

I will take all my experience, all my ups and downs and even the lessons learnt when sweeping the floor to achieve my next goal.

The only hard part now is, trying to decide what to specialise in…

In the year I had off, the industry changed so much. Within that one year, AWS and Azure became the go-to platforms, and I started seeing Cisco routers in Data Centres that I had never seen before.

With the birth of SDN and the cloud, the networking components are now buttons and tick boxes on a website portal. Only if you work for the provider will you touch the actual network. If you are at an Enterprise, the BAU functions are being automated and you are starting to either work on projects or move into management.

So, the challenge for the remainder of the year is to pass my exam, recertify for another three years and then really work out where I am going. Will I try and become a Network Architect, will I make a small leap forward to Senior Network Engineer, or will I move into a specialisation like Security, Wireless or, hell, IPv6? There can't be many experts in that yet.

No matter what, nothing will happen until I put that first foot forward.

By the way, if this is my son or daughter reading this in 15 years’ time and you’re at work, you obviously have enough spare time to pick up a broom and start sweeping…

~Brad.


Brad’s OSPF Study Notes

I have decided to document my notes on my blog as I study. I take in information better when I watch the training video/s and take notes. Hope this helps anyone else studying as well. I need to recertify all my Cisco certifications, so I have picked CCNP Route, because it's good to refresh this when I'm not doing a lot of routing at work, and also the exam will change in Feb 2020. I will then have another 3 years to try and figure out how to attempt the CCIE, or maybe move to a different specialisation.

The fundamentals of networking are still very important, even as we begin to automate tasks you still need to understand how a routing protocol works inside and out.

I am already halfway through Chris Bryant's CCNP Route OSPF Fundamentals videos, so I will kick off now with his OSPF Stub Areas. Stub areas in OSPF are useful for routers that don't need an entire OSPF database, or that don't have the resources for a full table due to the number of external LSAs in the network.

LSA Types –

LSA 1 – Router LSA – Generated by every router for each area it is in; lists each interface (link) the router has in that area

LSA 2 – Network LSA – Generated by DRs only

LSA 3 – Summary LSA – Generated by ABRs only – Describes inter-area routes

LSA 4 – ASBR Summary LSA – Describes the path to an ASBR – Generated by ABRs only

LSA 5 – AS External LSA – External to OSPF (redistribution) – Only generated by ASBRs

LSA 7 – NSSA External LSA – Only seen in NSSAs

Redistribution –

Command – redistribute connected subnets

  • Metric 20 by default
  • E2 by default
  • E2 metric DOES NOT INCLUDE THE INTERNAL COST TO THE ASBR
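To see why that last point matters, here is a simplified Python model of how the two external route types are costed, assuming the default metric of 20. It ignores tie-breaking between equal E2 routes, and the function and variable names are illustrative only.

```python
SEED_METRIC = 20  # default metric for routes redistributed into OSPF, per the notes above


def external_metric(route_type: str, cost_to_asbr: int, seed: int = SEED_METRIC) -> int:
    """Simplified metric a router would use for an OSPF external route."""
    if route_type == "E2":
        return seed                 # E2: the internal cost to the ASBR is not added
    if route_type == "E1":
        return seed + cost_to_asbr  # E1: the internal cost to the ASBR is added
    raise ValueError("route_type must be 'E1' or 'E2'")


for cost in (10, 50):
    print(f"cost to ASBR {cost}: E2 = {external_metric('E2', cost)}, E1 = {external_metric('E1', cost)}")
```

However far away the ASBR is, the E2 route still shows 20, while the E1 route grows with the path cost.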

Stub Areas –

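Stub – No Type 4 or 5 LSAs; the ABR injects a default route via Type 3

Totally Stubby – No Type 3, 4 & 5 (apart from the default route the ABR injects via Type 3) – Applied on ABR (no-summary command)

NSSA – No Type 4 or 5 allowed. Type 7 LSAs come only from an ASBR inside the NSSA and are converted to Type 5 by the ABR towards the backbone. No default route by default.

Totally NSSA – No Type 3, 4 or 5; default injected via Type 3.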
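As a memory aid, the stub rules can be written as a lookup table. This is a study-notes sketch, not how a router implements it. Type 4 is treated as blocked wherever Type 5 is, since there are no external routes left to find an ASBR for. `filter_lsas` and `translate_to_backbone` are hypothetical helpers.

```python
# area type: (LSA types allowed inside the area, how the ABR supplies a default route)
AREA_RULES = {
    "normal":       ({1, 2, 3, 4, 5}, None),
    "stub":         ({1, 2, 3},       "Type 3 default from ABR"),
    "totally_stub": ({1, 2},          "Type 3 default from ABR (the only Type 3 let in)"),
    "nssa":         ({1, 2, 3, 7},    None),  # no default route unless configured
    "totally_nssa": ({1, 2, 7},       "Type 3 default from ABR (the only Type 3 let in)"),
}


def filter_lsas(area_type, lsa_types):
    """Keep only the LSA types that may exist inside the given area type."""
    allowed, _default = AREA_RULES[area_type]
    return [t for t in lsa_types if t in allowed]


def translate_to_backbone(lsa_type):
    """An NSSA ABR re-originates Type 7 LSAs as Type 5 LSAs towards Area 0."""
    if lsa_type != 7:
        raise ValueError("only Type 7 LSAs are translated")
    return 5


print(filter_lsas("stub", [1, 2, 3, 4, 5]))       # [1, 2, 3]
print(filter_lsas("totally_nssa", [1, 2, 3, 7]))  # [1, 2, 7]
print(translate_to_backbone(7))                   # 5
```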

~Brad.


Multicast for me, Multicast for you. You have a cast, I'll have one too. (Updated)

Unicast, I want it but no one else does.

Broadcast, I want to let everyone know!

Multicast, Tune into me if you want to!

During my studies and work in the Mining Industry I was heavily exposed to Multicast. It was one of the dark arts; it sounded complex and every time I heard 'multicast' I shuddered. Eventually, I started to forget about the theory, until recently when I was working on an AV deployment.

I had to return to the roots of multicast again, and as I started to recap, it all started to flood back. The best part was I had a brand new site to test it on. It wasn't due to go live for a while, although it should have been designed and planned before the site was even thought of…but that's IT for you.

Using IPTV systems on a local LAN and a single VLAN, multicast just works, usually… Try to send it across a Layer 3 router and the multicast ride begins.

I won't go into extreme technical detail or the basics of Multicast here (please see the Cisco Community post here – https://community.cisco.com/t5/routing/multicast-routing-and-iptv-question/td-p/3892589 ), as this post is more about the exact thought process and troubleshooting style I used. It is a real-world example of what you don't learn in a book; it's from experience and collaboration with colleagues, Cisco communities and the vendor.

I encourage you to review and document cases such as this, it helps reflect on how things came to be and also can be used to decide on what different steps you could have used to find a resolution quicker in hindsight.

The scenario – We had a working solution in a site, contained in one VLAN. Multiple Multicast Sources and Receivers and IGMP snooping enabled. We now require the same source streams to pass over some layer 3 links and be sent to receivers that request it.

Usually, when a multicast frame hits a switch, it is treated as a broadcast and flooded out of all ports in the VLAN. When IGMP snooping is enabled, the switch listens in on IGMP messages and builds a table so that it forwards multicast packets only to the ports that require them. This feature was also where the root cause of our issue lay: a software bug stopped traffic being sent to receivers that wanted to listen to 239.255.255.255 (SAP).
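The difference snooping makes can be sketched as a toy model in Python. This is a simplification under my own assumptions. `SnoopingSwitch` and the port names are hypothetical. The model ignores things a real switch handles, such as multicast-router ports, timers and how unknown groups are treated; here, an unknown group simply goes nowhere when snooping is on.

```python
from collections import defaultdict


class SnoopingSwitch:
    """Toy model of one VLAN, with or without IGMP snooping."""

    def __init__(self, ports, snooping=True):
        self.ports = set(ports)
        self.snooping = snooping
        self.members = defaultdict(set)  # group -> ports that sent IGMP reports

    def igmp_report(self, port, group):
        """Snooping 'listens in' on a membership report and records the port."""
        self.members[group].add(port)

    def egress_ports(self, in_port, group):
        """Ports a multicast packet for `group` is sent out of."""
        if not self.snooping:
            return self.ports - {in_port}  # flooded like a broadcast
        return self.members.get(group, set()) - {in_port}


ports = ["Gi1/0/1", "Gi1/0/2", "Gi1/0/3"]
sw = SnoopingSwitch(ports)
sw.igmp_report("Gi1/0/2", "239.255.255.255")
print(sorted(sw.egress_ports("Gi1/0/1", "239.255.255.255")))  # ['Gi1/0/2']
```

Setting `snooping=False` reproduces the workaround described later in this post: every port in the VLAN receives the group.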

How did we get to this root cause? The long way, although it did help me understand multicast even more than I already did.

First, as should always be done, I consulted the multicast configuration guide for the Cisco 3850 series switches. The configuration required to make this work is as follows –

  1. A static RP, which is the meeting point for multicast senders and multicast receivers. Its name means exactly that (RP = Rendezvous Point);
  2. Each Layer 3 router needs to know who the RP is when using PIM Sparse Mode, which is what we have configured (Sparse Mode builds a tree out towards receivers that explicitly join, unlike Dense Mode, which floods everywhere and prunes back where no one is listening);
  3. IGMP snooping enabled (on by default);
  4. ACLs, both to allow PIM routers to register with the RP and to limit the scope of groups this deployment will manage.

The source(s) are located at Site Epsilon and the receiver(s) at Site Alpha. Epsilon is working fine: the TV streams are being multicast and the channel list provided by SAP (239.255.255.255) is being announced.
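The channel list address is not arbitrary. As I understand RFC 2974, SAP announcements for an administratively scoped range are sent to the highest address in that range. The IPv4 Local Scope 239.255.0.0/16 therefore announces on 239.255.255.255. The Python sketch below uses `sap_announcement_group`, a hypothetical helper. Whether a given IPTV vendor follows this rule (and uses the SAP UDP port 9875) is an assumption worth confirming.

```python
import ipaddress


def sap_announcement_group(scope):
    """Highest address in an administratively scoped IPv4 multicast range."""
    net = ipaddress.IPv4Network(scope)
    if not net.is_multicast:
        raise ValueError(f"{scope} is not a multicast range")
    return net.broadcast_address


print(sap_announcement_group("239.255.0.0/16"))  # 239.255.255.255
print(sap_announcement_group("239.192.0.0/14"))  # 239.195.255.255
```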

PIM Sparse mode has been enabled, the RP at Epsilon has been configured and all Layer 3 links between Epsilon & Alpha are ready.

When we first activated this, all channels being sent from Epsilon made their way over to Alpha. TVs could see the video streams, and when the channel list button was pressed the channels appeared. After maybe 5 to 10 minutes the TV streams still worked but the channel list did not: our TVs no longer knew the multicast addresses of any channels.

The RP also had a message in its logs.

So troubleshooting began, and I started with what I could see: the log message on the RP.

Received Register from (IPTV receiver VLAN) for (IPTV Receiver IP, 239.255.255.255), not willing to be RP.

It took some reading and posts on Cisco Community to discover the cause of this message. The message means the RP does not know it should accept PIM Register messages from this source and group. I had already deployed my ACL to lock down the groups my PIM routers should forward traffic for, and I had already triple-checked that 239.255.255.255 was in it.

Indeed, it was –

access-list 10 permit 239.192.0.0 0.1.255.255
access-list 10 permit 239.255.255.255
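The ACL above uses a Cisco wildcard mask, where 1 bits mean "don't care". Below is a minimal Python sketch of that matching logic, using hypothetical helper names and standard ACL semantics with the implicit deny at the end. It shows that 0.1.255.255 covers 239.192.0.0 to 239.193.255.255, and that the second line matches only 239.255.255.255.

```python
import ipaddress


def wildcard_match(addr, base, wildcard):
    """Cisco-style match: bits set in the wildcard are ignored."""
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (a & care) == (b & care)


ACL_10 = [
    ("239.192.0.0", "0.1.255.255"),
    ("239.255.255.255", "0.0.0.0"),  # a host entry is an all-zero wildcard
]


def acl_permits(group, acl=ACL_10):
    """Standard ACL: first match permits; no match hits the implicit deny."""
    return any(wildcard_match(group, base, wc) for base, wc in acl)


for g in ("239.193.10.1", "239.194.0.1", "239.255.255.255"):
    print(g, acl_permits(g))
```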

We looked at other reasons why the traffic would not pass from site to site, and TTL was one of them. Multicast traffic is often sent with a TTL of 1, so it will not be forwarded by any router. The source must be configured with a greater TTL, and it was: 7.
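The TTL rule is easy to get wrong by one, so here is a small Python sketch of it. `routers_crossed` is a hypothetical helper. The hop counts are made up; they are not the real path between Epsilon and Alpha.

```python
def routers_crossed(ttl, path_routers):
    """How many routers on a path forward a packet sent with this TTL."""
    crossed = 0
    for _ in range(path_routers):
        ttl -= 1      # each router decrements the TTL
        if ttl <= 0:  # TTL hit 0: this router drops the packet
            break
        crossed += 1
    return crossed


print(routers_crossed(1, 3))   # 0, never leaves the local segment
print(routers_crossed(7, 3))   # 3
print(routers_crossed(7, 10))  # 6, dropped at the seventh router
```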

The Alpha router showed entries for the (S,G) (pronounced ‘S comma G’), so the multicast routing table was aware of traffic for group 239.255.255.255. However, the routing table knowing about these routes does not mean the actual stream is there. The ‘show ip mroute active’ command shows the active multicast streams and the bandwidth of each one.
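The idea behind `show ip mroute active` can be modelled in a few lines. An (S,G) entry can exist while no traffic flows for it, so the command only lists flows above a rate threshold. As I understand the IOS command reference, the default threshold is 4 kbps. The class and entries below are hypothetical, not output from our routers.

```python
from dataclasses import dataclass


@dataclass
class MrouteEntry:
    source: str
    group: str
    rate_kbps: float  # forwarding rate observed for this (S,G)


def active_streams(entries, threshold_kbps=4.0):
    """Return only the (S,G) entries forwarding at or above the threshold."""
    return [e for e in entries if e.rate_kbps >= threshold_kbps]


mroute_table = [  # hypothetical entries
    MrouteEntry("10.20.0.50", "239.192.1.10", 6500.0),
    MrouteEntry("10.20.0.50", "239.255.255.255", 0.0),
]
for e in active_streams(mroute_table):
    print(f"({e.source}, {e.group}) {e.rate_kbps} kbps")
```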

This stream was not present.

I returned to the RP and checked the config with show run | i pim. A little piece of leftover config revealed the cause of the log message: a PIM register ACL had been applied. This is not ACL 10 above, which was applied on the PIM routers. It is an ACL applied on the RP itself that tells the RP which sources to accept Register messages from. I had to make sure that Alpha was allowed to register with the RP, because my devices also send announcements to the SAP address 239.255.255.255.
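The RP-side filter can be pictured as a simple source-and-group check. On IOS this kind of filter is typically configured with `ip pim accept-register`. The Python sketch below only models the decision. The helper name and both subnets are hypothetical, and the group list mirrors ACL 10 (239.192.0.0 0.1.255.255 is 239.192.0.0/15).

```python
import ipaddress


def rp_accepts_register(source, group, allowed_sources, allowed_groups):
    """Return True if the RP should accept a PIM Register for (source, group)."""
    src = ipaddress.IPv4Address(source)
    grp = ipaddress.IPv4Address(group)
    src_ok = any(src in ipaddress.IPv4Network(n) for n in allowed_sources)
    grp_ok = any(grp in ipaddress.IPv4Network(n) for n in allowed_groups)
    return src_ok and grp_ok


groups = ["239.192.0.0/15", "239.255.255.255/32"]
sources = ["10.20.0.0/24"]   # hypothetical Epsilon subnet
alpha_host = "10.10.50.21"   # hypothetical Alpha IPTV device

print(rp_accepts_register(alpha_host, "239.255.255.255", sources, groups))  # False
sources.append("10.10.50.0/24")  # allow Alpha to register as well
print(rp_accepts_register(alpha_host, "239.255.255.255", sources, groups))  # True
```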

This solved the log message error, but my channels were still not working. It turns out the message had nothing to do with the channels. It was just a by-product of every host in the IPTV network, regardless of site, sending announcements to 239.255.255.255, so each host is both a source and a receiver.

So how did the RP and the other PIM routers have the S,G entry? The RP was configured to allow its local VLAN to register, part of some existing configuration.

The next step, and one that is essential in all troubleshooting today, is a packet capture. I had to find out whether the stream destined to 239.255.255.255 was actually reaching site Alpha.

I captured on the 3850 itself using the Embedded Packet Capture feature and found the SAP messages coming in from the WAN towards site Alpha. From this I knew multicast routing was working as expected. I also saw IGMP messages from the IPTV devices (which, as above, are both sources and receivers) requesting the stream. This confirmed that the devices were requesting the stream via IGMP and that the PIM deployment was successfully routing the traffic towards the site.
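When reviewing a capture like this, the question is simply "did any UDP packets to the SAP group arrive?". Below is a minimal Python sketch of that filter over already-parsed packet records. The `Pkt` record and the sample packets are hypothetical. Port 9875 is the SAP port from RFC 2974, and I am assuming it is the one our devices use.

```python
from typing import NamedTuple, Optional

SAP_GROUP = "239.255.255.255"
SAP_PORT = 9875  # SAP's well-known UDP port (RFC 2974), assumed here


class Pkt(NamedTuple):
    src: str
    dst: str
    proto: str            # "udp", "igmp", ...
    dport: Optional[int]  # None for non-UDP packets


def sap_packets(packets):
    """Keep only UDP packets sent to the SAP group and port."""
    return [p for p in packets
            if p.proto == "udp" and p.dst == SAP_GROUP and p.dport == SAP_PORT]


def igmp_packets(packets):
    return [p for p in packets if p.proto == "igmp"]


capture = [  # hypothetical packets, not the real capture
    Pkt("10.20.0.50", SAP_GROUP, "udp", SAP_PORT),
    Pkt("10.20.0.51", "239.192.1.10", "udp", 5000),
    Pkt("10.10.50.21", SAP_GROUP, "igmp", None),
]
print(len(sap_packets(capture)), len(igmp_packets(capture)))  # 1 1
```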

I also ran some multicast packet debugs to try to see what the switch was doing with the traffic. I ran IGMP debugs too, which showed the membership (join) messages, but I never saw the actual traffic itself. I never saw UDP traffic with destination IP 239.255.255.255 entering the receivers’ VLAN.

So, why does the stream for IPTV work (multiple 239.x.x.x addresses) and the 239.255.255.255 not?

I had searched for PIM and IGMP bugs but didn’t find anything. I spoke with the vendor and asked whether others had deployed such a design. I read the same theory over and over. This problem consumed me at night; I was logging in after hours and running show commands to revalidate my work.

It was at this point I engaged Cisco TAC for further assistance. I needed some expert level help. But what I really needed was new eyes, fresh eyes to look at this from a different angle.

It should be noted that I was unable to connect a host directly to site Alpha’s router (a 3850 fibre switch) to rule it out. I also did not reload the devices, something that is usually the first step when you have a computer issue! It was later revealed that, had I done so, I would have seen the traffic work for a moment and then drop off again, just as we saw when it was first configured. This information was key, because it points to the final step of the multicast forwarding process as the fault. At that step, the switch itself forwards the multicast traffic to the IGMP hosts that request it, using IGMP snooping to record where those hosts are.

I then had a baby. Not me but my wife.

So I went on leave and left it in the hands of my network colleagues.

They continued to work with Cisco TAC and performed additional troubleshooting.

A week later I received a message saying the source of this issue had been found. They had found an IGMP bug, which I hadn’t come across when searching, relating to 3850s forwarding traffic to IGMP hosts. They disabled IGMP snooping on the receiver VLAN at Alpha and the stream appeared. Because IGMP snooping is disabled, the stream is now flooded to all hosts in this VLAN.

Some questions still remain. Perhaps further input from Cisco can explain why only the address 239.255.255.255 was affected, when all the other video streams worked! Would a detailed mpacket capture reveal the root cause? When I return to work I will organise some downtime to capture mpacket (or MFIB) and IGMP snooping debugs.

At this stage, the workaround is actually the fix because there is no plan for Cisco to fix this bug – https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvn14836/?rfs=iqvred

It should be noted that our site uses 2 x 3850 SFP switches as the core and stacked 3850s as the access layer, connected with EtherChannels.

The key lessons learnt and guidance for the future are –

  1. When possible reload the device!!
  2. Capture Packets everywhere!!!
  3. Confirm traffic flow, and confirm devices adhere to protocols and processes when forwarding traffic!
  4. Enlist help for fresh eyes and a different perspective!
  5. Multicast is all about the source, so you must trace backwards. Ensure the receiver is requesting the stream, then intercept traffic at each Layer 3 router towards the source;
  6. Multicast routers will not forward traffic that does not pass an RPF check. Traffic from the source must arrive on the interface the router would use to reach that source according to its unicast routing table; otherwise it will be dropped;
  7. Multicast is like listening to the radio, hosts will request to listen to a multicast address, the network infrastructure will ensure only the hosts that want to listen will get it, no one else will;
  8. Never give up!
  9. Never stop checking the theory, things break for a reason. It may be unknown when first occurring but there is a reason it breaks, there is more to the story than the word BUG.
  10. Deploy best practices, why wouldn’t you deploy what has been tested and deployed already?
  11. Design, Design & Design! When introducing a new technology it must be tested and designed with ample time to ensure it is working correctly. Allow this time in your projects and budgets. It will save you, the business, the engineers and everyone involved time and money.
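Lesson 6 is worth seeing step by step. The RPF check is a longest-prefix match on the unicast table, followed by a comparison with the interface the packet arrived on. Below is a minimal Python sketch. The routes and interface names are hypothetical, and it ignores static mroutes and other multicast-specific RPF sources.

```python
import ipaddress


def rpf_interface(source, unicast_routes):
    """Longest-prefix match: the interface this router would use to reach source."""
    src = ipaddress.IPv4Address(source)
    best = None
    for prefix, iface in unicast_routes.items():
        net = ipaddress.IPv4Network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


def rpf_check(source, arrival_iface, unicast_routes):
    """Pass only if the packet arrived on the RPF interface for its source."""
    return rpf_interface(source, unicast_routes) == arrival_iface


routes = {  # hypothetical unicast table
    "0.0.0.0/0": "Te1/1/3",
    "10.20.0.0/16": "Te1/1/1",
    "10.20.5.0/24": "Te1/1/2",
}
print(rpf_check("10.20.5.9", "Te1/1/2", routes))  # True
print(rpf_check("10.20.5.9", "Te1/1/1", routes))  # False, would be dropped
```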

Cisco TAC UPDATE – 

They identified the root cause as this bug – 

CSCvn31653.

Until next time,

~Brad.


Resources used during troubleshooting – 

Routing TCP/IP – Cisco Press

Cisco Community – a member named Giuseppe Larosa was extremely helpful.

Cisco Multicast Routing for CCNA, CCNP, and CCIE Candidates – Kevin Wallace – YouTube


The little wireless network card that could…

Ever had one of those issues you just can’t let go? I was presented with quite the issue recently and although it only impacted one laptop out of literally thousands, I just couldn’t let it go without finding the root cause.

Unfortunately, this blog doesn’t end well. The root cause is still unknown; if anything, I think the card is haunted. This blog entry will either fascinate you, frustrate you, or fire a small neuron in your brain so you come up with a root cause for me 🙂

I was told this issue has happened on a few occasions, and only in one specific office. The wireless card in question is the Intel Dual Band Wireless-AC 8260. It runs on a Cisco wireless network provided by a 5520 controller running 8.5.140.0 code.

Here are some of the symptoms and observations encountered at the machine level –

  1. Wireless disconnects out of the blue;
  2. Laptop will not connect to this wireless network anymore;
  3. If you remove this card from the motherboard and put it in another laptop the same problems occur;
  4. If you then take this Laptop to another site, now using a different wireless controller and IP space it works;
  5. Return to the problem office and it repeats again;
  6. Assign a static IP to the card, it works!
  7. Remove the static and use DHCP and it doesn’t work anymore;
  8. Cleared Winsock and TCP/IP settings;
  9. Could not update drivers due to company policy, but we have thousands of laptops with this card and driver.

Here are some of the observations I found when doing a deep dive troubleshooting session –

  1. Using debug client <mac address> on the WLC, authentication is working;
  2. Checking the backend RADIUS, authentication is good;
  3. Debugs reveal the client is added to the exclusion list and blacklisted for 60 seconds; the reason given is ‘Identity Theft’.

I did some googling, asked the Cisco Champions and posted on the Support Community for some assistance. MAC spoofing, bugs and other causes were suggested, so the next step was to get a packet capture on the laptop itself.

Well, the actual problem of ‘Identity theft’ appeared very quickly, but to understand it you need a little more information.

  1. SSID has an interface with this IP on WLC – 10.10.100.5/24;
  2. The DHCP server for this network has the range 10.10.100.5 – 254/24 (What! It’s giving out an IP that is already in use!)
  3. The local switch interface provides the DHCP helper services.
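A scope that overlaps the WLC’s own interface address is easy to catch with a quick check. Below is a minimal Python sketch using the addresses from this post. `pool_conflicts` is a hypothetical helper, and the second call shows an example corrected range starting at .6.

```python
import ipaddress


def pool_conflicts(pool_start, pool_end, reserved):
    """Return reserved addresses that fall inside a DHCP pool range."""
    start = ipaddress.IPv4Address(pool_start)
    end = ipaddress.IPv4Address(pool_end)
    return [ip for ip in reserved
            if start <= ipaddress.IPv4Address(ip) <= end]


wlc_interface = ["10.10.100.5"]
print(pool_conflicts("10.10.100.5", "10.10.100.254", wlc_interface))  # ['10.10.100.5']
print(pool_conflicts("10.10.100.6", "10.10.100.254", wlc_interface))  # []
```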

During the capture, the first packet seen is a DHCP Request. No DHCP Discover is sent, implying the machine already knows the IP it wants. A client reusing a remembered address like this matches the INIT-REBOOT behaviour described in RFC 2131. The IP it wants is 10.10.100.5/24, the IP of the SSID interface.

The packet capture then shows no more DHCP messages, but the client does an ARP request to see if the IP is in use, and as expected a duplicate IP address message is received.

At this point, the local switch’s ARP cache should have 10.10.100.5 resolving to the WLC interface. That only holds if the interface has been active on the network, which according to Cisco only happens when DHCP proxy is enabled; in our case it is not.

Next, the client basically steals the IP by sending a gratuitous ARP. From this moment on, the switch has the laptop’s MAC address in its ARP cache, and the WLC has blacklisted the client for ‘Identity Theft’.
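To reason about the ‘Identity Theft’ exclusion, it helps to picture a binding table where each IP belongs to one MAC. The Python sketch below is only a conceptual model under my own assumptions, not Cisco’s WLC implementation. `IpTheftGuard` and the MAC strings are hypothetical; the 60-second exclusion comes from the debug output above.

```python
class IpTheftGuard:
    """Conceptual model: an IP is bound to one MAC; a second claimant is excluded."""

    def __init__(self, exclusion_seconds=60):
        self.bindings = {}  # ip -> mac that owns it
        self.excluded = {}  # mac -> time its exclusion expires
        self.exclusion_seconds = exclusion_seconds

    def claim(self, ip, mac, now):
        owner = self.bindings.get(ip)
        if owner is not None and owner != mac:
            # Someone else already owns this IP: treat it as identity theft.
            self.excluded[mac] = now + self.exclusion_seconds
            return "excluded: identity theft"
        self.bindings[ip] = mac
        return "ok"

    def is_excluded(self, mac, now):
        return self.excluded.get(mac, 0) > now


guard = IpTheftGuard()
print(guard.claim("10.10.100.5", "wlc-interface-mac", now=0))  # ok
print(guard.claim("10.10.100.5", "laptop-mac", now=10))        # excluded: identity theft
print(guard.is_excluded("laptop-mac", now=30))                 # True
```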

So, after all this I came up with a plan and it is as follows –

  1. Rectify the DHCP scope and remove the IP from the lease pool – 10.10.100.5/24;
  2. Add the DHCP servers to the SSID interface (even though DHCP proxy isn’t enabled, to keep in line with other site configurations).

I completed all this and came in the following Monday to turn on the laptop, very confident that the issue was resolved.

I powered the laptop up and exactly the same thing occurred! It requested the 10.10.100.5 IP again, polluted the switch’s ARP cache and then did not connect to the wireless. I even disconnected the wireless card from the laptop and plugged it back in, and it still requested 10.10.100.5!

It must be noted that this wireless card was taken from a laptop, put in a new laptop and re-imaged and still requests the same IP!

Hence, why I think it is haunted.

At this stage, several questions have come to mind –

  1. Is the network card firmware corrupted?
  2. Does the network card misinterpret the DHCP packets?
  3. Why does it request the first IP of the range (something to do with seeing this IP in a DHCP packet of the WLC)?
  4. Is any hardware level malware present?
  5. Am I crazy?

My current situation is as follows –

  1. I believe the card is faulty and the IP of the SSID interface should not be in the scope;
  2. Going to probably send this wireless card back to manufacturer;
  3. Waiting for this to occur again on a different laptop, although, as the IP is no longer in the pool, it should never be given out;
  4. Looking forward to seeing whether this occurs again, as it has something to do with the wireless card, its MAC address, the WLC interface and the IP assignment process;
  5. I have removed the ability to get this IP, so if a new client does pick it, it means it really is misinterpreting the packets from the WLC.


So far, this has been an exhausting but very educational troubleshooting session. I have learnt a lot about the WLC and the DHCP process.

If anything comes to mind, let me know!

As they say at the end of Back to the Future, to be continued!


~Brad.