The Network is Down? How did that happen?

You’ve designed a multi-path, fault-tolerant network.  You think you’re protected, right?

Not necessarily.

I will sheepishly admit that I got caught many years ago (before PathSolutions) when I designed a highly reliable network and thought it was bulletproof.

I ran a network with three buildings in a campus, each one on its own subnet with links between each building.  I figured if one link went down, the traffic would be carried by the alternate link until we could diagnose and fix the outage.

I set up the monitoring software to ping various devices to make sure that they responded, and if a response failed, it should alert someone.

Things went well for a few months.

One morning, building 1 lost connectivity to the other buildings: a complete outage.  I looked at both building 1 links and discovered that one link had lost its connectivity just a few minutes earlier, and the alternate link had lost its connectivity over two weeks earlier.

How could this happen?  I thought I had monitoring in place to make sure this would never happen.

Using PING (ICMP echo) to test network connectivity is fine when you only have one path to a destination.  If that path is lost, you’ll get an alert.

If you have multiple routes to reach a destination, the PINGs will still go through even if the primary (or backup) route is down.  Thus, you’ll only get the notification if BOTH links go down.
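To make the blind spot concrete, here is a minimal sketch in Python that models a three-building campus like mine as a small graph and asks whether a ping could still get through.  The building names (B1, B2, B3) and the ping_succeeds helper are made up for illustration, and the model assumes routing always finds any path that exists.

```python
from collections import deque

# Hypothetical topology: three buildings, one link between each pair.
ALL_LINKS = {("B1", "B2"), ("B1", "B3"), ("B2", "B3")}


def ping_succeeds(src, dst, links_up):
    """Return True if dst is reachable from src over the links that are up."""
    # Build an adjacency map from the links that are currently up.
    neighbors = {}
    for a, b in links_up:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search from src, stopping as soon as dst is found.
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# One of building 1's links fails: the ping detours through B2.
one_down = ALL_LINKS - {("B1", "B3")}
# Both of building 1's links fail: only now does the ping fail.
both_down = ALL_LINKS - {("B1", "B3"), ("B1", "B2")}

print(ping_succeeds("B1", "B3", ALL_LINKS))
print(ping_succeeds("B1", "B3", one_down))
print(ping_succeeds("B1", "B3", both_down))
```

The first two checks give the same answer, so a ping-based monitor cannot tell a healthy network from one that has already lost its redundancy.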

This is the point where I learned that you need a better way to track this: Monitor the actual interface status and alert based on its status change.
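A rough sketch of that idea in Python: compare each poll of interface status against the previous poll and raise an alert for every change.  The interface names and the status_changes helper are hypothetical, and how the status is collected is left out on purpose (polling each interface's operational status over SNMP is one common approach, but that is an assumption about your environment).

```python
def status_changes(previous, current):
    """Compare two polls of interface status and list every change.

    Both arguments map an interface name to "up" or "down".
    Interfaces seen for the first time are not reported as changes.
    """
    changes = []
    for interface, new_state in sorted(current.items()):
        old_state = previous.get(interface)
        if old_state is not None and old_state != new_state:
            changes.append((interface, old_state, new_state))
    return changes


# Hypothetical polls taken a few minutes apart on building 1's two links.
previous_poll = {"b1-to-b2": "up", "b1-to-b3": "up"}
current_poll = {"b1-to-b2": "up", "b1-to-b3": "down"}

alerts = status_changes(previous_poll, current_poll)
for interface, old, new in alerts:
    print(f"ALERT: {interface} went {old} -> {new}")
```

With a check like this, the first link failure produces an alert on its own, even though pings between the buildings are still succeeding over the surviving link.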

Embarrassment avoided!

