Best Practices For Backup And Recovery

Contingency planning - having a Plan B in case disaster strikes - is essential to making network monitoring truly effective.

An independent study of FCC outage reports found that nearly half of all reported outages are due to human error.

While people cause the greatest frequency of outages, it turns out that natural disasters and network overloads are responsible for 62% of customer downtime.

That means well over half of all customer downtime is caused by factors outside your control.

Accepting this fact is the first step toward building a more reliable network. The next step is taking action to mitigate the effects of disasters.

There's a reason all cars come with spare tires.

If you've ever had a flat tire, you can certainly appreciate why your car came equipped with a spare tire. And even if you've been lucky enough to never have a flat, you probably still appreciate the value of having a spare in your trunk. How comfortable would you be driving without a spare tire?

It's obvious that having a backup unit is essential, especially in network monitoring. AT&T and other leading RBOCs make it a policy to keep spare units on hand, at various locations, just in case something unavoidable happens. These companies provide services to millions of people, including 911 services, government contracts, and other business-critical services. The major RBOCs know that network downtime means lost clients, lost revenue, and, in the case of 911 services, possibly even lost lives.

Does your network design include backup systems? Have you identified the critical, single-point-of-failure segments of your network and planned accordingly? All it takes is one lightning storm or flood to destroy equipment that could have lasted for years. Do you have backup systems and spare parts ready to go at a moment's notice? As remarkable as overnight delivery is, do you really want to wait 24 hours at best, plus installation time, for a spare unit to come online? You must protect your network by having spare units in stock.

Dual power supplies are great... if your network elements can handle them

Many companies have realized the value of having a backup power supply. But what if your equipment has only one power input? And what if the supply feeding that input is the one that fails?

The best practice is to always buy equipment that has dual power inputs (A/B power feeds) and can automatically switch to the alternate power source, so that if one feed fails, your equipment keeps running.

Geodiversity and redundant systems

Some companies take their backup plans a step further by having multiple master stations collect alarms in different parts of the country. This principle is called geodiversity. But geodiversity only works if all your master stations are synchronized with each other.

There are many ways to synchronize masters. For example, a master-slave relationship synchronizes network element configuration data. If the master should fail, a slave station immediately takes over. There is also passive polling, in which a second station taps into the data stream of the first to create a live backup. Passive polling is used in networks with copper-wire transport.
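The master-slave arrangement described above can be sketched roughly as follows. This is a minimal illustration, not any vendor's implementation: the MasterStation class, the heartbeat timeout, and the function names are all hypothetical, and a real system would synchronize over the network rather than in memory.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MasterStation:
    """Hypothetical master station that records when it last reported in."""
    name: str
    last_heartbeat: float = field(default_factory=time.monotonic)
    config: dict = field(default_factory=dict)

    def is_alive(self, now: float, timeout: float) -> bool:
        # Considered alive if a heartbeat arrived within the timeout window.
        return (now - self.last_heartbeat) <= timeout


def sync_config(primary: MasterStation, standby: MasterStation) -> None:
    """Copy network element configuration data from primary to standby."""
    standby.config = dict(primary.config)


def active_master(primary: MasterStation, standby: MasterStation,
                  now: float, timeout: float = 30.0) -> MasterStation:
    """Return the primary while it is alive; otherwise fall back to the standby.

    The 30-second timeout is an assumed value chosen for illustration.
    """
    return primary if primary.is_alive(now, timeout) else standby
```

Calling sync_config on a schedule keeps the standby's configuration current, so that when active_master falls back to the standby it can take over with the same data the primary had.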

Dual masters need dual responders

If your redundancy plan is based on dual SNMP managers, make sure that your remote monitoring devices can send traps to multiple SNMP managers. Many remotes do not support more than one SNMP manager. Even if you are currently using only one SNMP manager, it's still a good idea to make sure that your remotes can report to multiple managers, because you may want to implement a backup master as your network grows.
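As a rough sketch of what reporting to multiple managers involves, the function below fans one alarm out to every configured manager and records which deliveries succeeded. The names send_to_all_managers and Sender are hypothetical, and the send callable is a stand-in for whatever actually transmits the trap; it is not the API of any real SNMP library.

```python
from typing import Callable, Dict, Iterable

# Hypothetical sender: takes (manager_address, alarm_text), returns True on success.
Sender = Callable[[str, str], bool]


def send_to_all_managers(alarm: str, managers: Iterable[str],
                         send: Sender) -> Dict[str, bool]:
    """Send the same alarm to every manager and report per-manager success.

    A failure on one manager must not stop delivery to the others,
    so transport errors are caught and recorded as False.
    """
    results: Dict[str, bool] = {}
    for manager in managers:
        try:
            results[manager] = bool(send(manager, alarm))
        except OSError:
            results[manager] = False
    return results
```

The point of the design is that the remote treats each manager independently: an unreachable primary master does not prevent the backup master from receiving the same alarm.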

Multiple notification methods

Redundancy means more than just having backup units. A best practice that is often overlooked is the need for backup notification and transport methods.

Dial-Up Backup: Most network administrators prefer using a LAN/WAN transport layer, because it is faster and cheaper than a dial-up connection. But what if your LAN goes down? Would you lose visibility of your network sites? Your LAN needs a backup, too, and the solution is to select a remote monitoring device that can report alarms over both the LAN and the Public Switched Telephone Network (PSTN).
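The fallback logic a remote needs for this can be sketched in a few lines. This is an illustration under stated assumptions: report_alarm and the transport callables are hypothetical names, and the actual LAN and dial-up sending code is left out.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical transport: takes the alarm text, returns True if delivered.
Transport = Callable[[str], bool]


def report_alarm(alarm: str,
                 transports: Iterable[Tuple[str, Transport]]) -> Optional[str]:
    """Try each transport in priority order (for example LAN first, then PSTN).

    Returns the name of the first transport that delivers the alarm,
    or None if every path fails.
    """
    for name, send in transports:
        try:
            if send(alarm):
                return name
        except OSError:
            # This path is down; move on to the next one.
            continue
    return None
```

Listing the LAN first keeps the faster, cheaper path as the default, while the dial-up path is used only when the LAN attempt fails.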

Multiple Device Paging: Another often-overlooked necessity for a robust redundancy plan is multiple-personnel notification, which in effect gives you backup repair staff. If only one person is notified of a network fault, and that person is delayed by traffic, illness, or any other unforeseen event, the fault will not be repaired. You can increase the odds of someone responding to a critical event by selecting a system that can notify multiple people using multiple methods. At the very least, make sure your remote alarm monitoring device can send a message to the master station and one other person or device; the more the better. Here are some notification methods to look for: