The blackout of August 14-15, 2003, cut off power to eight American states, one Canadian province, and at least 50 million people. Among the blackout victims were telecom providers in the affected area, who had to scramble to maintain uninterrupted customer service during the outage. And while 99 percent of customers had service during the blackout, that still means that hundreds of thousands of customers had dead phones.
An event of this magnitude clearly demonstrates that network outages can happen for reasons entirely outside the network, driven by factors that network operators cannot control. In recent years, a number of network failures have been linked to outside causes.
These factors don't just affect your quality of service report. They hit your bottom line - and the bottom line of your customers.
For a quick lesson in network crashes and revenue burn, one need only look back to the well-publicized August 1999 case of MCI WorldCom. The trouble began about 10 p.m. on August 5, when technicians noticed a high level of congestion on the frame relay network. It evolved into a nightmare for both MCI and its customers.
MCI had recently upgraded to a more scalable infrastructure, a move that reportedly caused the initial congestion and led to under-performance and complete network instability for over a week. As efforts to fix the problem repeatedly failed, MCI was forced to shut down the whole system for 24 hours.
The Chicago Board of Trade was one of MCI's 3,000 customers rendered helpless by the outage. The failure disabled the electronic system that governs the board's exchange, leading to an estimated loss of some 180,000 trades. At anywhere between $10,000 and $100,000 per trade, the loss of business was significant and tough to calculate.
The same could be said for national truck stop operator TravelCenters of America, another customer for whom the wheels of commerce ground to a halt. In an InternetWeek story published at the time of the meltdown, Bill Bartkus, vice president of information systems for TCA, said he would seek compensation for lost business. "We're not satisfied at all with this," he said. "There has been a serious impact on our business."
Under real-world conditions, 100% reliability is a practical impossibility; outages are bound to occur sometimes, no matter how robust the network. Your customers need to know this, and it should be spelled out in your service level agreement (SLA).
But SLAs have their limits. If an outage happens, your SLA may protect you from legal liability, but it can't protect you from a loss of reputation with your customers.
According to Bill Harris, a senior advisor at the telecom consulting and management company QCI, sticking to the letter of your SLA can do more harm than good. How you handle a service outage can make or break your business reputation.
"It really depends on a lot of things," Harris said. "Was the outage covered in the press? Did the service provider immediately come forward and address the issue or let it linger? Did they make good even if they didn't need to?"
Once an outage occurs, for whatever reason, the question often becomes one of duration. A study of headlines coming out of the telecommunications industry reveals one reason why failures take a long time to notice and an even longer time to fix: a scaling back of the workforce. Companies, said Harris, "are reducing headcount across the board."
Some of these reductions are coming in the area of network support, with service providers using fewer people to monitor an increasing number of network miles and switches. The logical result of relying on fewer people, according to Harris, is a greater need for "more robust, better installed and better maintained" alarm reporting and control systems.
For example, one major carrier suffered a serious service failure after an 18-hour power outage drained its emergency batteries. The real culprit in the service failure was a malfunctioning power alarm.
"Monitoring services were not activated in a database and the carrier did not know what was happening at the switch," Harris said. Had the monitoring system been functioning properly, network downtime would have been significantly reduced.
Bob Berry, president and chief executive officer of Fresno, CA-based DPS Telecom, agreed. "Our busiest days are typically Monday mornings when administrators find out the hard way that their network alarms were not adequately monitored. However, the overall expense of network outages is oftentimes underestimated by people." Berry adds, "In addition to the loss of revenue associated with outages, companies are also faced with FCC fines, SLA penalties, customer churn, and even a damaged reputation."
While some outages cannot be avoided, they can certainly be corrected quickly with the proper monitoring system in place. Today's advanced remote site monitoring equipment can notify administrators across the country of an event that is happening or about to happen to their network. Some systems can even notify multiple people and NOCs so that a technician can be deployed to the troubled site while an administrator in another region can reroute traffic through a different path.
Berry adds, "Today's monitoring systems allow you to build rules for derived alarms and notification escalation. For example, if two noncritical events occur at the same time, it could be considered very serious. And alarm escalation lets upper management know when SLA sensitive clients may be affected." In the telecom industry, an ounce of prevention truly is worth a pound of cure.
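To make the idea of derived alarms and notification escalation concrete, here is a minimal sketch in Python. It is an illustration only, not the rule syntax of any DPS product or other monitoring system. The class names, alarm point names, recipient names, and the five-minute window are all assumptions invented for this example. It raises a critical derived alarm when two minor alarms are active at the same site within a short window, and adds upper management to the notification list when that site serves SLA-sensitive clients.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    site: str
    point: str          # hypothetical alarm point name, e.g. "commercial_power_fail"
    severity: str       # "minor", "major", or "critical"
    raised_at: datetime


@dataclass(frozen=True)
class DerivedRule:
    name: str                   # point name given to the derived alarm
    required_points: frozenset  # all of these must be active at one site
    window: timedelta           # how close together they must have been raised
    severity: str = "critical"  # severity assigned to the derived alarm


def evaluate(rule, active_alarms):
    """Return one derived alarm per site where all of the rule's conditions hold."""
    by_site = {}
    for alarm in active_alarms:
        if alarm.point in rule.required_points:
            # Keep the most recent alarm for each (site, point) pair.
            current = by_site.setdefault(alarm.site, {}).get(alarm.point)
            if current is None or alarm.raised_at > current.raised_at:
                by_site[alarm.site][alarm.point] = alarm

    derived = []
    for site, points in by_site.items():
        if set(points) != set(rule.required_points):
            continue  # at least one required point is not active at this site
        times = [a.raised_at for a in points.values()]
        if max(times) - min(times) <= rule.window:
            derived.append(Alarm(site, rule.name, rule.severity, max(times)))
    return derived


# Hypothetical notification lists, keyed by severity.
NOTIFY = {
    "minor": ["site_technician"],
    "major": ["site_technician", "noc_operator"],
    "critical": ["site_technician", "noc_operator", "regional_admin"],
}


def recipients(alarm, sla_sensitive_sites):
    """Escalate critical alarms at SLA-sensitive sites to upper management."""
    people = list(NOTIFY[alarm.severity])
    if alarm.severity == "critical" and alarm.site in sla_sensitive_sites:
        people.append("upper_management")
    return people


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 2, 0)
    rule = DerivedRule(
        name="power_and_generator_fail",
        required_points=frozenset({"commercial_power_fail", "generator_fail"}),
        window=timedelta(minutes=5),
    )
    active = [
        Alarm("site-a", "commercial_power_fail", "minor", t0),
        Alarm("site-a", "generator_fail", "minor", t0 + timedelta(minutes=1)),
    ]
    for alarm in evaluate(rule, active):
        print(alarm.site, alarm.point, alarm.severity, recipients(alarm, {"site-a"}))
```

Each input alarm is minor on its own and would page only the site technician. The derived rule is what turns the combination into a critical event with a wider notification list. A production system would also need to handle alarm clears, acknowledgements, and repeat notifications, which this sketch leaves out.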