British Airways IT outage highlights need for power monitoring and mitigation

Credit to Author: Markus Hirschbold | Date: Fri, 16 Jun 2017 12:00:25 +0000

I’m sure that by now almost everyone is aware of the major IT outage at British Airways. As reported by The Telegraph soon after the event, the outage “led to hundreds of flights being cancelled or delayed and left an estimated 75,000 passengers stranded.” More recently, additional details have emerged that, in turn, raise more questions. A Bloomberg article reported “An engineer had disconnected a power supply at a data center near London’s Heathrow airport, causing a surge that resulted in major damage when it was reconnected … estimated to have cost as much as 100 million euros ($112 million).”

Further, Willie Walsh, CEO of parent company IAG, stated that the resumption of power was done in an “uncontrolled, uncommanded fashion,” which is what led to the damage to the servers. It is still unclear why the site’s uninterruptible power system (UPS) failed to protect the servers and help them ride through the event.

At the time of writing, the investigation is still ongoing; however, this incident highlights just how important electrical power continuity is to every organization. Outages are always bad for business, often racking up millions of dollars in losses for data centers, financial institutions, industrial plants, and other kinds of operations. In this case, the reputation of British Airways and the valuation of IAG are also at stake.

For those who want to learn from BA’s painful lesson, it’s worth considering how power management technology and services can reduce such risks for an organization. Advanced capabilities are now available to help operations teams stay on top of conditions, avoid potential downtime, and respond to critical events faster and more effectively.

First, the only way to prevent problems is to make sure you’re watching every important point in your power distribution system, 24 hours a day. Intelligent power monitoring devices should be located at every major node, from service entrance to servers. These can be standalone meters, or embedded intelligence within other kinds of equipment such as smart circuit breakers. Each monitoring point measures, monitors, and logs hundreds of power and energy parameters, and alarms on specific types of risk conditions. Each device shares this information over wireless, Ethernet, or other types of connections to upstream local or cloud-based apps. These apps enable technicians to analyze the state of the electrical distribution network, including deviations from normal operating conditions.
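To make the idea of alarming on risk conditions concrete, here is a minimal sketch of the kind of check a monitoring point might run on each voltage reading. Everything in it is an assumption for illustration: the point names, the 230 V nominal voltage, and the plus or minus 10% band are hypothetical, and real devices track far more parameters than RMS voltage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    point: str        # hypothetical monitoring point name
    timestamp: float  # seconds since an arbitrary reference time
    voltage: float    # RMS volts


def classify(reading: Reading, nominal: float = 230.0,
             low: float = 0.9, high: float = 1.1) -> Optional[str]:
    """Return an alarm label if the voltage is outside the band, else None.

    The nominal voltage and the band limits are illustrative assumptions.
    """
    ratio = reading.voltage / nominal
    if ratio < low:
        return "undervoltage"
    if ratio > high:
        return "overvoltage"
    return None


def alarms(readings: List[Reading]) -> List[Tuple[str, float, str]]:
    """Collect (point, timestamp, label) for every out-of-band reading."""
    out = []
    for r in readings:
        label = classify(r)
        if label is not None:
            out.append((r.point, r.timestamp, label))
    return out


sample = [
    Reading("service_entrance", 0.0, 231.0),
    Reading("ups_output", 1.0, 198.0),
    Reading("rack_pdu_7", 2.0, 259.0),
]
print(alarms(sample))
```

In practice the thresholds would come from the site's own equipment tolerances rather than from fixed defaults like these.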

A system of devices, communication connections and hubs, and networked software can become quite extensive. Fortunately, new services are available that use special analytics to validate whether a power monitoring network is properly configured and feeding the appropriate data from each point. These services typically also offer consultation on any discovered conditions that pose immediate risks to reliability.

When a power-related issue does occur, a monitoring system can notify the right people on their mobile devices before the problem has a chance to disrupt any critical equipment. Often, the problem can then be resolved quickly, and downtime avoided.

If an outage is unavoidable, advanced root cause analysis isolates the source of the problem. This could include a power quality issue, breaker trip, failed power transfer, motor overload, transformer failure, etc. Power event analysis, sequence-of-events, disturbance direction detection, and power quality analysis are all valuable capabilities in this situation. A power management application can also provide guidance to help technicians determine the appropriate response to manage the impact of the problem and help restore power as quickly as possible.

To prevent recurrences, a power management system can determine the impact of acute and chronic power system events. It will correlate data from the meter level to the system level, showing the origin of an event and how it propagated through the power distribution system. The operations team can then take steps to mitigate these conditions and stop them from disrupting or damaging critical loads and equipment in future.
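As a rough illustration of how correlating device-level data can point to the origin of an event, the sketch below merges event records from several monitoring points into a single timeline and treats the earliest record as the likely starting point. The event names and timestamps are invented for the example, and it assumes that all device clocks are synchronized, which a real system would need to ensure.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    timestamp_ms: int  # assumes all device clocks are synchronized
    point: str         # hypothetical monitoring point name
    description: str


def sequence_of_events(events: List[Event]) -> List[Event]:
    """Merge events from all points into one timeline, oldest first."""
    return sorted(events, key=lambda e: e.timestamp_ms)


def likely_origin(events: List[Event]) -> Optional[Event]:
    """Return the earliest recorded event, or None if there are no events."""
    ordered = sequence_of_events(events)
    return ordered[0] if ordered else None


# Invented records, listed in the order they happened to arrive
log = [
    Event(1045, "rack_pdu_7", "undervoltage"),
    Event(1002, "main_switchboard", "breaker trip"),
    Event(1020, "ups_input", "loss of supply"),
]

for e in sequence_of_events(log):
    print(e.timestamp_ms, e.point, e.description)

origin = likely_origin(log)
print("likely origin:", origin.point)
```

The earliest timestamp is only a first hint. Capabilities such as disturbance direction detection would be needed to confirm whether the problem really started upstream or downstream of that point.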

Mitigation usually takes the form of power conditioning equipment. Mission-critical operations, such as data centers and hospitals, will always have a UPS onsite. The AC-to-DC-to-AC voltage conversion of a UPS will inherently filter out a range of power anomalies. However, separate voltage regulation, surge suppression, harmonic filtering, or power factor correction systems can also be employed, if necessary.

Schneider Electric has helped thousands of critical-power operations keep running by avoiding power-related downtime. Our EcoStruxure (TM) Power architecture includes a complete array of intelligent power management devices, apps, and services. We also offer a complete family of power conditioning solutions.

The post British Airways IT outage highlights need for power monitoring and mitigation appeared first on Schneider Electric Blog.
