We discuss power generation takeaways from the 737 Max case for Industrial Control Systems (ICS)
By Bill Ray and Craig Nicholson
In our last blog, Who Checks the Checkers, we discussed how complacency can result in a lack of oversight and the misstatement of risk. This shortcoming is greatly mitigated by using third parties to bring fresh insight and process checks and balances. In this blog, we dig deeper into power generation industrial controls and the need to build on many successful years of following prudent industry practices.
Combustion Turbines across industries
Combustion turbines are ubiquitous across industries such as aerospace, marine, oil and gas, and power generation. Turbine technology has been used in industrial applications since the late 1930s and will continue to evolve and serve industry going forward. The core technology is essentially the same across these industries; however, its applications may differ significantly. Given the varying risk profiles of those applications, such as engines that sustain flight versus those that don't, there are key differences in how these industries are regulated. The most heavily regulated industry from a technological perspective is aviation, overseen by the Federal Aviation Administration (FAA).
What happened in Aviation?
You have likely heard about the two fatal crashes involving the Boeing 737 Max that took 346 lives: in Indonesia last October, and in Ethiopia this past March. The investigation is ongoing; however, the evidence points to a similar failure in both events: an aircraft stability feature that automatically drives the nose of the airplane down to avoid a stall condition. This system is called the Maneuvering Characteristics Augmentation System (MCAS). Only an engineer could have named such a system.
Getting to the core of it
For those turbine controls experts who have followed this story, I'm sure you questioned the prudence of the design as details emerged. Essentially, a single 'angle of attack' sensor (the aircraft carries two, mounted left and right of the nose) malfunctioned, causing the flight control system to erroneously determine the plane was in a stall and push the nose down repeatedly and, ultimately, fatally. A single sensor had the capability to cause a catastrophic event. With hindsight it is much easier to identify fault in the system; furthermore, the 737 Max fleet could have flown for years without experiencing such an event, whether through good fortune or more robust instrumentation. This brings us to prudent industry practices.
Prudent industry Practice
Prudent practice in the sphere of controls prioritizes the protection of human life and equipment through appropriate safety systems, while maintaining necessary system availability and other key process metrics. The foundation of prudent industry practice lies in the application of precedent, knowledge and experience from similar applications. For example, it is known that humans make mistakes and components malfunction. A key insight for control system designers is to not only test for working features but to also rigorously test for failures. Thorough acceptance testing ensures robustness, completeness of design and fitness for purpose. Did anybody ask during MCAS simulation tests, what happens if this sensor fails?
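The testing insight above can be made concrete with a small sketch. The controller, thresholds and sensor names below are entirely hypothetical, invented only to illustrate the pattern of injecting failures into acceptance tests rather than exercising the happy path alone:

```python
# Illustrative only: a toy stall-protection function with two redundant
# angle-of-attack inputs. A failed sensor is modeled as reading None.
STALL_ANGLE = 15.0  # hypothetical angle-of-attack limit, in degrees

def stall_protection(aoa_a, aoa_b):
    """Command nose-down trim only if BOTH redundant sensors agree."""
    if aoa_a is None or aoa_b is None:   # a sensor has failed
        return "disable"                 # fail safe: stand down
    if aoa_a > STALL_ANGLE and aoa_b > STALL_ANGLE:
        return "nose_down"
    return "no_action"

# Happy-path test: both sensors see a stall.
assert stall_protection(18.0, 17.5) == "nose_down"
# Failure tests: what happens if a sensor fails, or the pair disagrees?
assert stall_protection(None, 17.5) == "disable"
assert stall_protection(18.0, 3.0) == "no_action"
```

The failure assertions are the point: an acceptance test suite that only contained the first assertion would pass against a design that acts on a single faulty sensor.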
“A key insight for control system designers is to not only test for working features but to also rigorously test for failures”
Safety by design: Measurement Conditioning
An example of engineering used to increase safety is the practice of 'measurement conditioning' of signals into a processing unit to ensure the desired output. The ability to deal correctly with erroneous signals is a best practice in any safety critical application. Typically, for any safety critical system, 'redundancy' is built into the measurement chain; in other words, more than one measurement source for a given input. For instance, with two sensors you can compare inputs and provide the resulting information to the operator (or pilot). A minor deviation or spread may produce an alarm, giving the operator (or pilot) time to follow predetermined corrective procedures. Further deviation may cause additional annunciation (higher level alarms) and/or the execution of fail-safe logic. Fail-safe logic could consist of discarding one input (take the high or low); discarding both signals and substituting another method of measurement, such as a predetermined value or a value calculated from other field measurements; or disabling, isolating or shutting down the system. In the MCAS case, fail-safe would likely mean disabling the system unless an alternative set of conditioned logical inputs could be used to deterministically detect a stall condition (perhaps a calculated threshold based on plane speed, altitude and engine thrust). Note: this is for illustrative purposes and not a recommendation!
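The tiered conditioning described above, agreement, minor-spread alarm, then fail-safe fallback, can be sketched as follows. This is a minimal illustration under assumed thresholds and a simple averaging scheme; it is not the logic of any real flight or turbine control system:

```python
# Illustrative dual-sensor measurement conditioning. Thresholds,
# names and the averaging scheme are assumptions for the example.
from enum import Enum

class Status(Enum):
    OK = "ok"                # sensors agree; use their average
    ALARM = "alarm"          # minor spread; annunciate, keep operating
    FAIL_SAFE = "fail_safe"  # large spread; discard both, fall back

ALARM_SPREAD = 2.0  # hypothetical spread that annunciates to the operator
FAIL_SPREAD = 5.0   # hypothetical spread that triggers fail-safe logic

def condition(left: float, right: float, fallback: float):
    """Return (value_to_use, status) for a redundant pair of inputs."""
    spread = abs(left - right)
    if spread >= FAIL_SPREAD:
        # Neither signal is trustworthy: substitute a predetermined or
        # calculated value and let higher-level logic disable the system.
        return fallback, Status.FAIL_SAFE
    avg = (left + right) / 2.0
    if spread >= ALARM_SPREAD:
        return avg, Status.ALARM
    return avg, Status.OK
```

In a real system the fallback would itself be conditioned (for example, a value calculated from other field measurements), and the ALARM state would drive predetermined operator procedures rather than simply being returned.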
MCAS had redundancy and was regulated by the FAA, what went wrong?
According to the Wall Street Journal, over the years the Federal Aviation Administration has increasingly relied on the Original Equipment Manufacturer (OEM) for final design sign-off in order to free up government resources. Additionally, Boeing did not place the MCAS system in a safety critical category that would have required additional oversight. The system was designed to rely on one sensor and did not include any sensor discrepancy annunciation. Furthermore, it appears the pilots who operated the aircraft were not adequately trained on or informed of the system, such as how to quickly diagnose and disable MCAS in the event of a malfunction.
Bringing this to Power Generation (the engines that don’t fly)
Standards and Regulation:
There are several regulatory bodies and checks and balances in power generation that relate to controls. Oversight can be found in a cross section of councils, professional licenses and associations, such as NERC reliability and cyber security standards, the Nuclear Reliability Council (specific to nuclear technology), Professional Engineering (PE) approvals (enforced more rigorously in a few states such as California), and Canadian Standards Association (CSA) certification. However, there is little in the way of regulation or technical oversight when it comes to implementing control system upgrades on power generators.
There are ISO/IEEE/IEC standards for hardware and software design, communication protocols and safety, such as Safety Integrity Level (SIL) certification for emergency shutdown systems. Without an industry-wide standard and regulatory oversight, however, the implementation of prudent standards is, for the most part, left to the supplier. As a result, there are many custom-built systems with varying levels of quality and inherent prudent industry practice (i.e., built to standards).
Mitigate Owner’s Concerns
This lack of oversight is a legitimate concern for any asset owner. Poor quality designs can have a detrimental effect on power plant reliability and performance, such as excessive alarms obscuring real process issues. Poor signal conditioning and logic design, or a lack of inherent safety measures, can lead to catastrophic equipment failure or, even worse, injury to personnel. Quality, reliability and safety are the result of internal or external independent oversight guiding the project through its life cycle of purchase, design, acceptance testing, installation and commissioning.
Fast moving industry
Given the lack of regulatory hurdles to introducing a new system to market, the industry moves fast to stay ahead of the competition. Controls vendors make money by selling their latest systems in the form of upgrades while phasing out older systems and continually raising legacy system prices. The typical lifecycle of a system is ten to fifteen years. As a result, control systems suffer from obsolescence and increasing reliability issues well before many other components of a power plant. Strategies such as stocking critical spares can delay an upgrade; however, if your operations team finds itself buying spares off the grey market or eBay, it's time to upgrade.
Industrial Control Evolution
We have rapidly moved from mechanical-hydraulic systems, through analog, to digital systems. About a decade ago, programmable logic controllers were introduced as turbine control systems; now we have networked PC-based systems, Windows computers and IT infrastructures. New threats exist as modern systems reside on local and wide area networks, bringing the requirement to conform to cyber security regulation and patch management to protect against intruders infiltrating power plant systems.
Diligence lost in change
Given the pace of change and the variety of systems from multiple manufacturers, it is critical that prudent industry practices are not overlooked in new system hardware, software and safety design. Traditional practices, such as separation of protection and control functions, inclusion of inherent system redundancies, two-out-of-three input voting, and hardening critical systems with fail-safe, redundant and hardwired devices, can be designed away in the spirit of saving implementation and hardware costs.
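Two-out-of-three voting, mentioned above, is worth a brief sketch: with three redundant inputs, the system can keep operating on the value the majority agrees on while flagging a single dissenting sensor. The tolerance and the median-based scheme below are assumptions for illustration, not a specific vendor's implementation:

```python
# Illustrative 2oo3 (two-out-of-three) input voting on redundant sensors.
TOLERANCE = 1.0  # hypothetical band within which signals "agree"

def vote_2oo3(a: float, b: float, c: float):
    """Return the median of three redundant inputs, plus the indices of
    any sensors deviating from the median beyond tolerance."""
    readings = sorted([(a, 0), (b, 1), (c, 2)])
    median = readings[1][0]
    # A single failed sensor cannot move the median; it is simply flagged
    # for maintenance while the process keeps running on the good pair.
    outliers = [idx for val, idx in readings if abs(val - median) > TOLERANCE]
    return median, outliers

value, suspect = vote_2oo3(100.2, 100.1, 92.0)
# The median survives the single bad reading; sensor index 2 is flagged.
```

This is precisely the property that a single-sensor design like MCAS lacked: one malfunctioning input changes the maintenance list, not the control action.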
Prudent industry practice is based on precedent, knowledge and past experience. If we are not diligent, prudence can get lost in the change. If a regulated, slow moving, process driven industry like aviation can overlook a single point of failure to devastating effect, the risks are significantly multiplied in the much less regulated, faster moving power generation environment.