Modernizing Emergency Shutdown Systems Step 4: Use a Risk Approach


Credit to Author: Steve Elliott | Date: Mon, 22 Jul 2019 13:00:23 +0000

The Digital Revolution. Industry 4.0. The Industrial Internet of Things. The 4th Industrial Revolution. Call it what you will, but today’s fast-accelerating technological evolution has forever changed the business of manufacturing.

In any digital transformation, investment in safety systems is essential. However, the justification for a safety system upgrade is seldom based on a single factor. Many considerations combine to ultimately build a successful case for modernization. I once presented a manager with two choices: Upgrade the legacy safety system or wait until something happens and go out of business. I know that sounds a bit extreme, but it’s not always possible to restart operations after an outage!

This post will look at using a risk-based approach to build an effective business justification and win that all-important approval to successfully secure money and resources for your new safety model.

Reduce Unplanned Outages

The most obvious motivation to upgrade your safety system is when the plant is experiencing multiple unexpected outages or trips in a year. In that case, the justification is easy! However, just because it hasn’t failed yet doesn’t mean that it won’t fail. It’s all a question of probability / likelihood, or “When, not if.”

If you wait until something fails, the fix becomes urgent, and vital resources need to be diverted from existing projects or budgets to resolve the issue. This often drives up overall costs because of a compressed replacement schedule, premiums paid to expedite spares and repairs, and the need to obtain knowledge and expertise at a moment’s notice.

The biggest consequence of any unplanned outage is business interruption. Production planning and scheduling can be thrown off balance, supply contract commitments can be placed in jeopardy, off-specification product may be produced and have to be written off, production targets can be missed, and the list continues.

Let the Risk Matrix be Your Friend

I have always struggled to provide a simple visual of the many factors associated with modernizing any safety system. One method that I have often used with good success is the Risk Matrix. It is a common denominator across all operating companies: many use different sizes (4×4, 5×5, 8×8), but they all know and understand the matrix. So, if you set aside the math behind the matrix (it’s hard to calculate the likelihood, but the consequences remain the same!), the risk matrix provides a simple, easy-to-understand, common view around which to have the discussion.
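
As a minimal sketch of how such a matrix can be encoded for discussion, the Python below scores a 5×5 matrix as likelihood times consequence and maps the score to a band. The function names, the band thresholds and the example ratings are illustrative assumptions, not values from any company standard.

```python
# Hypothetical 5x5 risk matrix: score = likelihood * consequence.
# The thresholds below are illustrative assumptions; use your company's own matrix.

def risk_score(likelihood: int, consequence: int, size: int = 5) -> int:
    """Return the matrix cell score for 1-based likelihood and consequence ratings."""
    for name, value in (("likelihood", likelihood), ("consequence", consequence)):
        if not 1 <= value <= size:
            raise ValueError(f"{name} must be between 1 and {size}, got {value}")
    return likelihood * consequence


def risk_band(score: int) -> str:
    """Map a score to a band using assumed thresholds for a 5x5 matrix."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# Same consequence, different likelihood: the legacy system is assumed more likely to fail.
legacy = risk_score(likelihood=4, consequence=5)
upgraded = risk_score(likelihood=2, consequence=5)
print(f"legacy: {legacy} ({risk_band(legacy)}), upgraded: {upgraded} ({risk_band(upgraded)})")
```

The example keeps the consequence rating fixed and changes only the likelihood, which mirrors how the matrix is typically used in an upgrade discussion: the upgrade moves the cell, not the severity of the event.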

Another approach is to justify the upgrade in terms of Risk Reduction achieved per $1 spent.

In other words, when funding requests compete, I am more likely to be successful if I can demonstrate that the upgrade project provides greater risk reduction to the business for each dollar spent than the competing projects. Without proper planning, you may also miss the opportunity to take advantage of the improved functionality of the new safety system, which may lead to a higher ongoing cost of ownership.
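
One way to make that comparison concrete is to divide each project’s estimated risk reduction by its cost and rank the results. The sketch below assumes risk is expressed as an expected annual loss in dollars. The class, the project names and all figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    risk_before: float  # expected annual loss before the project, in dollars (assumed)
    risk_after: float   # expected annual loss after the project, in dollars (assumed)
    cost: float         # project cost in dollars (assumed)

    def risk_reduction_per_dollar(self) -> float:
        """Risk reduction achieved for each dollar spent."""
        if self.cost <= 0:
            raise ValueError("cost must be positive")
        return (self.risk_before - self.risk_after) / self.cost


def rank_projects(projects: list[Project]) -> list[Project]:
    """Order projects from highest to lowest risk reduction per dollar."""
    return sorted(projects, key=lambda p: p.risk_reduction_per_dollar(), reverse=True)


# Hypothetical funding requests competing for the same budget.
requests = [
    Project("Warehouse expansion", risk_before=400_000, risk_after=300_000, cost=500_000),
    Project("Safety system upgrade", risk_before=2_000_000, risk_after=500_000, cost=1_000_000),
]
for p in rank_projects(requests):
    print(f"{p.name}: {p.risk_reduction_per_dollar():.2f} risk reduction per $1")
```

Any consistent unit of risk works here, as long as every competing project is measured the same way.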

TIP: It may be worth factoring “What if” scenarios into the ROI calculation to demonstrate the business consequence. If such an event has actually happened, then real numbers can be ascertained and used to support the ROI. If not, then some “What if” scenarios may prove useful in raising awareness and showing the cascading effect an event has on the business. To most organizations, customer satisfaction is a key metric, and often has management’s attention. Anything that has the potential to impact that metric often gets priority!
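
A “What if” scenario can be turned into a number by multiplying an assumed event frequency by the direct and cascading cost of each event. The sketch below is one hedged way to do that. The scenario names, frequencies and dollar figures are invented placeholders, and the `Scenario` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    events_per_year: float  # assumed frequency of the event
    direct_cost: float      # assumed lost production per event, in dollars
    cascading_cost: float   # assumed knock-on cost per event (penalties, write-offs), in dollars

    def annual_exposure(self) -> float:
        """Expected yearly cost: frequency times total cost per event."""
        return self.events_per_year * (self.direct_cost + self.cascading_cost)


def total_exposure(scenarios: list[Scenario]) -> float:
    """Sum the expected yearly cost over all scenarios."""
    return sum(s.annual_exposure() for s in scenarios)


# Invented figures; replace with real numbers if such an event has actually happened.
what_ifs = [
    Scenario("Spurious trip, one day of lost production", 0.5, 400_000, 100_000),
    Scenario("Controller failure, missed customer delivery", 0.25, 2_000_000, 1_000_000),
]
for s in what_ifs:
    print(f"{s.name}: ${s.annual_exposure():,.0f} per year")
print(f"Total: ${total_exposure(what_ifs):,.0f} per year")
```

Keeping the cascading cost as a separate field makes the knock-on effect on customers visible instead of burying it in a single total.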

Consider Parts Obsolescence

Obsolescence is not a reason to upgrade a safety system, but the consequence is! Product obsolescence is a fact of life for any electronics manufacturer. Equipment, controllers, communications, power supplies, I/O cards, and workstations get old. Certain components are no longer available. It becomes impossible to repair or replace items.

There are several ways in which this situation can be managed:

Hold adequate spares and replacements locally on site.
Work with the automation vendor to provide a central “bonded” stock for the sole use of the operating company.
Scour the commercial market for spares.
Plan an upgrade path.

While holding inventory on site is often preferred, it does lead to additional inventory cost on the books of the business. This often means duplication of inventory across multiple units, sites or assets, and it needs careful management of the various hardware and firmware revisions. Holding a central stock can also create additional issues when operating across multiple countries, regions or geographies, such as taxes and duties, import / export regulations, and time to get the stock from the central location to the operating site.

CAUTION: In recent years there has been a steady rise in providers offering used, spare or refurbished modules. This source should be treated with extreme caution. Craigslist and eBay are unreliable suppliers for manufacturing facilities that operate 24 hours a day, 7 days a week.

For any electronic or programmable system, the devil is often in the detail. The specific compatibility of hardware, software and firmware revisions is critical to the integrity and operation of the safety system. At the end of the day, would you stake the safety of your people, production and profit on an “internet purchase” from an unknown source of supply?

The preferred option is to plan an upgrade path to prolong the operating life of the safety system. Upgrades are often “gradual,” and parts of the system are upgraded as / when the time or opportunity presents itself. Key to the success of this approach is to ensure the interoperability of the different versions of the systems. As systems are upgraded, this approach creates “spares” that can be used to support the other legacy systems until they can be upgraded.

Once you start mixing components of different versions, it generally becomes more complex to manage and maintain the various systems, and maintenance costs may increase. The key is to get all the systems to a common revision level or, even better, into a position whereby they can be upgraded online without halting operations.

TIP: When considering the ROI calculation, it’s worth including any rising costs of obsolete spares and rising support costs. If there are any planned expansions, it may be worth including the delta between the increased costs of adding points to the existing safety systems versus using a new system. Ensure that the cost per I/O point includes the additional cost of specialist knowledge and expertise required to modify an aging system.

TIP: It is important to consider the availability of parts, as well as the lead time of getting critical / urgent spares. Every day or week that it takes to deliver a spare part can have a significant consequence on the overall schedule or budget, often many times greater than the actual cost of the replacement part itself.
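
To illustrate how lead time can dwarf the price of the part, the sketch below multiplies the waiting time by an assumed daily production loss and compares the result with the part price. The function names and all figures are invented for illustration.

```python
def downtime_cost(lead_time_days: float, loss_per_day: float) -> float:
    """Production lost while waiting for the part to arrive."""
    if lead_time_days < 0 or loss_per_day < 0:
        raise ValueError("lead time and daily loss must be non-negative")
    return lead_time_days * loss_per_day


def delay_to_part_ratio(part_price: float, lead_time_days: float, loss_per_day: float) -> float:
    """How many times larger the cost of waiting is than the part itself."""
    if part_price <= 0:
        raise ValueError("part price must be positive")
    return downtime_cost(lead_time_days, loss_per_day) / part_price


# Invented example: a $5,000 module with a 10-day lead time, on a unit losing $250,000 per day.
waiting = downtime_cost(lead_time_days=10, loss_per_day=250_000)
ratio = delay_to_part_ratio(part_price=5_000, lead_time_days=10, loss_per_day=250_000)
print(f"Cost of waiting: ${waiting:,.0f} ({ratio:.0f}x the part price)")
```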

However, none of these options resolves the fact that the equipment has reached the end of its useful life. That presents a business risk and should be planned for accordingly.

Protect Against Emerging Cybersecurity Threats

Any modernization plan must include a solid cybersecurity program. Not only is this now an integral part of the latest industry standards (e.g., IEC 61511 Edition 2 now mandates cybersecurity risk assessments), but many organizations now include mitigating and managing the risk of cyber-attack in their company standards.

Older machines using operating systems that are no longer supported present a security risk because they are susceptible to viruses and cannot be made secure. Loss of these machines can be critical if they are not available when called upon, leading to potential downtime. As part of the business justification, it is worth capturing the risk and cost of potential downtime due to the inability to access the relevant engineering and maintenance machine(s).

Beware of the “Custom Special”

In legacy safety systems, the programming tools available at the time they were produced often didn’t support templates or comprehensive function block libraries. This made it difficult to implement standardization across multiple systems, sites or applications, leading to complex code that is often unsupportable, unmaintainable, and understood by very few.

Many of the latest programming tools support standardization, allowing knowledge and best practices to be captured and encapsulated. Modern code reduces customization and simplifies troubleshooting for both engineers and maintenance technicians, which can greatly reduce downtime. This is difficult to quantify for an ROI calculation, but it may be worth including realistic examples or case studies applicable to your operation as part of the supporting documentation.

Rip or Replace?

Very often, the first question I get asked is “Can we upgrade what we already have, or do we need to rip and replace?” For me, the biggest risk in a replacement scenario has to do with the physical space, especially the equipment cabinets. The impact of removing and decommissioning existing cabinets and then installing new ones must be carefully considered.

Start with the footprint and space available, as this often dictates if rip and replace is even possible.

Also consider that, at some point, rip and replace will get you to the point of no return! Don’t forget the impact on people (e.g., the new skills and competencies required, additional training, any new training systems or simulators, etc.). And finally, don’t forget to include the impact on existing support contract(s).

If you decide to upgrade the existing system, make sure that you update your spares holding to ensure compatibility with the old and new versions of the upgraded system.

Example: A major chemical manufacturing company embarked on a program to upgrade or replace existing emergency shutdown systems. The decision was made to upgrade the existing system because it was ~85% cheaper than replacing with new, required only one week of plant shutdown compared to six weeks for a replacement, and was deemed the lowest overall risk solution (for example, no revalidation of the application was required).

Consider System Architecture

If you do decide to go down the route of replacing old with new, take a moment to stop and consider what your future business needs are and what they’ll require from your safety system. The good news is that there are now more choices available on the market. The bad news is that there are more choices available and you need to decide what is best for you.

For example, new network architectures and communications protocols mean that you can architect your safety system by functional plant unit, put I/O in the field in predesigned field enclosures, use Universal I/O to reduce the quantity of spares required, accommodate late changes, install early and then configure later, and the list goes on.

In general, ensure that you fully understand what you are buying, and what your obligation is for the operating life of the asset (lowest CAPEX doesn’t necessarily mean lowest OPEX). For example:

Ensure you understand the Total Cost of Ownership for the remaining operating life of the plant.
Understand how often you must proof test the system, calibrate the system, etc.
Understand what diagnostics are automatic / built into the system versus what must be configured in the application logic.
Know how the system redundancy works (voted / adaptive fault management versus failover redundancy).
Determine if online changes or modifications can be made to the safety system without halting operations. Are there any manual precautions required during download changes?
Understand failure modes, including what happens to DCS communications during high CPU loading / extended scan time.
Ask how many faults the system can tolerate before shutting down.
Learn if there is any application logic required to replace I/O modules.

There is a lot to consider when assessing risk. These are some considerations and tools you can use to examine your own situation. In the next post, we will take a closer look at Step 5 – Defining the ROI.
