Automation and Engineering

Alarm Management

Alarm management is another very interesting topic in industry, as the need for it is a by-product of failed planning. At some point, analogue systems were replaced by digital systems, specifically process control systems like PLCs and DCSs. Previously, people truly needed to decide whether an alarm was necessary, because installing one required hard-wired connections. With the advent of software and increased computational capabilities, a practically unlimited number of alarms is now technically possible. These may not even be process alarms, but information about poor signals, diagnostic alarms, deviation alarms, or even warnings within the system.

The increased capabilities meant that these systems were very flexible, and creating, modifying and even deleting alarms became very cost-effective. However, no one had imagined that engineers would short-cut the process of deciding whether an alarm was really necessary. Instead, the thinking was to alarm as much as possible. As a result, we now have industries with plants whose running process control systems are flooded with alarms that are never used or have no meaning whatsoever. They are nothing more than a nuisance for the people controlling these systems. Those who are meant to respond to these alarms are unable to do so effectively because, other than through experience, they have no way to distinguish between good and bad alarms. Good alarms are those which are rational: they make sense and lead to operator action. The operator should know what to do or look for when one of these alarms goes off. Bad alarms are those which make no real difference to the operator; they are perhaps nice to know, but do not necessarily lead to operator action. These alarms should be avoided, and they are where many plants suffer today. They can instead be defined as alerts, which require no operator intervention. However, the decision to put alerts in place should also be well thought out, because they can themselves become a flood.

Alarm Management is a program which many companies need in order to manage the number of alarms they have. Forward-thinking companies had such programs in place well in advance, but lately they have increasingly become a legal requirement. The rationalisation of alarms is a challenging, time-consuming and dynamic process. Various organisations in different parts of the world have created metrics that plants should strive to achieve so that their operators are able to cope. Be it per hour, per day, per shift, per operator or per station, the end goal is the same: to reduce these alarms to a manageable level and to ensure that operators are not overwhelmed by running systems.
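To make the idea of such metrics concrete, here is a minimal sketch of counting alarms per operator station per hour and flagging the hours that exceed a target. The event layout, the station names and the target of 2 alarms per hour are assumptions made purely for illustration; the real target comes from the site's own alarm philosophy or the guideline it follows.

```python
from collections import Counter
from datetime import datetime


def alarms_per_hour(events):
    """Count alarms per (station, hour) bucket.

    events: iterable of (timestamp, station) tuples (a hypothetical layout).
    """
    counts = Counter()
    for timestamp, station in events:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(station, hour)] += 1
    return counts


def overloaded_hours(counts, target_per_hour):
    """Return the (station, hour) buckets whose count exceeds the target."""
    return sorted(key for key, n in counts.items() if n > target_per_hour)


# Made-up sample data: three alarms on OS1 between 08:00 and 09:00.
events = [
    (datetime(2020, 1, 1, 8, 5), "OS1"),
    (datetime(2020, 1, 1, 8, 20), "OS1"),
    (datetime(2020, 1, 1, 8, 45), "OS1"),
    (datetime(2020, 1, 1, 8, 50), "OS2"),
    (datetime(2020, 1, 1, 9, 10), "OS1"),
]

counts = alarms_per_hour(events)
flagged = overloaded_hours(counts, target_per_hour=2)  # illustrative target only
print(flagged)
```

The same bucketing works per shift or per day by changing how the timestamp is truncated.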

What never gets discussed is life after an alarm rationalisation. Alarm counts reach a stable level, and now what? How do we sustain this state?

Many factors affect the actual alarm count, including but not limited to plant load, plant state (transient or steady-state), and process complexity. And of course, once you get into batch systems, things can become even more complex. Results also vary from operator to operator. A more experienced operator can filter through alarms much quicker than a novice, or can even prevent alarms from occurring by making suitable manoeuvres to keep the process running with surgical precision, well, almost.

Once a plant is stable, the key to keeping the alarm count down is a rigorous Management of Change (MOC) process. This means a well-thought-out workflow for approving and changing alarm parameters. This process falls under the responsibility of the Alarm Manager.

What is the reason that alarm management exists, and why is it such a dynamic process? Let’s say a plant takes the initiative to make sense of its alarms and reduces the number to a predictable level. Note that this took a serious effort from the Alarm Manager, the Operators, and the Process and DCS Engineers, as well as commitment from the Plant Manager and ESHQ.

Through implementation of a strong MOC workflow, plants can ensure that they return to the origin of alarms and process control. Alarms should not just warn; they should tell you that something needs to be done, and within a certain time. Deciding what needs to be done is a matter of experience from those who know the process and those who operate it. I&E is there to implement those changes, meaning to transcribe ideas into actual computer code. Any change, even a temporary one, should follow the four-eyes principle and be well documented and approved.
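As a rough illustration of the four-eyes principle in an MOC workflow, the sketch below models a change request for an alarm parameter that cannot be approved by the person who raised it. The class, its fields and the example tag are all hypothetical; a real MOC system would add audit trails, expiry dates for temporary changes, and so on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmChangeRequest:
    """A hypothetical record of one requested alarm parameter change."""
    tag: str
    parameter: str
    old_value: float
    new_value: float
    reason: str            # the documented justification for the change
    requested_by: str
    approved_by: Optional[str] = None

    def approve(self, approver: str) -> None:
        # Four-eyes principle: a second person must approve the change.
        if approver == self.requested_by:
            raise PermissionError("requester cannot approve their own change")
        self.approved_by = approver

    @property
    def may_be_implemented(self) -> bool:
        # Only approved changes should reach the controller configuration.
        return self.approved_by is not None


# Example with made-up names and values.
request = AlarmChangeRequest(
    tag="FIC101",
    parameter="HI limit",
    old_value=95.0,
    new_value=98.0,
    reason="Limit too close to normal operating range",
    requested_by="alice",
)
request.approve("bob")
print(request.may_be_implemented)
```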

Several companies offer tools that help further optimise automated systems. Even though a system is configured to run by itself, it still requires fine-tuning of its parameters to make it more effective.

This is where Alarm Management software can be of interest and can add much value to an Alarm Management Program. These tools typically generate KPIs with historical data and even identify the locations where one can start to fix alarms. There are several types of phenomena when it comes to excess alarms, and therefore these software suites should address all potential issues and provide flexible filters for users to tailor to their needs.

Alarm Management Software

Alarm Management software uses existing process information from a DCS to develop KPIs for a running system and to create actionable items for operators or engineers. This can be, for example, a rationalisation process to manage alarms, the so-called Alarm Rationalisation within an Alarm Management program. Problem alarms are identified because they continuously go on and off, or stay on. This is most likely because they are not configured correctly, or sometimes not even needed.
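The two symptoms just described, alarms that keep going on and off and alarms that stay on, can be sketched as follows. This is not how any particular product works; the event layout, the tag names and the thresholds (3 activations within one minute, active for 24 hours or more) are assumptions chosen only to make the example run.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def classify_alarms(events, now,
                    chatter_count=3,
                    chatter_window=timedelta(minutes=1),
                    stale_after=timedelta(hours=24)):
    """Return (chattering, standing) sets of alarm tags.

    events: iterable of (timestamp, tag, state), state being "ON" or "OFF".
    All thresholds are illustrative defaults, not values from a standard.
    """
    activations = defaultdict(list)  # tag -> list of activation times
    active_since = {}                # tag -> time the alarm last went ON
    for timestamp, tag, state in sorted(events):
        if state == "ON":
            activations[tag].append(timestamp)
            active_since[tag] = timestamp
        else:
            active_since.pop(tag, None)

    # Chattering: chatter_count activations packed inside one window.
    chattering = set()
    for tag, times in activations.items():
        for i in range(len(times) - chatter_count + 1):
            if times[i + chatter_count - 1] - times[i] <= chatter_window:
                chattering.add(tag)
                break

    # Standing: still active and has been for at least stale_after.
    standing = {tag for tag, since in active_since.items()
                if now - since >= stale_after}
    return chattering, standing


t0 = datetime(2020, 1, 1)
events = [(t0, "LI200.LO", "ON")]  # goes on and never clears
for k in range(3):                 # on/off three times within a minute
    events.append((t0 + timedelta(seconds=20 * k), "FIC101.HI", "ON"))
    events.append((t0 + timedelta(seconds=20 * k + 10), "FIC101.HI", "OFF"))

chattering, standing = classify_alarms(events, now=t0 + timedelta(days=2))
print(chattering, standing)
```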

How does the software do this? Well, most of the messages from a DCS follow specific formats. The challenge is that they are not identical, but can vary ever so slightly. This means the information needs to be parsed, that is, broken up into individual categories, and then interpreted. Depending on the origin of the message, different abilities are needed to relay its actual meaning. This is where automation engineers learn to be creative: they take the information delivered by the systems and build rules to interpret it.

You can think of the automation scripts as language translators, and the automation engineer as the committee that properly defines and translates the words. This is easier to explain through an example.

Say the German chancellor decides to write the British Prime Minister a letter regarding Brexit. She will take the information from her mind and put it on paper, properly phrasing it in the German language, following the grammatical rules of her language, using the proper German vernacular to express her ideas or thoughts. There will be a certain way the date is formatted, the addressee addressed, and at the bottom a signature to indicate the authenticity of the message. The message will be packaged in an envelope and then handed to the post to deliver. The post will take the unopened letter and send it over to Britain, where it will be sorted and delivered to the recipient of the letter, in this case being the British Prime Minister. He will open it and read it and interpret it, and maybe even take some action.

However, as any native English speaker who is learning German knows, the letter will be impossible to understand if the British Prime Minister does not speak German. What are his options?

  • Learn German
  • Find a German translator
  • Reply and ask that the language be translated by the German chancellor before it is sent to the PM.

Learning German can take years, and the German chancellor does not have all the time in the world to translate her own letters, so we can conclude that the first and third options are not viable. By process of elimination, we are left with finding a translator. This requires an individual whom the PM can trust. It is also important that this individual has a solid knowledge of both languages, and perhaps also of political terminology, so as to remove ambiguity.

How does this all apply to Alarm Management? A DCS or PLC collects information. For practical purposes let’s focus on the DCS, since it is more common. When a value deviates outside of an expected range, an alarm indicates that an operator needs to do something. How does this happen? A DCS has a controller into which the configuration is loaded. This configuration tells valves and switches how to behave based on the instrumentation in the field. Inside the controller configuration is a PID block, a fundamental calculator or processor of sorts that compares the values read by field instrumentation with the desired set-points, which are set by operators or by other parts of the automated process. Alarms are there to tell the operator that he should adjust something if he wants to stay within the desired parameters. These can be for quality, economic, or even safety reasons. The alarms are delivered as strings within the DCS and turned by the HMI into something the operators can perceive, such as a bell sound, a yellow or red light, an icon, an animation, etc.
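The comparison described above can be sketched in a few lines. This is not DCS vendor code, only an illustration of the logic: a process value (PV) is checked against absolute limits and against its distance from the set-point (SP). The names and the limit values are assumptions for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmLimits:
    """Hypothetical alarm settings for one control loop."""
    low: float            # absolute low limit of the process value
    high: float           # absolute high limit of the process value
    max_deviation: float  # largest allowed |PV - SP|


def evaluate(pv: float, sp: float, limits: AlarmLimits) -> Optional[str]:
    """Return the alarm type raised by this reading, or None if all is well."""
    if pv > limits.high:
        return "HIGH"
    if pv < limits.low:
        return "LOW"
    if abs(pv - sp) > limits.max_deviation:
        return "DEVIATION"
    return None


limits = AlarmLimits(low=20.0, high=100.0, max_deviation=10.0)
for pv in (82.0, 95.0, 105.0, 10.0):
    print(pv, evaluate(pv, sp=80.0, limits=limits))
```

A real system would usually add a deadband or a delay so that a value hovering around a limit does not toggle the alarm on and off.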

These strings are basically data in the form of text and are stored in a database. This means all the information stored within a controller can be formatted as text and read with a text editor. In fact, pretty much any software file can be turned into text. If you open one of these files, you can see all of its information. To someone who knows nothing about automation, this will look like gibberish. However, an automation engineer can interpret this text data, or strings, and extract usable information because, deep down, most DCSs are built on the fundamentals of PID controllers and process control, and therefore share a common nomenclature. Think of this as the letter delivered to the PM and the alphabet used to write it.

This does not mean that variables are named the same thing, or that all DCSs are identical, not at all, just that they follow similar principles.

Let us return to the letter example between the two countries. We said that the translator was the best option to ensure that the message comes across clearly and accurately.

The DCS, or more importantly the database which stores the configuration, is like the German chancellor. She has the information and shares it in the language and format of her choosing. However, she uses pen and paper, something that the PM is also familiar with.

An ODBC or OLE DB connection is established between two servers to share this information; this is like the delivery by post. The information is transferred in a raw format (the original German) and delivered to the correct recipient, meaning another trusted server. Enter the automation engineer. The information in raw format arrives as text, meaning a bunch of letters, symbols, and spaces. This message is then parsed using rules built to match the different kinds of messages seen in the past. This would be like sending the letter to the translation department. Some standards exist; for example, the address could be used to determine that a German translator is needed. These rules use regular expressions to parse the data and convert it into usable variables, like the timestamp, operator station, action taken, alarm status, type of alarm, etc. Once the information is separated into interpretable fields, it can be used to make reports or KPIs.
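A parsing rule of this kind might look like the sketch below. The message layout is invented for the example (timestamp, station, tag, alarm type, status, separated by spaces); real DCS messages differ by vendor and usually need several such rules, one per message format seen in the past.

```python
import re
from datetime import datetime

# One rule for one hypothetical message format, e.g.
#   "2020-01-01 08:05:12 OS1 FIC101 HI ACTIVE"
ALARM_RULE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<station>\S+)\s+"
    r"(?P<tag>\S+)\s+"
    r"(?P<alarm_type>HI|LO|DEV)\s+"
    r"(?P<status>ACTIVE|CLEARED|ACKNOWLEDGED)$"
)


def parse_message(line):
    """Split one raw message into named fields, or return None if no match."""
    match = ALARM_RULE.match(line.strip())
    if match is None:
        return None  # unknown format: a new rule is needed for this message
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%d %H:%M:%S"
    )
    return fields


record = parse_message("2020-01-01 08:05:12  OS1 FIC101 HI ACTIVE")
print(record)
print(parse_message("something the rule has never seen"))
```

Returning None for unmatched lines, rather than guessing, makes it easy to collect the messages that still need a rule.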

In the case of the letter, the writing is broken up into sentences, then into words, and then put back together into something that the PM can interpret with minimal to no loss of meaning.

These tools have been developed through many years of experience and by creative minds, resulting in vendor-neutral software which can give insights into a process that were previously unknown to the operators and owners of the technology. This is a very powerful tool!

When we talk about Alarm Management, we mean that we use the tools at our disposal to enforce an alarm philosophy. This means we regularly review that alarms make sense (are rational) and that they trigger operator action, because otherwise they should not be there and are a nuisance. We also use this to limit the number of distractions for an operator so that he may increase his situational awareness, ultimately resulting in improved performance and yields.