This week’s Pipeliners Podcast episode features Doug Rothenberg returning to the podcast to discuss the important topic of alarm management applied to pipeline safety.
In this episode, you will learn about the key differences between alarm management and process safety management, the four foundational principles of alarm management, the importance of creating clear definitions within the pipeline operation to support pipeline safety, and other valuable topics.
Pipeline Alarm Management: Show Notes, Links, and Insider Terms
- Doug Rothenberg is the President and Principal Consultant of D-RoTH, Inc., a technology consulting company providing innovative technology and services for industry. Doug’s specialty is control alarm management training and consulting for the industrial process industries. Find and connect with Doug on LinkedIn.
- Get Doug Rothenberg’s book, “Situation Management for Process Control,” as discussed in the podcast here.
- Alarm Management is the application of human factors along with instrumentation engineering and systems thinking to manage the design of an alarm system to increase its usability.
- Alarm rationalization is a component of the Alarm Management process of analyzing configured alarms to determine causes and consequences so that alarm priorities can be determined to adhere to API 1167. Additionally, this information is documented and made available to the controller to improve responses to uncommon alarm conditions.
- An alarm management program is a method to manage the alarm rationalization process in a pipeline control room.
- Process Safety Management (PSM) refers to a set of interrelated approaches to managing hazards associated with the process industries and is intended to reduce the frequency and severity of incidents resulting from releases of chemicals and other energy sources.
- PHMSA (Pipeline and Hazardous Materials Safety Administration) is responsible for providing pipeline safety oversight through regulatory rulemaking, NTSB recommendations, and other important functions to protect people and the environment through the safe transportation of energy and other hazardous materials.
- MAOP (Maximum Allowable Operating Pressure) is a pressure limit set by PHMSA that applies to compressed gas pressure vessels, pipelines, and storage tanks.
- Distributed Control System (DCS) is an automated control system typically used in the processing industry that includes controllers spread out across a plant or control area. There is typically no central operator supervisory control.
- SCADA (Supervisory Control and Data Acquisition) is a system of software and technology that allows pipeliners to control processes locally or at remote locations. SCADA breaks down into two key functions: supervisory control and data acquisition. Included is managing the field, communication, and control room technology components that send and receive valuable data, allowing users to respond to the data.
- HMI (Human Machine Interface) is the user interface that connects an operator to the controller in pipeline operations. High-performance HMI is the next level of taking available data and presenting it as information that is helpful to the controller to understand the present and future activity in the pipeline.
- Pressure Safety Valves (PSVs) are specialized relief valves used in emergencies to relieve pressure in a vessel or system during an upset.
Pipeline Alarm Management: Full Episode Transcript
Russel Treat: Welcome to the Pipeliners Podcast, episode 165, sponsored by the American Petroleum Institute, driving safety, environmental protection, and sustainability across the natural gas and oil industry through world-class standards and safety programs. Since its formation as a standards-setting organization in 1919, API has developed more than 700 standards to enhance industry operations worldwide. Find out more about API at api.org.
[music]
Announcer: The Pipeliners Podcast, where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations. Now, your host, Russel Treat.
Russel: Thanks for listening to the Pipeliners Podcast. I appreciate you taking the time. To show that appreciation, we’re giving away a customized YETI tumbler to one listener each episode. This week, our winner is Antonio Montes with SGS. To learn how you can win this signature prize, stick around till the end of the episode.
This week, Doug Rothenberg returns for the first of a two-part series on alarm management, starting with a conversation about alarm management applied to pipeline safety. Doug, welcome back to the Pipeliners Podcast.
Doug Rothenberg: Thank you, Russel. I’m delighted to be here.
Russel: I asked you to come here to talk about something I know you’re passionate about, and that subject is alarm management. For the listeners, we’ve got the guy here who literally wrote the book on alarm management, so no telling how far and long Doug and I will take this conversation. Am I right, Doug?
Doug: That’s probably too right.
Russel: [laughs] It’s exactly right. Let me start with the simple question. How do you define alarm management?
Doug: This is a pretty easy definition, but it confuses all of us. Alarm management is basically trying to figure out how do you want to use alarms for operational safety in a pipeline? How do you get it to work the way you want to? It delivers the capability that you’re looking for.
The issue for a lot of us is that alarming is such an intuitive thing, that we all have an opinion about it even before we start. We all know what an alarm is. It’s a loud sound. Sometimes it’s too loud. Sometimes it’s not loud enough. We all have it.
What we need to understand, when we’re going to use it as an operational tool, is that we need to turn this conception that we have into a reality that is actually a tool for the controller.
Russel: That’s so very well said. The key point you’re making here is that we all have embedded in our thinking a preconceived notion about what an alarm is, and how it should work. That notion is often very much misunderstood.
Doug: It is misunderstood, but it’s understandable.
Russel: What do you think some of the common misconceptions are for people who are not alarm management folks?
Doug: The answer to this is probably just to look a little bit at the historical sequence. The early alarm management was…Let me back up a little bit further. The story of how I got into alarm management is going to be very revealing.
A hundred years ago, not really but that’s a number to say, it’s a long time ago. I was working for a major oil company. I was in their research and development in engineering. The manager of refining (and this is a refining company) asked me to go into the control room and see how many control loops an operator (you call him a controller, but in the process industry it’s an operator) can manage.
Like an idiot, I took the job. After five minutes it became very clear that that didn’t have an answer. I saved myself by going back and saying, “I can’t answer that question, but I can go into the control room and find out what’s getting into the operator’s way, that if we change, it will change the whole frame of operations.”
In an hour and a half of asking that question in the control room, the alarm system popped out. We’d never even thought about it before that way. Once we did, it turns out that the alarm system, which was supposed to be a tool to help the controller or the operator, turned out to be not only not helping, but getting in his way.
That’s the way I was introduced into alarm management. Then I progressed to take that intuition, turning it into a technology. Then, thank you, [laughs] it turns into a book and a lot of other activities to improve the alarm management.
Russel: [laughs] That’s actually an interesting story. It’s one of the things you find in the control room in particular. If you go in there with an idea of what the problem is, then you talk to the controllers, it takes a little bit of time and effort. At some point, all of a sudden, you hear what they’re saying. [laughs]
That’s not a reflection on the controllers at all. It’s that we don’t understand what…We think we know what they’re doing. We don’t, oftentimes.
Doug: Yes. Oftentimes, what it takes is, it’s a conversation back and forth. Some of the problems we hear in the control room are symptoms. If we’re smart enough, we take those symptoms and turn them into root causes. Then we look for real solutions.
Russel: Before we get much further into this, I know you’ve done a lot of work in the process industries and a lot of work in the pipeline industry. I’m curious, in your experience, what’s different about pipelining versus petrochemical, refining, power, or those kinds of things?
Doug: Good question. There’s a short answer part of that and a little bit longer answer. The short answer is that pipeliners are very lucky about alarm management. PHMSA has basically said, “Hey guys, you need to do it right.” We’ve passed the management hurdle. We’re now into the engineering and implementation hurdle.
That’s a good place to be if you’re an engineer, and you want to implement. The first difference is, you need to do it. The second difference is that there’s not much of a difference. Almost every principle, almost every technique, almost every implementation is going to be very similar.
Perhaps the biggest difference is that most process industries are implemented on what we would call a Distributed Control System, or DCS. Most pipelines are on a SCADA system, or another type of system. They are basically very similar in technology.
Russel: I’m going to put this out there and see what you think about it. There’s a couple of differences. They’re subtle. I agree with you that the techniques and the processes and the philosophies, they all work in pretty much any case where you need to have an alarm system.
What I find in pipelining is a couple of things that make it a bit unique versus what limited experience I have elsewhere. One is time. The time it takes to get the information back to the control room and the time it takes to get someone to the field to do something about it is quite a bit different.
If you think about a petrochemical facility, it comes back virtually instantaneously. If I actually have to send somebody out, I’ve got an operator who’s there at the facility that has to walk or get in a car and go there, and that’s a matter of minutes. In pipelining that can be a matter of an hour or more. That’s one fundamental difference.
The other fundamental difference is that so many of the pipeline facilities are designed to mechanically become safe, if there’s an upset. It gets hard to rationalize sometimes because, it’s going to hit the safety event, that kind of thing. Those two things are the primary differences. Would you agree with me, or am I missing it?
Doug: We’re going into an interesting corner of the discussion. It’s a good one. The distances involved are…Let’s turn them into alarm numbers. Distance is time. The distances are very dramatically different, not only to get a long arm out there to do it, but also sometimes to get the data back by telemetry.
All that does is put a little bit different information into the alarm response directions for the controller, and also to place a different value on the amount of time that you can allow to the controller to solve the problem. That determines what the alarm activation point is. We’ll get into that later.
The other aspect, in terms of automation, is a good thing for some pipeline operations who have it. Most process industries have similar kinds of safeguards. The reason for the alarm system is that when these safeguards activate, they’re usually not very pretty.
What you’d like to do is if you can put a human being in the solution path before they are challenged, then maybe you can change something that’s not so pretty into something that’s maybe embarrassing or inconvenient, but it is still a vastly different place that you’d like to be.
Russel: Changing the topic a little bit. I know one of the things you like to say, Doug, is that an alarm system well-designed and implemented is a cornerstone of safe operations. What do you mean by that?
Doug: An alarm system allows an enterprise to basically implement what they’d like to have in their pipeline operation in order to be safe.
It allows you to take words like, “we want to operate safely, we don’t want to have this, we don’t want to have that kind of an incident, we don’t want to have an outage, we don’t want to have an overpressure,” and allows you to translate that into operational instructions for the controller to help manage those situations before they escalate.
What we do is we try to identify all those abnormal situations and plan for the controller’s intervention in a way that takes the collective knowledge of the whole system and presents it to the controller when he needs it in the form he can use.
Russel: [laughs] That is easy to say. When and in a form they can use, that is a mouthful. When you talk about what does it take to get there — the interesting thing I think about control rooms is on the surface, the ones that are doing a good job, they look calm, they look organized. It looks like the pace of the work is not that intense. Things are pretty simple.
That masks all the complexity in the process, and it masks all the complexity in the systems to get to that level of simplicity so a human can operate it effectively.
Doug: Yes. I’d like to change one word. I don’t think it masks. It demonstrates the competency of the organization.
Russel: That’s better said. That’s a better way to say it.
Doug: I know where you were going…
Russel: I’m coming from a standpoint of an untrained eye.
Doug: [laughs] You’re far from untrained.
Russel: I grew up in measurement. It’s easy for me to reflect when I first went into control rooms and I thought I knew what I was looking at. Here I am, some decades later, and I’m very aware of all the things I don’t know. [laughs]
Doug: As we all are.
Russel: As I’ve become more educated, what happens is the domain of those things I know that I don’t know has expanded.
Doug: Scary. [laughs]
Russel: Here’s another good fundamental question. What is an alarm?
Doug: What is alarm? An alarm basically is an enterprise’s last chance to put the controller in the solution path for an abnormal situation, something’s gone wrong, before those other hard stop aspects take over.
If they’re not there, or not done well, before the incident happens and causes supply disruptions, communities, fires, whatever they are. What the alarm is, is the last chance to put the controller in that solution path.
Russel: The purpose of an alarm is to allow a controller to handle abnormal successfully.
Doug: Correct.
Russel: To not miss anything.
Doug: Hopefully.
Russel: That’s the purpose. It’s not always the outcome. That’s the purpose. One of the things that I remember very well, Doug, when I was first learning alarm management, this was probably back in the 2007/2008 timeframe.
I was working with you and some of our customers, and we were doing alarm management workshops. I remember, the learning curve was steep. For me, experientially, it was different than other learning I’ve done. I remember very well the moment I got it. It was the third or fourth time I was going through one of your trainings with you and trying to help facilitate.
The day I got it was like, “Oh.” [laughs] It’s like getting hit upside the head with a shovel that you didn’t see coming. I thought I had it, but I didn’t, and then I got it. That kind of thing.
Doug: Yes. It’s a lucky event. I’m glad.
Russel: The big aha is that the whole purpose of alarming is to equip somebody to handle abnormal, and to remove the ambiguity that can often show up if there’s not a thought out process.
Doug: Correct.
Russel: The alarm itself and where it annunciates, it’s not as much about that as it is about putting those things into the controller’s hands that they need to be able to respond effectively to the alarm. That’s what it’s about.
Doug: The amazing part about that is that it allows the enterprise to capture the collective knowledge of everybody who’s an interested party.
I’ll tell you a quick, little vignette. During one of the workshops that we did together, there was an expert in compressors that had flown in — it was a pretty large company — flown in from the west coast to help out. He was going to be there for a day, help us out, and then go home. He ended up staying the entire week. Just before he left, he said, “Doug, I expected to come in, just show off a little bit, and then go home. I can tell you, I’m a compressor expert. I learned things in this workshop that I didn’t even know before about compressors.”
Collective knowledge is really valuable.
Russel: I’ve done a couple of podcasts where I talked about safety management and alarm management. How they’re the same and how they’re different. I think one of the challenges of doing alarm management well is getting the group together and creating that conversation around the alarms. It’s challenging to do that in any organization.
We always manage to do it around process safety management, particularly where it’s required. What’s your take on how much alike and how much different is process safety management versus alarm management?
Doug: They’re like brother and sister. I gotta tell you that when I’m in these sessions, it’s like magic. It’s amazing how process safety dovetails with alarm management, and alarm management feeds back and tells the process safety guys, “Hey, when you said this is important, here’s how we do it.” It’s a deep conversation.
Russel: I find that a lot of the process safety management is about making sure the system will mechanically go safe in the event of abnormal, and alarm management is about making sure you never hit the safety.
Doug: Well said.
Russel: The interesting thing about that is the context-building, which is the hard work of doing this process. The context-building of what is the process, how does it work, how is it automated, etc, is the same whether you’re doing alarm management or process safety management. But, the question you’re answering is slightly different, and in there is all the value.
One is asking, how do we make sure that if we have an upset, the system will mechanically go safe? The other is asking it, how are we going to operate this in a way that we never hit the safety?
Doug: Exactly.
Russel: Cool. I’m so honored that you agree with me on that.
Doug: [laughs] Russel, there’s very few things we don’t disagree on, sorry.
Russel: There’s lots we could argue about, however.
Doug: Thank goodness.
Russel: What are the foundational principles of alarm management?
Doug: Foundational principles do something for alarm management that we needed. What is done is that understanding the foundation takes alarm management from an art form into a technology. By technology, it means it’s something that we can do, understand, and count on.
Every time we make a decision, we can go back and see why the decision was made, what we were trying to do for that, and what the outcome is. Alarm management has four foundational principles. Before I list them, I’m going to say the following.
Every candidate for an alarm has to honor these four principles. If any one of them are not honored, you don’t have an alarm. You can’t have an alarm. Let’s say what they are. First of all, if you want to configure an alarm, every alarm requires timely, specific, controller action. That action must be a necessary action, not just where it’s not an action.
Number two, every alarm activation must occur with enough time for the controller to actually do the things you want him to do to prevent the abnormal situation from escalating, from progressing to the point of challenging the enterprise safety, or environmental safety.
Third. No fair requiring a controller action, and giving him enough time to do the work without telling him what he has to do to do it successfully. So, the third part is providing enough information.
The fourth fundamental is you could be true to all three of these first ones, but you can alarm trivial things, or things that are not so important. Because an alarm is so important, you’re going to take the controller away from everything else he’s doing to work the alarm. The fourth principle says that the alarm activation must be important enough to do just that.
No individual alarm can be configured if it doesn’t comport to all four. None. And, any time you have a very difficult alarm management problem to answer, you can go back to these four fundamentals and say, “Is my problem covered by the four fundamentals?” Almost without exception, you’ll find the answer is there.
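Editor's illustration: the four principles Doug lists can be read as an all-or-nothing checklist for each alarm candidate. The Python below is a minimal sketch of that reading, not a tool discussed in the episode. The class, the field names, the example tags, and the numbers are all hypothetical, and the "important enough" flag is assumed to come from the company's own risk review.

```python
from dataclasses import dataclass


@dataclass
class AlarmCandidate:
    """Hypothetical record for one proposed alarm (all field names are illustrative)."""
    tag: str
    controller_action: str     # principle 1: the specific, necessary action ("" if none)
    minutes_available: float   # principle 2: time from activation to the consequence
    minutes_needed: float      # principle 2: time to notice, diagnose, act, and see the effect
    response_guidance: str     # principle 3: information and instructions for the controller
    important_enough: bool     # principle 4: result of the company's own risk review


def failed_principles(candidate: AlarmCandidate) -> list[str]:
    """Return the principles this candidate does not honor (empty list means all four pass)."""
    failures = []
    if not candidate.controller_action.strip():
        failures.append("1: no timely, specific, necessary controller action")
    if candidate.minutes_available < candidate.minutes_needed:
        failures.append("2: not enough time for the controller to act")
    if not candidate.response_guidance.strip():
        failures.append("3: no information or instruction for the controller")
    if not candidate.important_enough:
        failures.append("4: not important enough to interrupt the controller")
    return failures


def qualifies_as_alarm(candidate: AlarmCandidate) -> bool:
    """A candidate is an alarm only if every one of the four principles is honored."""
    return not failed_principles(candidate)


# Hypothetical candidates, for illustration only.
high_pressure = AlarmCandidate(
    tag="station_discharge_high_pressure",
    controller_action="Reduce upstream pump speed",
    minutes_available=20.0,
    minutes_needed=10.0,
    response_guidance="Check downstream valve status, then lower the pressure setpoint",
    important_enough=True,
)
nice_to_know = AlarmCandidate(
    tag="cabinet_door_open",
    controller_action="",
    minutes_available=60.0,
    minutes_needed=5.0,
    response_guidance="",
    important_enough=False,
)

print(qualifies_as_alarm(high_pressure))
print(failed_principles(nice_to_know))
```

The second candidate fails on action, information, and importance even though it has plenty of time, which is the point Doug makes: missing any one principle means it is not an alarm.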
Russel: What I think happens frequently is that answering those four questions very deliberately for every alarm is time-consuming, and requires multiple people to be involved. People will get hung up on how hard that work is. Conceptually thinking about it, they’ll get hung up on how hard that work is.
Having said that, the other thing that happens is when you have some good leadership around the alarm management program, and they’re doing this, everybody appreciates the outcome.
Doug: Everybody.
Russel: Yes.
Doug: The short answer is, yes, it requires a lot of effort and activity. Why would you ask somebody to be in a control room and handle problems that come up without giving them enough time, without giving him the help that he needs to have? Controllers are not innovators. They’re not inventors. They can’t invent on the fly.
If we give them the collective knowledge of the enterprise, even if it takes a little time, that’s much, much more effective than cleaning up afterwards.
Russel: Sure. It also impacts workload and fatigue. The greater the ambiguity, the less the time I have to respond, the higher my stress. Stress goes directly to fatigue. If I can roll these things down the scale in terms of the ambiguity, and move from ambiguity towards clarity, and move from not enough time to adequate time, then my stress is going to go down.
There’s always going to be a challenge in any control room getting that where you want it to be, because it’s just a challenge.
I want to unpack each of these a little bit. The first one you said is it requires timely, specific, and necessary action.
When I’m talking about this, I always add, or something bad’s going to happen. If I don’t do this timely specific and necessary action, I’m going to end up in a bad outcome. That’s another way of saying necessary.
Doug: That’s correct. There’s one little nuance, and it’s an important one: there are a lot of abnormal situations where, once you understand them, you may not need to act. If a situation is normally such that you need to do something, but on closer look, you don’t need to, that’s already an advantage.
What you’ve done is you’ve not had the controller do something that would cause problems. Now he knows he doesn’t have to do it. Both sides of that coin are in the timely action.
Russel: So, the deliberately not doing is the same as deliberately doing. Is that what you’re saying?
Doug: Yes, because they’re both informed decisions.
Russel: Dang it. Doug, I learn something new every single time I talk to you.
[laughter]
Doug: We all do.
Russel: I will say, my head’s not hurting nearly as bad as the first time we had this conversation.
Doug: Bruises heal. Thank goodness.
Russel: The other thing you say is every activation must occur with enough time to correct. Then again, I would say this a little differently, is I’ve got to have enough time to take action so that I can avoid the negative outcome.
Doug: This is probably one of the most powerful parts about alarm management. Again, a little story might be helpful.
I had a client, I won’t tell you what part of the country he is in, because you’ll identify him. They were operating too close to MAOP. They had an incident that actually cost them a lot of money and a lot of pride. Once we said, “Okay, if you want to prevent this incident, where would you tell the controller he’s got a problem in enough time to be able to change it?”
They calculated a number below MAOP, and they said, “Okay, if you operate any part above that, you’re going to be in the same situation you were before. What is your plan?” “Well, we have to operate at the lower value, because that’s where we ensure safety.” It is a game-changer when you understand it.
The other nuance, I want to add very quickly, if you cannot find enough time. If your normal operating point is so close to the trouble point that you can never have enough time, that means you can never put an alarm as a protection. You can’t put the controller there. You have to have some other mechanism.
Russel: That’s what I see a lot, is you look at where we operate a facility versus where the PSVs, the safeties, and the reliefs are set, is there’s not enough margin there to make a corrective action in the time necessary.
If you need to operate the facility at that level, you need to actually figure out how to move the process safety up, or move the operating pressure down to give that margin to have time to respond. That conversation has to go to the very highest levels of the organization.
Doug: Once you accept alarm management principles, then the conversation has a basis.
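Editor's illustration: the time-margin argument in this exchange can be put into rough numbers. The sketch below assumes pressure rises at a constant rate during an upset, which is a simplification, and every name and value in it is hypothetical rather than taken from the episode. It works backward from the limit (for example MAOP) to find where the alarm must activate, then checks whether that point still sits above normal operating pressure.

```python
def alarm_activation_point(limit_psig, rise_rate_psi_per_min, response_minutes):
    """Highest pressure at which the alarm can activate and still leave the
    controller enough time to act before the limit is reached.

    Assumes a constant rate of pressure rise (an illustrative simplification).
    """
    return limit_psig - rise_rate_psi_per_min * response_minutes


def alarm_can_protect(normal_operating_psig, limit_psig,
                      rise_rate_psi_per_min, response_minutes):
    """True if the activation point is above normal operation.

    If it is not, the alarm would already be active in normal operation, so the
    controller cannot be the protection and another mechanism is needed.
    """
    activation = alarm_activation_point(
        limit_psig, rise_rate_psi_per_min, response_minutes
    )
    return normal_operating_psig < activation


# Hypothetical numbers: 1000 psig limit, 5 psi/min rise, 20 minutes to respond.
print(alarm_activation_point(1000.0, 5.0, 20.0))
print(alarm_can_protect(850.0, 1000.0, 5.0, 20.0))
print(alarm_can_protect(950.0, 1000.0, 5.0, 20.0))
```

With these made-up values, operating at 850 psig leaves room for an alarm, while operating at 950 psig does not. That mirrors the choice described above: lower the operating point, raise the protection, or use a mechanism other than the controller.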
Russel: Absolutely. I love this one. No fair requiring action without providing information and instruction.
Doug: PHMSA said it nicely. [laughs]
Russel: If the action is your pressure’s too high, and the alarm rationalization says reduce the pressure, that’s not really helpful. [laughs]
Doug: Correct.
Russel: Oh, my gosh. I have seen that. I will say, I have seen that.
Doug: Before we leave the topic, the fourth principle is going to be helpful. A lot of situations where we think about we need an alarm. I’ve been in groups, “Oh, we absolutely need that alarm. We really, really need it.”
I said, “What’s going to happen if the controller doesn’t see the situation and doesn’t manage it?” “Well, not too much, but we really need it.” I said, “Okay, go back to your risk management. If your risk management principles say you need it, then you need it. If it doesn’t, you don’t.”
Russel: The other thing, too, and this is a segue to your most recent book. The way I say this, if you look at the control room system, in terms of the real-time information I’m being provided, I have the HMI and I have the alarms. Ideally, the HMI is going to tell me I’m headed towards abnormal, and the alarm’s going to tell me I’m running out of time.
That’s not necessarily that easy to accomplish. Ideally, that’s what’s going to happen. The big challenge is always figuring out, the controllers always want all the information and loads of trends. There’s valid reasons they need all that.
When I’m talking about how to operate safely, the challenge there is I got to get it down to those few things that matter to me.
Doug: Yes, you do.
Russel: That’s the hard valuable work of this process.
Doug: It is.
Russel: It is. Doug, we’re coming to the end of talking about alarm management. We’re going to do another episode for the listeners that are interested. We’re going to talk about rationalization and how to do rationalization. That’s a subject I know everybody in pipelining loves. [laughs]
I’m being a little bit flippant. It’s an interesting and challenging process, or it can be. In terms of alarm management, is there anything you’d want the listeners to take away as the one or two things you want them to remember about this conversation?
Doug: First of all, I’m glad you’re listening. A proper alarm system is one of those very easy, understandable, safe operation activities, that every hour you put into it, you gain much more than an hour of safety out of it.
We know how to do it. It incorporates all of the values of your enterprise and respects your controller so that when he walks into the control room, he can feel that he’s capable of doing that job every day. I’ve seen control rooms before and after rationalization. The controller can feel the difference.
Russel: I would 100 percent support that. That’s very well said. I’ve seen controllers…One of the things about doing rationalization is at the end of the process, everybody that was involved understands that operation probably in ways they never did before. That’s a nice tee-up for our next conversation.
Doug, thanks for being on. I look forward to having the next conversation.
Doug: Thanks for inviting me, Russel. It was a pleasure.
Russel: I hope you enjoyed this week’s episode of the Pipeliners Podcast from our conversation with Doug Rothenberg. Just a reminder before you go, you should register to win our customized Pipeliners Podcast YETI tumbler. Simply visit pipelinepodcastnetwork.com/win to enter yourself in the drawing.
If you’d like to support the podcast, please leave us a review on Apple Podcast, Google Play, or on your smart device podcast app. You could find instructions at pipelinepodcastnetwork.com.
[music]
Russel: If you have ideas, questions, or topics you’d be interested in, please let me know on the Contact Us page at pipelinepodcastnetwork.com or reach out to me on LinkedIn. Thanks for listening. I’ll talk to you next week.
Transcription by CastingWords