Pipeliners Podcast host Russel Treat is joined by control room expert Ross Adams of EnerSys Corporation to discuss two important facets of control room management: alarm management and emergency response.
In this episode, you will learn about the importance of advancing from alarm rationalization to Alarm Management, establishing clear definitions in the control room to ensure alignment among all personnel, having a plan to mitigate risk in each operating condition, and more.
We are also seeking your feedback on this episode. In advance of Russel Treat presenting a whitepaper on this topic at the API Pipeline Conference & Control Room Forum, we are seeking industry input on alarm management and emergency response. After listening to the episode, please contact the show or reach out to Russel Treat on LinkedIn with your input.
Alarm Management and Emergency Response: Show Notes, Links, and Insider Terms
- Ross Adams is an Accounts Service Specialist at EnerSys. Find and connect with Ross on LinkedIn.
- The 2019 API Pipeline Conference & Control Room Forum takes place April 9-11 in Phoenix.
- Russel Treat is scheduled to present a whitepaper on April 11 at 8:00 a.m. to kick off the Control Room Forum. The whitepaper is titled: “How to Optimize Alarm Management in the Control Room to Support Emergency Response.”
- Alarm Management is the process of managing the alarming system in a pipeline operation by documenting the alarm rationalization process, assisting controller alarm response, and generating alarm reports that comply with the CRM Rule for control room management.
- Alarm Rationalization is the component of the Alarm Management process in which configured alarms are analyzed to determine their causes and consequences, so that alarm priorities can be assigned in accordance with API 1167. This information is also documented and made available to the controller to improve responses to uncommon alarm conditions.
- An Alarm Response Sheet indicates to a pipeline controller what the alarm is, what could cause the alarm, how to verify the cause, and what actions should be taken given the cause.
- The CRM Rule (Control Room Management Rule as defined by 49 CFR Parts 192 and 195) introduced by PHMSA provides regulations and guidelines for control room managers to safely operate a pipeline. PHMSA’s pipeline safety regulations prescribe safety requirements for controllers, control rooms, and SCADA systems used to remotely monitor and control pipeline operations.
- The Roles & Responsibilities requirement was clarified in the 2018 FAQs update: operators must define which individuals have the ability to influence or direct the actions of a controller.
- The Team Training requirement was clarified in the 2018 FAQs update: pertinent pipeline personnel should be trained on critical areas of control room support by January 2019.
- SCADA (Supervisory Control and Data Acquisition) is a system of software and technology that allows pipeliners to control processes locally or at remote locations. SCADA breaks down into two key functions: supervisory control and data acquisition. It includes managing the field, communication, and control room technology components that send and receive valuable data, allowing users to respond to the data.
- HMI (Human Machine Interface) is the user interface that connects a person to the system being controlled; in pipeline operations, it is how the controller interacts with the SCADA system. High-performance HMI goes a step further, taking available data and presenting it as information that helps the controller understand the present and future activity in the pipeline.
- NTSB (National Transportation Safety Board) is a U.S. government agency focused on transportation safety across aviation, highway, marine, railroad, and pipeline. The agency investigates incidents and accidents involving transportation and makes recommendations for safety improvements.
- AOC (Abnormal Operating Condition) is defined in 49 CFR 195.503 as a condition identified by the operator that may indicate a malfunction of a component or deviation from normal operations that may indicate a condition exceeding design limits or result in a hazard(s) to persons, property, or the environment.
- HCAs (High-Consequence Areas) are defined by PHMSA as potential impact zones that contain 20 or more structures intended for human occupancy or an identified site. PHMSA regulations specify how pipeline operators must identify, prioritize, assess, evaluate, repair, and validate the integrity of gas transmission pipelines that could, in the event of a leak or failure, affect HCAs.
- GIS (Geographic Information System) is a method of capturing the earth’s geographical profile to produce maps, capture data, and analyze geographical shifts that occur over time.
Alarm Management and Emergency Response: Full Episode Transcript
Russel Treat: Welcome to the Pipeliners Podcast, episode 63, sponsored by EnerSys Corporation, providers of POEMS, the Pipeline Operations Excellence Management System, compliance and operations software for the pipeline control center.
Find out more about POEMS at enersyscorp.com/podcast.
[music]
Announcer: The Pipeliners Podcast, where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations.
And now your host, Russel Treat.
Russel: Thanks for listening. We appreciate you taking the time. To show that appreciation, we are giving away a customized YETI tumbler to one listener each episode.
This week, our winner is Michelle Loveless with Par Hawaii. Not only does Michelle get to live in Hawaii, she also gets a cool YETI tumbler. To learn how you can win this signature prize pack, stick around until the end of the episode.
Ross, welcome back to the Pipeliners Podcast.
Ross Adams: Thanks for having me. I appreciate it.
Russel: You might want to know that I wrangled Ross into this at the last minute, so he didn’t have a whole lot of time to prepare. I’m sure he will appreciate me making that kind of declaration here at the beginning of this episode. I’ve asked Ross to come on and talk.
I’ve been asked to do a paper at the API Pipeline Conference on alarm management and emergency response, and I need to brainstorm some things about what the content of that paper might be. I’ve asked Ross to come on and maybe help me shake some ideas out of this head.
Likewise, at the end of this, I’m going to be asking for feedback from the listeners. Did we get it right or are we missing things? As we sit down to write this whitepaper, we want to make sure we’re addressing the things that industry would be interested in. Ross, again, thanks for coming on and agreeing to do this at the last minute.
Ross: Oh, it’s my pleasure. I really like the model and the set-up for how we’re approaching doing this whitepaper. I think a lot of times, when people come to these conferences, they have a need. They have questions that they’d like to see answered.
Often, the people that host these conferences do a great job trying to think about what industry needs information on. You’re taking it one step further and actually going to industry and asking the question through the Pipeliners Podcast.
I’m excited to be a part of it. Hopefully, we’ll learn some things together today.
Russel: That’s certainly the intent. We’ll see how it goes.
Diving right in, how do alarm management and emergency response fit within the control room, in general, in your experience?
Ross: Alarm management is certainly what the control room is going to spend more of its time focusing on.
The regulations for the control room have a lot to say about the monthly alarm reviews that have to be done, and the annual alarm reviews, and how you perform rationalization of alarms in advance of a controller actually getting the points into the HMI, and having that functionality as part of their domain of responsibility.
Not only is the controller thinking a lot about alarms, but so are the control room manager and others. A good alarm management program removes a lot of white noise from a control room, and helps a controller function at peak efficiency and have a great awareness and understanding of what’s going on on the pipeline.
Whether there’s poor alarm management or good alarm management, when incidents occur in the field, how well that alarm program is designed is going to correlate to how quickly and how effectively a control room is able to respond to an emergency.
We know through a lot of the data that PHMSA has published, and just from our experience in industry, that the control room is rarely ever the cause of an incident. But the control room is most often the first to detect an incident, or at least has the first opportunity to keep it from becoming worse than it otherwise would be.
Russel: Yeah, I think that’s really a good tee up there.
That’s exactly right. If you go back and you look at the NTSB pipeline accident investigations, you really don’t see the control room as a cause. But you very often see the control room as part of the mitigation. The question always becomes, could you mitigate more quickly and effectively?
It’s also important to talk a little bit about what an alarm is. I’ll do a little bit of definition here. When we’re doing alarm management, or when I’m doing alarm management, we define an alarm as something that requires controller action within a time frame, or something bad is going to happen.
One of the challenges around alarm management in general is just this word alarm has been used in our business for a long time to mean a lot of different things. When you start doing alarm management, you’re actually honing in a little bit.
When you think about it, alarm is something that’s coming to me through the SCADA system — through the human machine interface. It’s telling me that there’s something that I need to deal with, and if I don’t deal with it, I’m going to have a problem.
Emergency response is, I have a problem, and I need to respond to the problem.
It might be a little overly simplistic. But I think it’s also realistic to say that alarms, when they’re not effectively managed, can escalate to emergencies. Does that sound correct to you?
Ross: Yeah, I think that’s right. PHMSA has a lot of focus on abnormal operating conditions and the training of controllers around these AOCs, to the point that they expect operators would have a list of AOCs, and especially those that could occur simultaneously or in sequence.
I think one of the reasons that they do that is because not only are you going to have a lot more alarms in your alarm feed during that period of time, but the controller is starting to lose their situational awareness, and the situation could potentially elevate to an emergency.
So, I think that’s exactly right.
Russel: If you think about alarm management and how it’s related to emergency response…again, anybody who works in a control room knows that what I’m about to say is overly simplistic, but I think it’s a good idea as a simple model to understand things this way.
My goal is to always have everything normal. Sometimes things happen that are outside of my control, and things become abnormal. I lose communications. I have loss of instruments. I might be operating at pressures beyond what I’m comfortable operating at. It might still be within the safe range, but I’m up at the upper edge, that sort of thing.
All of those things could be abnormal. Then, abnormal can become emergency.
It doesn’t always work that way. You don’t always go from normal, through abnormal, to emergency, because sometimes emergencies are related to third-party damage. Sometimes they’re related to things like an earthquake or whatever. But again, it’s not a bad model to think about.
The question becomes, what can I do in my alarm management program that might help me be better prepared to respond to and mitigate an emergency more quickly when one does occur?
Ross: I think it’s a great question. I think the answer is going to vary, depending upon the operating philosophy of the particular operator.
But what it comes down to is this: if your controller, under the roles and responsibilities you’ve established in your control room management plan, is the person who is going to be responding initially to this emergency operating condition, you need your controller to have the information necessary to respond effectively.
I think there’s information that can be captured as part of your alarm rationalization process that won’t take that long to capture on the front end.
Russel: When minutes matter.
Ross: That’s a great way to put it. When minutes matter, it communicates very clearly what the controller needs in terms of procedures, or in terms of contact information, so that there’s no delay. They can leap right into action, and hopefully do a lot to prevent damage to property or the environment or to people.
Russel: I’m certainly no expert in emergency response. But certainly working in control rooms, you get some exposure to it. And working around pipelines, you get some exposure to it.
My experience would be that, oftentimes, the control room is the first to become aware of an emergency.
They then have a process they follow that would initiate an emergency response or initiate an incident response process. Then they have some other duties.
Generally, in my experience, someone in the field becomes the incident commander as soon as they’re able to get on site with the incident. There’s a period of time between initial discovery and having incident command in place during which the control room has a pretty critical task.
To the extent that the task can be performed more quickly and more effectively, the incident response is going to be more effective overall. So with that tee up, it would probably be good to talk about what some of those things are that a control room is going to do.
Probably the first thing they’re going to do is isolate the incident, and make the system safe around the incident. That can be fairly simple, if I’m talking about a straight-run pipe and I know I have an incident somewhere between two valves.
It can get quite complex if I’m a utility. Understanding how to isolate an incident without taking other critical infrastructure offline can be challenging.
Ross: Even beyond alarm rationalization, is there any other technology that would make a controller’s ability to identify the emergency easier?
Russel: That’s a great question. Certainly, one of the things that’s often used, particularly in utilities, is a newsfeed. A lot of times, news cameras will give you information and line of sight to an issue quicker than you can get it from your own people.
That’s one avenue of gathering additional information. Another is that if I have an emergency condition occur, I’m probably going to have some alarms in the control room.
I’m going to have some low-pressure alarms. I may have other kinds of alarms occur in the control room, if I have some kind of incident on the pipe.
Ross: Something like leak detection.
Russel: Exactly, good point. I might have leak detection alarms. Leak detection alarms might come with a location of the leak. All of that is information that would help me understand where the incident is and what I need to do to isolate it.
I might also need to look up maps. I might need to know, is this occurring in an HCA? I might need to know, what do I have available to me in terms of valving, either automated or manual that I’ve got to access and shut down to isolate that. Certainly, access to GIS or mapping systems can be valuable.
I’ve actually seen utilities look at weather and wind direction. They might have a separate leak report room where people are calling in and saying, “I smell hydrocarbons.” Gas companies and utilities would have that. Getting some kind of line of sight into whether I’m getting increased call volume or something like that could be helpful as well.
Ross: I’ve seen control rooms where tag names are designed in such a way that they give you an understanding of where the point is geographically. You mentioned GIS. Have you seen GIS tied in as part of alarm rationalization?
Russel: I haven’t seen GIS tied in as part of alarm rationalization, per se. But I have seen GIS tied to location. Particularly when you start talking about remote sites like mainline valves, it’s relatively common for there to be a GIS coordinate for the mainline valve site, and for that coordinate to be in some kind of geographic information or mapping system.
Ross: Okay. At the end of the day, it’s all more information for the controller.
Russel: Right. There are ways to tie that together. Sometimes it’s as simple as putting a URL into the alarm rationalization that would allow the controller to bring up a map of the area around where that alarm is. That could be something that would be valuable, I would think.
Again, you made the point that every operator is different. Gas different than liquids. Utility different than transportation. The issues and the things you might want to see might be different.
Ross: Absolutely. Access to resources and different things varies as well. Some of these solutions may be more reasonable for larger operators. Whereas a smaller operator is relying on other solutions.
Russel: That’s true. I’m trying to think in my mind’s eye, what’s the response in the control room?
First thing is going to be initiate incident command. Second thing is going to be to isolate the incident. Third thing is going to be communications. That’s probably one of the bigger roles of the control room in a situation like this.
That is initiating the communications. Who are the people that would need to be notified? Certainly, there’s first responders that might need to be notified. If this is happening at a site where you have interconnect parties, you might need to notify the interconnect party.
If the location is impacting critical users, like I’ve got a power plant that’s coming off a transmission line, and I’m going to have to isolate this line, I might impact the fuel feed for a power plant.
These are all things that would be required communications activities that might or might not be part of what’s going on with the incident command structure.
That’s one of the reasons why, in the PHMSA rules, you’ve got to define your roles and responsibilities. You’ve got to do it for normal, abnormal, and emergency conditions. That’s because the roles and responsibilities change in those situations.
Ross: Absolutely. Not to go down the rabbit hole in regards to team training, but the team training requirement expects that you would cover all three of those different operating conditions as well, given that each of them is unique and roles and responsibilities can change. That requires different things of different people.
We’ve talked a lot about different technologies, but if we were to bundle that up or package it or implement that in a simple way as a part of alarm rationalization, is there a good first step or a solution that people should be looking at?
Russel: That’s a great question. You’re tripping me up, Ross. Well played.
Ross: It’s nice to be on this end of things, for once.
Russel: Exactly. The thing that comes to mind, I tend to think more of what would be the ideal thing to have. The ideal thing to have would be a checklist that was specific to that emergency. That would be the ideal thing.
This is where this idea of tying alarm management and emergency response together comes in, because one of the things you do in alarm management, if you’re doing rationalization, is define the things that could cause this alarm.
What are the things I do to verify that the alarm is valid and make sure I understand which cause is causing it? Then, what are the actions I take to manage the alarm?
If I took that a step further and said this alarm is actually being created by an emergency, I could write down: here are the things you do when you have an emergency related to this alarm.
A lot of times what will happen in an emergency situation is the alarms come in after the incident. Having that additional information, like, if this alarm is related to an emergency in progress, here’s who you contact as the first responder. Here are the upstream and downstream assets that might be impacted. Here’s how you isolate this point on the system.
That kind of information could be really helpful to have at your fingertips. If you added maps to that, particularly maps with the HCAs (high consequence areas) drawn on them, so that you know whether an alarm at this site is within an HCA or not, that also could be very useful.
Rather than the controller having to look those things up or remember them, they could have them at their fingertips, and they’re going to be more effective.
Ross: That sounds right. It sounds like what we’re talking about here is operators leaning into more robust alarm response sheets.
Another thing, too, and I’ll pose it as a question: another element of alarm rationalization is the assignment and calculation of alarm priorities, and the attempt to cut down the white noise so that your alarm feed is effective.
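One way to picture the "more robust alarm response sheet" discussed here is as a record that carries the usual rationalization fields (cause, verification, action) plus an optional emergency section with first responder contacts, impacted upstream and downstream assets, isolation steps, an HCA flag, and a map link. The following is a minimal Python sketch under those assumptions; the class and field names, the tag, and the URL are all hypothetical illustrations, not any product's or operator's actual format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EmergencyGuidance:
    """Hypothetical extra fields used when an alarm may be caused by an emergency in progress."""
    first_responder_contacts: List[str]
    impacted_assets: List[str]      # upstream and downstream assets that might be impacted
    isolation_steps: List[str]      # how to isolate this point on the system
    in_hca: bool                    # whether the alarm point sits within a high consequence area
    map_url: Optional[str] = None   # link to a map of the area around the alarm point


@dataclass
class AlarmResponseSheet:
    """Hypothetical alarm response sheet: what the alarm is, its causes, how to verify, what to do."""
    tag: str
    description: str
    causes: List[str]
    verification_steps: List[str]
    actions: List[str]
    emergency: Optional[EmergencyGuidance] = None

    def render(self) -> str:
        """Format the sheet as plain text for display next to the alarm."""
        lines = [f"{self.tag}: {self.description}"]
        lines += [f"  Cause: {c}" for c in self.causes]
        lines += [f"  Verify: {v}" for v in self.verification_steps]
        lines += [f"  Action: {a}" for a in self.actions]
        if self.emergency is not None:
            e = self.emergency
            lines.append(f"  In HCA: {'yes' if e.in_hca else 'no'}")
            lines += [f"  Contact: {c}" for c in e.first_responder_contacts]
            lines += [f"  Impacted asset: {a}" for a in e.impacted_assets]
            lines += [f"  Isolate: {s}" for s in e.isolation_steps]
            if e.map_url:
                lines.append(f"  Map: {e.map_url}")
        return "\n".join(lines)


# Hypothetical example for a low-pressure alarm at a mainline valve site.
sheet = AlarmResponseSheet(
    tag="MLV-12 PT LO",
    description="Low discharge pressure at mainline valve site 12",
    causes=["Line rupture downstream", "Pressure transmitter failure"],
    verification_steps=["Compare upstream and downstream pressures", "Check leak detection"],
    actions=["Confirm conditions with field personnel"],
    emergency=EmergencyGuidance(
        first_responder_contacts=["County fire dispatch"],
        impacted_assets=["Upstream: Station A", "Downstream: Interconnect B"],
        isolation_steps=["Close MLV-11", "Close MLV-13"],
        in_hca=True,
        map_url="https://maps.example.com/mlv-12",
    ),
)
print(sheet.render())
```

Keeping the emergency section optional reflects the idea that only alarms whose adverse consequence could reach an emergency need this extra guidance; sheets for similar equipment could share most of these fields.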
I know a lot of people are balancing the definition of safety-related alarms in the context of their operations, as well as alarms versus alerts.
What’s a good way of thinking about the rationalization process around priority, to make sure that your controllers are seeing what they need to see in terms of alarms, and not seeing the extra stuff, so that they can effectively respond to the emergency when there’s all this other stimulus in the room?
Russel: Again, really good question. You’ve already partially answered it. I’ll unpack it a little bit. That’s the difference between alerts and alarms.
In our vocabulary when we’re talking about this stuff, what we say is anything that comes out of the SCADA system that’s trying to tell you something, we call that a notification.
Alerts are notifications that do not require action. They’re just things that you want to know about. This valve’s set point was changed. That’s an alert. That type of thing. Okay, I know, but I don’t expect to do anything about it.
Alarms are things where an action is required. When we do rationalization, we take a safety management approach, where we’re actually scoring severity. Severity is related to the severity of the adverse consequence if the alarm is not managed.
This gets difficult, because you have to actually set some boundaries around this. Basically, we look at the alarm and one level past. If both those fail, then what’s the consequence?
What that allows us to do is establish critical, high, medium, and low severity. Critical alarms are those with the most immediate and most severe potential adverse outcomes if the alarm is not managed.
One of the things you do is you get alarms down to only those things that require controller action. Then the other thing you do is you prioritize the alarm view so that the unacknowledged highest severity are presented first.
I want to see all my critical alarms first before I see any of my high, mediums, or lows. I want to see all my criticals that are unacknowledged before I see any other unacknowledged. By predefining that, what you can do is prioritize on the fly the controller workload.
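The vocabulary and ordering described above can be sketched in a few lines: every SCADA message is a notification, alerts (no action required) are kept out of the alarm view, and alarms are ordered by severity first and, within a severity, unacknowledged before acknowledged. This is a minimal Python sketch under stated assumptions: the `Notification` fields are hypothetical, and using the oldest timestamp as the final tie-breaker is an added assumption not stated in the episode.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    # Lower value sorts first, so critical alarms appear at the top of the view.
    CRITICAL = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class Notification:
    """Hypothetical SCADA notification record."""
    tag: str
    requires_action: bool  # True means alarm; False means alert
    severity: Severity
    acknowledged: bool
    timestamp: float       # seconds since epoch


def alarm_view(notifications: List[Notification]) -> List[Notification]:
    """Drop alerts, then order alarms: severity first, unacknowledged before acknowledged,
    then oldest first (assumed tie-breaker)."""
    alarms = [n for n in notifications if n.requires_action]
    # False sorts before True, so unacknowledged alarms come first within a severity.
    return sorted(alarms, key=lambda n: (n.severity, n.acknowledged, n.timestamp))


feed = [
    Notification("VLV-7 setpoint changed", False, Severity.LOW, False, 100.0),  # alert, filtered out
    Notification("PT-3 low pressure", True, Severity.HIGH, False, 101.0),
    Notification("LD-1 leak detected", True, Severity.CRITICAL, True, 90.0),
    Notification("PT-9 high pressure", True, Severity.CRITICAL, False, 102.0),
]
for n in alarm_view(feed):
    print(n.tag)
# Expected order: PT-9 high pressure, LD-1 leak detected, PT-3 low pressure
```

Sorting on severity before acknowledgement satisfies both statements in the episode: all criticals appear before any high, medium, or low alarms, and unacknowledged criticals appear before any other unacknowledged alarm.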
The thing that’s interesting about this when you think about alarms is that I cannot control alarm activations. What’s going on with the pipeline controls alarm activations. They’re going to happen when the pipeline process gets to that point. They just happen, so I can’t control that.
What I can control is my alarm configurations. Likewise, I can’t control necessarily what emergencies occur. Oftentimes those are related to things outside of the control of the control room.
But what I can do is predefine what I do if I get these kinds of emergencies. In other words, I can plan. I can plan my alarm response before I get the alarm. I can plan my emergency response before I get the emergency.
Then I can go back and I can do analysis and say, well if I get this alarm and it goes all the way to the adverse consequence, do I need to have something associated with this alarm that would tell me what to do to respond to the emergency? Would it be that severe?
Ross: That makes a lot of sense. I’m hearing robust alarm response sheets. I’m hearing an intentional approach to alarm rationalization, especially as it relates to configuring my priorities.
I think, just from the work that I do, the third element to this is going to be your alarm program maintenance, especially as it relates to your monthly alarm reviews and your annual reviews.
With the folks that I work with, I know that each month we review, as required by PHMSA, what alarms have been taken off scan, been inhibited, or generated false alarms. We attempt to document what has been taken out of the SCADA system, so that we do the maintenance, put it back in an effective manner, and make sure things don’t get lost.
The other thing that we do, and more pertinent to this conversation, is we look at our alarm activation KPI reports. In doing that, we’re able to take a look at our top activators in a bunch of different categories, our bad actors if you will.
We’re able to determine if those alarms were caused by real events, or by activity in the field, or if further rationalization needs to occur. In some cases, there needs to be some conditional alarming that’s put into play.
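The "top activators" part of the monthly review is essentially a count of activations per alarm tag within each category, sorted from most to least frequent. Below is a minimal Python sketch assuming a hypothetical `Activation` record with a tag and a category; real KPI reports would pull these records from the SCADA alarm history, and deciding whether a bad actor needs rationalization or conditional alarming remains a human judgment.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Activation:
    """Hypothetical alarm activation record from the review period."""
    tag: str
    category: str  # hypothetical grouping, e.g. "pressure" or "communications"


def top_activators(activations: Iterable[Activation], n: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Return the n most frequently activating tags ("bad actors") in each category."""
    by_category: Dict[str, Counter] = defaultdict(Counter)
    for a in activations:
        by_category[a.category][a.tag] += 1
    return {cat: counts.most_common(n) for cat, counts in by_category.items()}


# Hypothetical month of activations.
month = (
    [Activation("PT-3", "pressure")] * 40
    + [Activation("PT-9", "pressure")] * 5
    + [Activation("RTU-2 comm fail", "communications")] * 12
)
report = top_activators(month, n=2)
print(report)
```

Tags near the top of each category are the candidates to check against field activity: real events, maintenance work, or alarms that need further rationalization.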
By doing that, we make sure that we have eyes on the alarms that are creating white noise and fix those, versus the alarms that the controller actually needs to hear. We’re constantly in this process of cleaning up the alarm rationalization and making sure the alarm feed is clean.
All three of these elements that we’ve talked about, I think, point to providing adequate information to the controller, so that hopefully they can mitigate an abnormal condition before it becomes an emergency.
But at the end of the day, if an emergency is to occur, they can respond and mitigate the negative consequences.
Russel: I’m thinking of something here, too, that’s a big subject in our industry right now, and that’s the idea of where I can get good lessons learned to support training. There’s also obviously the conversation around team training, which we’ve also done a podcast on recently.
One of the things that’s coming up as I’m thinking about this is my tabletops: my exercises and drills, the things that I’m doing to support team training and other tabletop exercises and such.
I’m actually thinking that it makes sense to incorporate these alarm response sheets into that training, as a way to look for lessons learned or look for opportunities to improve my alarm rationalization, and maybe even improve my emergency response.
What do you think about that?
Ross: I think it’s a great idea. My first reaction is, that sounds difficult. As I put on my control room manager shoes, I think, gosh, one more thing to do.
Russel: No, that’s a really good point.
Ross: But, what I’ll say is, we’ve seen this in the way that we approach alarm rationalization. You can draw a circle around a lot of things that are in the field: similar assets and similar unit types.
I think that ultimately a lot of those alarm response sheets, if it’s similar type equipment, will be similar. So, it’s probably a lot easier than it sounds at first glance.
The value that comes from that is that it gives you some structure, for one, with which to train your controllers. I know a lot of control room managers are looking for things that add structure and simplify the approach to making sure that these compliance activities or these best practice activities are performed.
But, also, it’s something that the controllers can look at. When it’s two in the morning, not much is going on. They can review them and ultimately have confidence that they know what they need to know to do their job well.
Russel: Yeah. I think that’s well said.
Look, I think we’ve exhausted our knowledge in thinking about this, at least so far. What I’d like to do is put a shout-out or a challenge if you will to the listeners to respond and tell us what we didn’t cover that we should have covered. Or what, if anything, we talked about as an idea is complete nonsense and shouldn’t even be contemplated.
Ross: There’s none of that, none of that.
Russel: Let’s hope not.
But in any case, we certainly want the feedback, because in terms of putting this paper together that we’re doing for the API Pipeline Conference, we’d like for it to be accurate and really reflect a broader perspective than just our own.
Ross: That’s right. I look forward to hearing what folks have to say. I look forward to hearing you present.
What day is the presentation? Do you know?
Russel: It’s on the first day of the Control Room Forum. The API Pipeline Conference and the API Control Room Forum are back-to-back. They go Tuesday, Wednesday, Thursday. Ours is the first thing, eight o’clock in the morning on April the 11th, at the start of the Control Room Forum.
Anyway, those are the details. We’ll put more information about that in the show notes, and link it up, so that anybody who’s interested in finding out more can get that on the calendar. If you plan on coming and catcalling me from the audience over what I said, well, that’s okay, too.
Ross: Yeah, we’ll be celebrities. It’ll be a good thing.
Russel: Exactly.
Ross: I look forward to it.
Russel: All right. Look, Ross, thanks so much for coming in and doing this at the last minute. I really appreciate it. Good job, as always.
Ross: Thank you so much. My pleasure. Any time.
Russel: I hope you enjoyed this week’s episode of the Pipeliners Podcast, and our conversation with Ross Adams.
Just a reminder, before you go, you should register to win our customized Pipeliners Podcast YETI tumbler. Simply visit pipelinepodcastnetwork.com/win to enter yourself in the drawing.
If you’d like to support the podcast, you can leave a review on Apple Podcasts, Google Play, or whatever smart device podcast application you happen to use. You can find instructions at pipelinepodcastnetwork.com.
Finally, if you have ideas, questions, or topics, or if you’d like to offer some feedback on this week’s episode, please let us know on the Contact Us page at pipelinepodcastnetwork.com or reach out to me on LinkedIn.
Thanks for listening. I’ll talk to you next week.
[music]
Transcription by CastingWords