Pipeliners Podcast host Russel Treat takes you deep into the weeds this week to connect Plato’s Allegory of the Cave to the API 1165 industry standard for pipeline control room displays, thanks to insight from pipeline consultant Charles Alday.
Listen for the latest information on the expected 2019 update to API 1165, how the update aims to incorporate the latest technology so that pipeline controllers can achieve a deeper understanding of what they are seeing, the importance of supporting controllers during abnormal situations, and the latest industry statistics on the sources of abnormal situations.
Plato and API 1165: Show Notes, Links, and Insider Terms
- Charles Alday is a pipeline human factors consultant, specializing in control room management consulting. Charles has served on the API working group for alarm management and appeared on the Pipeliners Podcast to discuss Team Training.
- API 1165 is the Recommended Practice for Pipeline SCADA Displays. This standard outlines the best practices for designing and implementing displays that are used by pipeline controllers to evaluate information available in all operating conditions.
- The Abnormal Situation Management (ASM) Consortium is a group of industry-leading companies and universities involved in the research and development of tools and products that prevent, detect, and mitigate abnormal situations that affect process safety in the control room environment.
- HMI (Human Machine Interface) is the user interface that connects the human operator (in pipelines, the controller) to the control system. High-performance HMI is the next level of taking available data and presenting it as information that helps the controller understand present and future activity in the pipeline.
- High-Performance HMI extends the capabilities of SCADA in pipeline operations and complies with the ISA 101 requirement by providing an HMI philosophy, style guide, and design guide.
- Adequate Information is defined in the PHMSA CRM Rule (49 CFR § 195.446(c)) as the requirement that each operator provide its controllers with the information, tools, processes, and procedures necessary for the controllers to carry out the roles and responsibilities the operator has defined.
- AOC (Abnormal Operating Condition) is defined by the CRM Rule (49 CFR § 195.503) as a condition identified by a pipeline operator that may indicate a malfunction of a component or a deviation from normal operations that may indicate a condition exceeding design limits or result in a hazard to persons, property, or the environment.
- Alarm Management is the process of managing the alarming system in a pipeline operation by documenting the alarm rationalization process, assisting controller alarm response, and generating alarm reports that comply with the CRM Rule for control room management.
- EEMUA (Engineering Equipment and Materials Users Association) is an international membership organization that serves the users of engineering equipment and materials. Their mission is to attract engineers from a wide variety of industries across the world to share experiences, learn, and solve problems.
- EEMUA 191 is a European industry standard that provides comprehensive guidance on designing, managing, and procuring an effective alarm system. EEMUA 191 is primarily concerned with alarm systems provided for people operating industrial processes.
Plato and API 1165: Full Episode Transcript
Russel Treat: Welcome to the Pipeliners Podcast, episode 67, sponsored by EnerSys Corporation, providers of POEMS, the Pipeline Operations Excellence Management System, compliance, and operations software for the pipeline control center. Find out more about POEMS at EnerSysCorp.com.
[background music]
Announcer: The Pipeliners Podcast, where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations. Now, your host, Russel Treat.
Russel: Thanks for listening to the Pipeliners Podcast. I appreciate you taking the time, and to show that appreciation, we are giving away a customized YETI tumbler to one listener each episode. This week, our winner is Mark Kincheloe. Congratulations, Mark, your YETI is on its way. To learn how you can win this signature prize pack, stick around until the end of the episode.
Once again, it’s Saturday morning, and I have been doing my reading and catching up. One of the things I read yesterday, or Thursday, maybe, was an article on LinkedIn that was posted by Charles Alday.
I would refer to Charles as a statesman of the pipeline control center. He’s been doing really good work, and working with many pipeline operators for many years. He puts out a lot of good information off his website, but this week, he put something out that really grabbed my attention.
The title of the article was “Allegory of the Control Room: Reality, Road Trip, and Relationships.” I’ve titled this episode “Plato and API 1165.” If you read that, you probably wonder two things: one, “What the heck is Russel thinking this week?” and two, “Just how nerdy can we get?”
I’ll try to explain why I think this all makes sense. Charles starts his article off by talking about Plato’s “Allegory of the Cave.”
For those of you who don’t know what that is, and aren’t students of philosophy (I can’t say that I am, but I find this kind of thing really interesting), in Plato’s Allegory of the Cave, he draws this image where there’s a group of people who have been imprisoned in a cave since birth.
All the prisoners are chained, and they’re required to face a blank wall in front of them. They can’t turn to the left or right, move, or look around, basically forcing them to gaze at the wall. Behind them, there’s a fire, and there are people carrying objects and walking between the fire and the men’s backs.
What they see on the wall are the shadows of the people and the objects they carry, cast as they walk through the firelight. The question is, what happens if one of the prisoners is freed?
If one prisoner is able to get out of the cave, see the people walking around the firelight, and then, even more to the point, walk outside of the cave, go into the sunlight, and see the fullness of reality, and he then returns to the cave and tries to explain to the others seated there what he’s seen, how well does that work?
You might ask the question, “What the heck does that have to do with the control room?” Then, particularly, “What does that have to do with API 1165?” Well, let’s step back a bit and talk about the regulatory requirement, and how API 1165 is brought into the regulatory framework around pipelines.
In the Control Room Management Rule, in section (c) of 195.446 for the liquids guys, there’s an area of the Rule called Adequate Information. There’s a requirement that each operator must provide its pipeline controllers with the information, tools, processes, and procedures necessary for the controllers to carry out their roles and responsibilities.
Then they list some things you have to do: implement API 1165 where you have a SCADA system, conduct point-to-point verification, test and verify communications plans, test backups, and implement procedures for shift handover.
All this is about putting some governance around what people see on the screens, how they can rely on what’s on those screens, and how they talk to one another about what’s on those screens. In a modern 2019 way, what a pipeline controller is doing is similar to the prisoner in the cave, looking at the shadows on the wall by the firelight.
A pipeline controller is a worker who is looking at a set of computer displays, and they’re making interpretations about what’s happening with the pipeline system based on what they see in those computer displays.
Certainly, they have some understanding that they’ve built about what that means, but that might be quite different than the experience you would gain if you physically went out into the field.
Now, in Charles Alday’s paper, or article that he put out on LinkedIn, he talks about a situation where, based on some design limitations and construction choices in the piping, and what the controllers could do with the pumps under certain operating conditions, the controllers could get some pretty significant cavitation in those pumps.
The SCADA displays didn’t reflect any of this cavitation. The point that Charles makes is that for the pipeline controllers, cavitation is just a word in a training guide, kind of like a shadow on the stone wall. They don’t actually know what that is.
I would assert that if you’ve never been out to a pipeline site, heard a pump cavitate, and seen what’s going on with the pipes, you really don’t have an understanding of what that means. It’s only by seeing it, hearing it, and feeling it that you understand it, like the prisoner that’s let out of the cave to see reality.
One of the points Charles makes is that there needs to be a connection to the reality of what’s happening, beyond just what appears on the screens and the words in a training manual. It has to become real.
I think that’s really a great allegory. For those of us in the pipeline business, many of us might not be aware of the Abnormal Situation Management Consortium, the group of companies and universities in the process industries that jointly invest in research and development to create knowledge, tools, and products to help with the identification and mitigation of abnormal situations.
It’s a joint research and development consortium. It was founded and has largely been sponsored by Honeywell. They’ve done some really good work, particularly in the area of process automation and the controller interface, from an alarming and visualization standpoint.
I’m going to link up a bunch of resources, as we always do, on the show notes page on the Pipeliners Podcast website. One of the things I’m going to link up is a PowerPoint that’s available through the ASM that talks about effective automation to improve operator performance.
Having done a number of these High-Performance HMI projects, moving controllers from black backgrounds and lots of colors to grayscale, limited use of color, and all of that, one thing I’ve seen is that oftentimes people don’t understand why we’re making these changes.
Some of the foundational research that supports these changes has been done by the ASM. The stuff I’m going to talk us through over the next few minutes directly goes to this subject matter. The first thing is to define what is an abnormal situation.
Now, in the pipeline world, we talk about abnormal operating conditions. There’s some regulatory guidance around identifying and managing AOCs. In the pipelining world, when we say “abnormal,” we tend to have a bit of a…you know, this is a regulatory thing.
I’m going to come at this in this presentation more generically: what is an abnormal situation? How do you identify them, and what’s the value of dealing with them? The ASM (this is not the regulatory definition, it’s an ASM definition) defines an abnormal situation as one where the process is disturbed and the control system cannot cope.
You might think of that as you’ve got a pump that, for some reason, is overspeeding and putting out too much pressure. You’ve got some instrumentation that’s failing, and consequently, you’re operating in an overpressure situation. That might be an example.
Another example might be I’ve got remote communications, but my remote communications are down, so I have no visibility to the process. These could be abnormal situations. Something’s going on with the process, but I can’t use the control system to deal with it.
Consequently, what happens is the operator, or the pipeline controller, has to intervene to supplement the control system. They’ve got to look at things they can do upstream or downstream. They’ve got to get on the phone and talk to field personnel about manual intervention.
All of this type of activity is necessary to address an abnormal situation; it impacts profitability. One of the things that the ASM did in this study that I’m going to link up is they actually wanted to say, “To what extent can we actually improve things, and how might we do that?”
They start out, and they talk about the paradox of automation. I think this is particularly true. This study’s actually about 10 years old now. I think it’s probably a lot more true now than it was even 10 years ago.
First off, if you’ve been in the business 10 or 20 years, and you think about how we were using automation 10 or 20 years ago, versus how we’re using it now, there’s been some fairly big shifts. 10 or 20 years ago, we might be using automation to augment or supplement what we do.
Now, we use automation because we actually don’t have the head count. Most companies do not have the head count to manually operate their systems at full capacity. If they had to revert to manual operations for any kind of extended period of time, they’d have to bring in contractors, or they would have to scale back their operations.
We’ve become more and more reliant on automation, but automation comes with a paradox. It’s this: as my automation gets better, I tend to implement more sophisticated processes. If I’m doing things manually, I have to keep things pretty simple.
But, as I automate, I can optimize. I can do things a little tighter. I can get more out of my systems, but this requires me to put in more sophisticated operating approaches. As those approaches become more sophisticated, I create more opportunities for error.
As we identify those errors and we work to fix them, we tend to apply even more automation. I add more alarms, I put in a control valve, I do something else to add more automation. What happens here is that when things go wrong, people have difficulty intervening because of the complexity that’s being built in and the sophistication that’s being built into these automation systems.
To some degree, it’s a self-fulfilling prophecy. Automation leads to more automation. In the case of this particular study, the ASM looked at a process plant. They were trying to identify what goes on with automation.
If you think about when you build a facility, and you have a target operating condition, usually, what happens is, we’re ramping that facility up to the target operating condition. That’s typically 95 percent of capacity, something like that.
Then, over time, we begin to tweak things out, tune things out, make things better, optimize, improve, optimize, improve. We get to where we’re running more towards the actual limitations of the plant. It’s a good thing.
That’s really the purpose of automation. We’re doing more with less. What that means is that it’s fairly easy for an abnormal situation to make a three to eight percent impact on capacity or profitability. They estimated that the loss of production due to abnormal situations was about $10 billion annually in the U.S. petrochemical industry.
What are the sources of abnormal situations? They break down into about 40 percent equipment, about 20 percent process, and about 40 percent people. The equipment failures are often preventable. The process failures are mostly preventable, and the people failures are almost always preventable.
If you really take this, do some further analysis, and look at the details, the bottom line is that of the abnormal situations that occur, 90 percent of them are preventable. Now, it’s impossible to prevent everything, but this goes to those other kinds of aspirational goals that make sense.
We’re trying to move the needle in that direction. Some of the common things identified among the people failures were the inability to detect a problem because it was buried in a ton of data. I think we’re probably more there now than we were 10 years ago, because we continue to create more and more data.
Another was the requirement to make a quick intervention, versus a thoughtful, diligent intervention, just given the speed at which processes move and so forth. Another was not being able to communicate well. There’s all kinds of reasons for that.
That could get to the actual “what’s available to me in technology?” and also get to just how human beings communicate, being on the same page about language, meaning, and all those kinds of things. Then, being unable to do things in a consistent way, for whatever reason. All those things can lead to ineffectiveness in resolving abnormal situations.
The bottom line out of all this is that people aren’t inept, but people do have limitations. We’re not good at finding a small data anomaly in a very large set of data. When we’re working quickly and we’re responding, we don’t have the time to think through the consequences.
Also, we’re not as consistent as automation. We might not do things the same way every single time. We might be working towards the same result, but our mechanism of getting there might change from time to time. Of course, when things become stressful, our ability to think clearly and act consistently becomes even more constrained.
Taking this out a little further, the ASM set a couple of areas of focus. One was alarm management. I’m not going to talk about that in detail. There’s the EEMUA 191 guideline, which is similar to the API 1167 guideline. EEMUA’s more of a European standard.
Basically, the thing that came out of the ASM was the need to avoid alarm floods. In other words, clean it up so I’m not having to go through a lot of alarms to find out what’s actually occurring, and implement processes to make that happen.
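To make the alarm flood idea a little more concrete, here is a minimal sketch in Python, assuming the commonly cited rule of thumb of roughly 10 alarms in any 10-minute window as the flood threshold. The threshold, function name, and data format are my own illustrative assumptions, not anything prescribed by the ASM, EEMUA 191, or API 1167.

```python
from datetime import timedelta

# Illustrative rule of thumb only: more than ~10 alarms in any
# 10-minute window is often treated as a potential alarm flood.
FLOOD_WINDOW = timedelta(minutes=10)
FLOOD_THRESHOLD = 10

def find_alarm_floods(alarm_times):
    """Return (window_start, alarm_count) pairs where the rate looks like a flood.

    alarm_times: a chronologically sorted list of datetime objects,
    one per annunciated alarm.
    """
    floods = []
    start = 0
    for end, current in enumerate(alarm_times):
        # Slide the window start forward so it only covers the last 10 minutes.
        while current - alarm_times[start] > FLOOD_WINDOW:
            start += 1
        count = end - start + 1
        if count > FLOOD_THRESHOLD:
            floods.append((alarm_times[start], count))
    return floods
```

The exact numbers matter less than having a measurable definition of “too many alarms at once” before you start cleaning the system up.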
Then, there were other recommendations about tools. More to the point, what I want to talk about is the HMI and that area of focus.
There are a couple of things that help us do things better. One is the ability to work off of checklists, so we can do things in a consistent way.
One of the things the ASM looked at is creating what might be called semi-automated procedures. This is the idea of leaving the human to do the analytic work, but walking them through a consistent set of steps.
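As a rough illustration of that idea, here is a minimal sketch of a semi-automated procedure, where the system sequences the steps consistently but a human confirms or decides at each one. The procedure steps, names, and prompts are hypothetical examples of my own, not drawn from the ASM materials or any operator’s actual procedures.

```python
# A sketch of a semi-automated procedure: the system keeps the steps in a
# consistent order, but the controller makes the judgment call at each step.
# The step text below is hypothetical, not from a real operating procedure.

PUMP_CAVITATION_RESPONSE = [
    "Verify suction pressure against the upstream station reading",
    "Reduce the pump speed setpoint per the operating guide",
    "Call field personnel to check for cavitation noise at the station",
    "Log the event and the corrective action taken",
]

def run_procedure(steps, confirm=input):
    """Walk the controller through each step and record what they decided."""
    log = []
    for number, step in enumerate(steps, start=1):
        answer = confirm(f"Step {number}: {step} (done/skip)? ")
        log.append((number, step, answer.strip().lower()))
    return log

# Interactive usage: run_procedure(PUMP_CAVITATION_RESPONSE)
```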
The other area of focus was creating effective operator display design. In the early 2000s, there was a project done where they built out both a standard, prototypical HMI and a new one, what we would now call a High-Performance HMI.
There’s some graphics in the paper that I’ll link up that you can look at to get a sense of what those look like. The question is this. If I move from an old-school, black background, lots of colors, to a grayscale…
I will tell you that this is not just about grayscale. Grayscale’s part of it, but a much more important part of it is how you organize and present the information. What kind of efficiency gains can you get to? This is what they were trying to do with the research project and looking at this plant start-up.
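Before getting to the experiment, and to give a sense of what “organize and present the information” means in practice, here is a minimal sketch of the kinds of choices a high-performance style guide typically encodes: muted grayscale for normal conditions, saturated color reserved for abnormal states, and displays organized from overview down to detail. The specific values and level descriptions are illustrative assumptions on my part, not the ASM or API 1165 specification.

```python
# A sketch of high-performance HMI display conventions. Specific colors and
# level names are illustrative assumptions, not from any standard document.

STYLE_GUIDE = {
    "background": "#C8C8C8",       # light gray; the display is quiet at rest
    "process_lines": "#808080",    # normal equipment and piping drawn in gray
    "text": "#000000",
    "alarm_colors": {              # saturated color appears only when abnormal
        "high_priority": "#FF0000",
        "medium_priority": "#FFA500",
        "low_priority": "#FFFF00",
    },
    "display_levels": [
        "Level 1: system overview with key operating indicators",
        "Level 2: unit or station operating displays",
        "Level 3: equipment detail displays",
        "Level 4: diagnostic and support displays",
    ],
}
```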
They tested 21 operators in two groups: 10 of whom were trained on and experienced with the traditional style, and 11 of whom were trained on and experienced with the ASM style, which would now be referred to as the high-performance style.
They did the experiment in two phases. They did a pre-test, where they did questionnaires to assess work experience and did some sample console rounds, just to get a sense of the different operator populations.
Then, they did some scenarios using simulators, one starting in a normal state, and then seeing what happens as the state transitions to abnormal. Then, they measured the time to identify, react, and resolve the issue.
They had a total of eight different scenarios, of which four were eventually used to do the comparisons. On average, operators using the advanced interface detected the abnormal events 48 percent of the time before the alarm occurred.
Let me say that again. 48 percent of the time, they were able to see the abnormal situation before the alarm occurred. That was a 38 percent improvement over the traditional interface, comparing the two things side by side.
In terms of resolution of the issue once it was detected, the operators on the High-Performance HMI handled and corrected the abnormal situation 96 percent of the time. That was a 26 percent improvement from the traditional interface.
Those are very material improvements in performance, simply by modifying the HMI. The whole purpose of the HMI is to improve operator or controller performance. What we say when we’re doing projects is, the goal is that every HMI should allow the controller to see abnormal before the alarm, and there should be one and only one alarm for each abnormal situation.
Now, both of these goals are virtually impossible to attain. It’s like the triple zeros for safety. I may only ever get there in some fashion, but it’s the only kind of goal that makes sense. Again, I’ll link up the article.
Coming back to the article that Charles Alday wrote about the Plato allegory and the man in the cave, really, the High-Performance HMI, the whole goal of that is to try and allow the person in front of the console to be able to clearly connect in their mental model, in their mind, what they’re seeing on the screens to what’s actually occurring in the field.
The goal is for them to be more effective at doing their job, which is to operate safely, without incident, to deliver on-spec and on-time, and to do that in a way that minimizes wear and tear on the equipment they’re operating.
That all sounds easy enough, but in addition to the automation paradox I talked about earlier, I think in our world there’s another challenge the industry is working on. I was going to say “struggling with,” but it’s more a question of, “Where are we in the life cycle of doing this?”
If you think about API 1165 in its current version, that standard was written in 2007. That’s 12 years ago, which is many lifetimes in the evolution of technology. It was written targeting glass CRT screens and their limitations in terms of the graphical examples.
It is a completely different day and age in terms of what you can do with current state-of-the-art and these high-performance LED and later generation kind of screens. There’s a lot more that we can do with the technology than we could do before.
Not the least of which is the capabilities, the graphics engines, and the graphics cards on the computers, and what they can do. If you think about what we can render for gaming, there’s a capability there for us to access.
You have to do it in a way that’s intelligent and appropriate for the job. Here’s what I think the current challenge is. There is a natural conflict between the role of the SCADA manager and the role of the pipeline control room manager.
When the SCADA manager’s looking at the technology, and they’re thinking about what their goals are, their goals are reliability, maintainability, and cost of operation of that infrastructure. They’re trying to get the most they can for the least dollar.
That lends itself to, to the extent I can work off of templates, to the extent I can keep things simple, to the extent I can keep what I’m doing well within the constraints of the technology I’m using, my goals as a SCADA manager are easier to accomplish.
On the other hand, the control room manager’s interest is, “I want the data the right way, named the right way, contextualized the right way, and I want the analytical tools available at my fingertips without a whole lot of extra hunting to get what I need.”
Said another way, “I want a rich and purpose-designed HMI.” That might not be simple and easy to maintain. The challenge is, how do you strike a balance between these two things as you implement the new standards?
I’ve written a couple of blogs — I’ll link those up as well — about API 1165, and about what’s going on with new SCADA technology. I think we’re going to continue to see a lot of change. In fact, to some degree, it’s very difficult just to stay current on everything that’s changing.
However, that being said, there is a new version of API 1165 soon to come out. The working group that’s working on revising the standard is trying to get the document ready for distribution or ballot. I think the stated goal is to have the new standard out by the end of the year.
The new standard is going to reference some of the material I was talking about from the ASM, and in terms of recommended practice, it is going to lean more towards grayscale and some of the other things I’m alluding to.
I think the thing I would ask the listener to take away, particularly a listener who might actually be getting involved in a project to implement a High-Performance HMI at their business, is simply this. Remember, it’s not just about technology, and it’s not just about grayscale.
What it’s really about is delivering to those people sitting in the cave, those people sitting in Plato’s cave, the best representation of reality that you’re able to deliver to them, so that they can do their job, so that they can see abnormal before they get an alarm, so that they can be effective in addressing the abnormal situation before it becomes something worse.
I hope you enjoyed this week’s episode of the Pipeliners Podcast and our conversation about API 1165 and Plato. Hopefully, now, you understand what Plato has to do with API 1165.
Just a reminder before you go. You should register to win our customized Pipeliners Podcast YETI tumbler. Simply visit pipelinepodcastnetwork.com/win to enter yourself in the drawing.
If you would like to support this podcast, please leave a review on Apple Podcasts, Google Play, or your smart device podcast app. You can find instructions at pipelinepodcastnetwork.com.
[background music]
Russel: Finally, if you have ideas, questions, or topics you’d be interested in, let me know on the Contact us page at pipelinepodcastnetwork.com, or reach out to me on LinkedIn. Thanks for listening. I’ll talk to you next week.
Transcription by CastingWords