This week’s Pipeliners Podcast episode features Jan Hayes discussing her book, Nightmare Pipeline Failures, how you must learn from both successes and difficulties, and that large incidents are not necessarily task failures, but system failures.
In this episode, you will learn how sociology plays a large part in organizational accident prevention, what the term “system” means, how to prevent organizational breakdowns or contributions to incidents, and ways pipeline safety goes broader than just protecting the pipeliners.
Contribution of Organization Factors in Pipeline Incidents: Show Notes, Links, and Insider Terms
- Jan Hayes is the author of “Nightmare Pipeline Failures” and a Professor at RMIT University in Melbourne, Australia. Connect with Jan here.
- Nightmare Pipeline Failures is a collection of pipeline failures that have occurred in the United States, detailing what went wrong and how they could have been prevented.
- RMIT is a world leader in Art and Design; Architecture; Education; Engineering; Development; Computer Science and Information Systems; Business and Management; and Communication and Media Studies.
- Future Fuels Cooperative Research Centre is the industry focussed Research, Development & Demonstration (RD&D) partnership enabling the decarbonisation of Australia’s energy networks.
- The PRCI (Pipeline Research Council International) is the preeminent global collaborative research development organization of, by, and for the energy pipeline industry. [Read more about the PRCI collaborative research projects, papers, and presentations.]
- European Pipeline Research Group (EPRG) is a registered association of European pipe manufacturers, pipeline operators, installation contractors and service providing companies that are active in the field of pipeline safety.
- Swiss Cheese Model is used to illustrate how analyses of major accidents and catastrophic systems failures tend to reveal multiple, smaller failures leading up to the actual hazard. In the model, each slice of cheese represents a safety barrier or precaution relevant to a particular hazard.
- ASAP report (Aviation Safety Action Program report): The goal of the Aviation Safety Action Program (ASAP) is to enhance aviation safety through the prevention of accidents and incidents. Its focus is to encourage voluntary reporting of safety issues and events that come to the attention of employees of certain certificate holders.
- Check out the episode Russel mentions about airline safety here.
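The Swiss Cheese Model described in the notes above can be illustrated with a small calculation. The Python sketch below is only an illustration, not something taken from the book or the episode: the function name `breach_probability` and all the numbers are hypothetical, and it assumes each barrier fails independently of the others, which is a simplification.

```python
from math import prod


def breach_probability(hole_probabilities):
    """Chance that a hazard passes through every barrier.

    Each entry is the (hypothetical) probability that one slice of
    cheese has a hole in the path of the hazard. Barriers are assumed
    to fail independently, which real organizations rarely guarantee.
    """
    for p in hole_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    return prod(hole_probabilities)


# Hypothetical numbers: three barriers in good condition...
healthy = [0.1, 0.1, 0.1]
# ...versus the same barriers after two have quietly degraded.
degraded = [0.1, 0.5, 0.5]

print(breach_probability(healthy))
print(breach_probability(degraded))
```

With these made-up numbers, letting two of the three barriers degrade raises the chance of the holes lining up by a factor of about 25, even though nothing has visibly gone wrong yet. The independence assumption probably understates the problem, since a single organizational factor such as budget or prioritization can weaken several slices at once.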
Contribution of Organization Factors in Pipeline Incidents: Full Episode Transcript
Russel Treat: Welcome to the “Pipeliners Podcast”, episode 255, sponsored by Gas Certification Institute, providing standard operating procedures, training, and software tools for custody transfer, measurement, and field operations. Find out more about GCI at gascertification.com.
Announcer: The Pipeliners Podcast where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations. Now your host, Russel Treat.
Russel: Thanks for listening to the Pipeliners Podcast. I appreciate you taking the time. To show that appreciation, we give away a customized YETI tumbler to one listener every episode. This week our winner is Buzz Fan with TRC. Congratulations, Buzz. Your YETI’s on its way.
To learn how you can win this signature prize, stick around till the end of the episode. This week, Jan Hayes from RMIT University in Melbourne, Australia and author of the book “Nightmare Pipeline Failures” joins us to talk about the contribution of organizational factors in pipeline incidents.
I’ve got to tell you I have really been looking forward to this conversation. I got your book, “Nightmare Pipeline Failures,” and read your book. I marked all over it and reread portions of it. I decided that I had to reach out and get you on the podcast. Thanks so much for agreeing to be here.
Jan Hayes: You’re very welcome. I’m so pleased that you found the book so compelling.
Russel: Maybe the best way to start is for you to tell us a little bit about who you are and your background and how you came to write a book on pipeline failures.
Jan: Sure. Thanks, Russel. I guess you can probably tell from my accent, I’m Australian. I’ve spent nearly 40 years in the field of accident prevention generally. I started my career working in offshore oil and gas production for ExxonMobil back in the ’80s.
After the Piper Alpha accident offshore, I moved really into the safety-related field. At that stage, I was working as an engineer. It became clear to me after a few years that really the improvements that we were looking for in safety were linked at least as much to the organizational side of how we do our business as they were to the really detailed technical aspects.
In my career, I felt like I could only do so much fire and explosion modeling. I really wanted to understand what was driving the decisions that were made by people at all levels of organizations. I made a move from engineering into sociology and eventually also a move from industry into academia. These days, I’m a professor at RMIT University here in Melbourne, in Australia.
For more than a decade, I’ve been working with the Australian pipeline industry, supporting the industry here in a research program. At the moment, it’s structured through something called the Future Fuels CRC. We collaborate with and work closely with internationals, PRCI and EPRG, but fundamentally, I’m supporting the Australian industry here in their efforts to operate safely.
Russel: Fascinating. I want you to give me a definition of what is sociology. I know what engineering is. What is sociology? How does an engineer become a sociologist? That’s a fascinating conversation.
Jan: That’s not the conversation I thought we were having today. Sociology is about how people operate in society. It’s different to psychology, which is focused very much on people as individuals and what drives them. Sociology is more about how groups of people operate or how we operate as individuals within a social setting.
As what I would now say is an organizational sociologist, I’m looking at how organizations impact people. Organizations really are just made up of people, but they take on an extra characteristic when you’ve got all those people working together. Maybe that’s a bit of an off-the-top-of-the-head explanation.
Russel: No. It’s really ideal. It’s a great tee-up for the conversation I wanted to have today. You’re absolutely right. Certainly, as I’ve advanced in my career as an engineer, I get more and more interested in the people dynamics. Engineering, at this point in my life, is pretty easy. The people part [laughs] is never easy. The people part always gets more challenging.
How does that sociology, how people behave in groups and that study, how does that play into your studies and your book that you wrote about pipeline incidents?
Jan: To take this back to thinking about organizational accident prevention, which is how I describe what I do at the highest level, this is about thinking about how the decisions made at various levels of organizations impact safety outcomes. That’s it at the simplest level.
Russel: That sounds easy. I suspect it’s not, though. I took several things out of your book that I think I’ll probably always have top of mind whenever I’m having a safety conversation.
One of them was the idea that lessons learned as an approach for improving safety, while it has a value proposition, it has some pretty significant limitations. What’s your takeaway in terms of the limitations of lessons learned as an approach to improve safety?
Jan: That’s a really interesting and somewhat complex question. Let’s break it down a bit. It’s super, super important to learn from things that have gone wrong and things that have gone well. That operates at a micro day-to-day level. It’s super, super important, and very, very useful.
The problem comes if we’re trying to think about preventing low-frequency but high-consequence accidents, these iconic pipeline failures that I wrote about in the book. When we’re talking about trying to prevent accidents like that, you can’t use an approach of…
Lessons learned are effectively trial-and-error learning. You try it, you see what happens. If it doesn’t work, you make an adjustment. You treat life as a trial-and-error learning exercise so that if you see something go wrong, you think, “Next time, I’ll do it differently.”
That’s fine, but you can’t take that approach when you’re trying to prevent these major disasters, because the first time it goes wrong could be the end of your company if it sends you broke, or whatever. It certainly could be catastrophic for individuals if they’re injured or they die.
You can’t take that trial-and-error, suck-it-and-see kind of approach to major hazard management. For that, you need to take different approaches. Although, having said that, learning from small problems and small failures is a super important part of that. There’s a whole body of knowledge out there that some of your listeners might know about regarding high-reliability organizations.
One of the features of a highly reliable organization is that they seek out all the little things that go wrong, the little mistakes and faults and failures, and they make the most of learning from those. They make the most of seeing where that might be highlighting weaknesses in the overall system.
If we’re having a lot of faults and failures or a lot of errors in a certain task or in a certain aspect of the business, that maybe shows us we’ve got a weakness there that needs to be addressed to make the entire system more resilient.
Russel: One of the gentlemen I’ve had on the podcast many times, a guy named Steve Allen, likes to tell a story of being on an airplane and the flight got delayed because they fixed a leaky water faucet in the lavatory. It’s one of those things that, does that affect the safety of the flight? Probably not. Does it affect the safety of the system if you ignore it? Definitely.
That’s a pretty big mental leap. It’s a pretty big mental leap. The other thing, you used a set of terminology that you use in your book, and you spent a fair amount of time talking about this, but the idea of very-low-likelihood, very-high-consequence events.
That puts a different frame on how we think about these things. We’re used to, in operating critical infrastructure, having things happen and they don’t happen the way we planned. They go all abnormal, or there’s something that happens, but the consequences are very small.
While those things aren’t usual, they occur from time to time, and we do something, we learn from them. The point you’re making is that this kind of learning does not necessarily help you when it comes to these things that are very low likelihood, very high consequence, like a major pipeline incident, isn’t it?
Jan: They help you in a different way. In their investigation into one of the major incidents, there’s a piece in there in the evidence where the company reps are being interviewed, and someone talks about repairs or small incidents.
They say, “There were some leaks, but we repaired them.” It’s like, “We repair them, and then we forget about them.” It is true that having a problem and repairing it is important, but it’s also important not to forget about them.
To have perhaps a different group of people whose job is to stand back and look at all of these things that are happening, and look for trends and see what this might tell us about the system as a whole. That might sound simple from an engineering perspective, but it requires a different mindset because what it means is that highly reliable organizations value these kinds of small faults and failures for the insights that they potentially bring on the state of the system.
For example, instead of seeing maintenance as a cost to be minimized, you see maintenance as a really important source of insight about how the facilities are running and how our systems are working. It’s a different mindset towards those small problems. Small problems become things that we’re grateful for what they can tell us rather than, “Seriously? We’ve got to fix this again.”
Russel: You said a whole lot there. You said a whole lot because that is a very different way of thinking. It’s a little…Gosh, I get twisted up and my brain starts to race when I start thinking about these things. I get excited about it.
You keep talking about the system, the system, so maybe it might be helpful to give a little definition of what is the system? What do you mean when you say the system?
Jan: When I say the system, I’m talking about not just the technology but the people and the systems of work that go together to make up the entire organization that has a particular primary task of transporting gas or whatever it is.
Russel: Let me try this on. If I’m thinking about the system, then it’s my work process management, my people, their training, their tools, their mindsets, and then the work I’m doing. I’m just trying to paraphrase what you’re saying to make sure I understand it. We tend to focus on the work versus all the things that go into getting the work done.
The nature, again, I’m trying to make sure I understood what you wrote in your book, but the nature of these very-low-likelihood, very-high-consequence events is they’re generally a failure in the system. Not a failure in the work. It’s only by having people looking at the system that you can find that stuff.
Jan: Yeah, for sure. The other unique characteristic, if I can just jump in there, there’s one other unique characteristic of this. It’s important to see all these little faults and failures for what they tell you. The other issue is that when we think about these major disasters when we look at what’s happened, it happened on a particular day because it was triggered by something.
The whole sequence of events that meant that some small, individual thing that someone did, or some small failure of the piece of equipment, the thing that links that to the major disaster, is caused by all this accumulation of other little faults and flaws in the system that are sitting there as what we might call latent failures.
They’re not things that happened on that day. They’re things that have been there for days, weeks, months, maybe even decades. The challenge is to try and find these things and see these trends. It doesn’t have to be just in real time what’s happened this week.
Some of these things are latent problems in the system that have been there for decades. There isn’t a sense of urgency, it’s today’s problem. There’s a sense of understanding deeply what it is we do and what the state of this system is in the longer term.
Russel: I think there’s a lot bound up in that statement, Jan. A lot. It also goes to, one of the other things you talk about in your book is the limitations of risk management as an approach to prioritize your work.
Because, again, that looks at the work that needs to be done. It doesn’t look at the system that’s used to perform the work. It doesn’t necessarily look at these latent issues. Am I getting that right or do you want to correct what I’m saying?
Jan: Risk management is a very broad term and can be done in very many ways. Risk is certainly an incredibly useful concept to help us prioritize what needs to be done.
The thing that we’ve got to be very careful of in this context of preventing major disasters is whether you’ve got some feedback loop between the assumptions that you’re making in your risk management processes, the conclusions that you’re drawing, and what’s really happening out there in the field.
Because when we’re talking about pipelines, it’s buried infrastructure. You can’t see it. Just because you think nothing’s happened doesn’t mean that the buried infrastructure is in fantastic condition. There’s always tomorrow’s accident.
You’ve got to think very carefully about how you do a reality check on the results and the conclusions that you’re drawing from your risk management processes. This is another way that you can link back in the small faults and the repairs to the system.
Like, “Did we actually predict that we were going to have this problem? Did the model predict this? If our model didn’t predict it, how reliable is this model that we might be relying on for making some really important decisions?”
Russel: That’s a really interesting perspective. I recently had a conversation with one of my employees. We were working on something and we were doing some analysis about some scope around some regulatory analysis. I was taking time to stop and underscore the assumptions we were making.
Then talking to her about that it wasn’t so much the decisions we were making that were important. What was important is that we understood the assumptions we were making and that we reevaluated our assumptions. Then, that we were aware when something changed, that changed our assumptions.
Jan: Risk management is truly like the classic case of garbage in, garbage out.
Russel: That’s right. You have to check your assumptions, you’ve got to check your data, all that kind of stuff, to make sure you’re getting good stuff. How do these organizational factors around the system, how do they contribute? What’s the nature of organizational factors?
Jan: One of the most common and best-known models of organizational accidents was put together a number of years ago by a professor of psychology in the UK called James Reason, and it’s called the Swiss cheese model. Lots of people might be familiar with that.
The idea behind this Swiss cheese model is that we put multiple layers of defenses, or multiple slices of cheese in place, to prevent these small faults and failures from leading to a major disaster. The problem is they are like slices of Swiss cheese. They’re never perfect. They always have problems with them.
Because everything we do in life is never perfect. The problem comes in terms of having a major disaster when, in the jargon of the model, the holes in the Swiss cheese line up. So then some small thing becomes a major thing because all the defenses have failed.
The challenge is to find these holes in the Swiss cheese. The kinds of things that influence the holes in the Swiss cheese are these organizational factors. It’s things like the way work is organized, the budgets that we’ve got to spend on things, prioritization.
All these kinds of issues that can then impact the system as a whole. They impact whether or not people make mistakes in the workplace, but they also impact these latent failures that are inherent in the slices of Swiss cheese.
This directs our attention away from the workplace and higher up into higher levels of organization. What are the organizational priorities? How are risks and rewards…How are reward systems structured in organizations? Who gets promoted? How are you paid? Where do the bonuses come from?
How is the organization structured? Who’s responsible for what, who has what job and how does that work? All those kinds of more organizational things that traditionally people might have thought actually are quite removed from safety.
In fact, they’re so inherent in the way organizations function, they have a direct impact on safety outcomes.
Russel: I got to try and process what you just said. I’m going to try to use a specific example that is in a domain I’m familiar with. There’s an issue of who you’re hiring, what training you’re providing them, and then what oversight exists to help them get better. Those are all factors. They exist throughout the organization.
They exist at the field technician, the dig crews, the engineers, the integrity guys. It exists in all those places. Who am I hiring? How am I training them? What systems do I have in place to help them get better? Those are all organizational things.
Then there’s another aspect of this, where somebody is making decisions about how do I hire, what training I provide, is the training current, is the training adequate, has the training been received and integrated into the person and their capabilities. All of those things are part of the system.
Jan: They are.
Russel: That’s a small piece. That’s not everything. That’s just a small piece.
Jan: Right, but you’re still talking, I think, about people at the pointy end. Whereas I’m saying those things are important, absolutely, but I’m also saying there are things away from that pointy end of the organization that are just as important.
Things like, if you look at the organization chart of a typical organization, there’s now an increasing trend to have representation regarding safety at the highest level. For example, if you’ve got your technician down at the bottom who’s got some kind of safety-related issue and they want to report that it’s to do with major accident prevention. There’s a quality problem on something. They report it.
You want a reporting line that effectively allows things like that to be escalated significantly, right up in the organization. You don’t want the kind of distributed organization structure in which the operations manager, for example, who’s got a whole lot of other priorities, including meeting today’s or this week’s targets, is also responsible for long-term safety issues and can say, “I’m sorry. Park that quality thing. We’ve got to meet today’s target.”
You’ve got to have some organizational visibility of these longer term issues that can make its way higher in the organization so that decisions about these kinds of trade-offs ultimately are being made at the highest level.
Otherwise, you can have people at the top of the organization who don’t know about what the problems are at the bottom. How could we then criticize them for not taking these things into account, if no one’s told them?
Russel: That’s really right. That’s kind of what I was driving to, Jan. Even in the example I was laying out, you can start looking at higher levels in the organization, about where they’re putting together the budgets for the training and defining the training and then at even higher levels where they’re looking at what staffing do we need in order to execute our work plans.
You’re not actually getting more abstract when you move that direction. It’s just a different way of thinking about what the system is.
Jan: Yes, absolutely.
Russel: What kind of things do you do to prevent the kinds of organizational breakdowns or organizational contribution to incidents? What are the kinds of things you need to be practicing?
Jan: One of the big things is what I was just alluding to there. What we’ve seen in the last decade is a real trend towards awareness of the potential for major disaster at the most senior levels of organizations.
There’s a real thing happening around the world at the moment in courses for board members to make sure board members understand what can go wrong in the companies for which they are directors. Without having that understanding, they can make broad, far-reaching decisions without understanding the potential consequences. Awareness starts right there.
It also includes having, as I said, an organizational structure so that someone’s coming up to executive level who is keeping an eye on all of this stuff and who has systems for reporting trends upwards in all of these broader, longer-term issues.
In the context of pipelines, I guess we’re talking significantly here about integrity management issues, that kind of stuff being reported up, not simply coming through an operational line where they get to say, “We’ve met our targets for production. It’s fantastic,” but having these other long-term visions as well.
The other aspect to make that kind of system work links back to what I was talking about before and the attitude towards failures. We need to get people in supervisory and managerial roles at all levels of organizations to think that someone who comes to them with what might be traditionally seen as bad news is actually doing them a favor because they’re highlighting something that can then be fixed.
Doesn’t mean we want to create an organization of whiners that say, “We can’t do this. It’s all doom and gloom.”
On the other hand, it means taking problems seriously and being open to hearing that from staff, rather than giving off the attitude that everything’s got to be fantastic and that what I want to hear from you is that everything is fantastic. If you tell me anything that’s not fantastic, then that’s on you rather than on the organization.
Russel: That’s right. What’s coming up for me as I’m listening to this is I had a guy on the podcast who’s an airline pilot. We talked about the Aviation Safety Action Program and how something like that might be implemented in pipelining. I know a number of companies do this, but we don’t really have anything that’s industry-wide.
It does exactly that. Every one of the various specialties, pilots, mechanics, they all have a mechanism for reporting things they think need to be reported. Then there’s a process where those things get analyzed. There’s a feedback loop and all that kind of stuff. That is a big part of aviation. It’s a big part of aviation.
I’m saying this. I believe we need to get there as a pipeline industry. We’ve got a ways to go. We’ve got a ways to go.
Jan: The whole attitude towards small faults and failures in the aviation industry is quite different. I’ve done quite a lot of work with air traffic control over the years. Air traffic controllers, if they’ve made a mistake in communicating with an aircraft on the shift, they will go and report themselves.
They will go to the boss at the end of the shift and say, “Hey, there was this problem. I made a mistake. Let’s think about why I made this mistake.” It can be from the perspective of fatigue management or whatever, or it can be there’s some issue in the system that meant that I wasn’t supported to make the best decision.
There’s no sense in which the boss says, “Shame on you for making a mistake.” It’s like, “Thanks for telling me and let’s get this sorted.”
Russel: It’s more, “Thanks for telling me. I just want to let you know that there’s two other people that have had a similar problem in the last three weeks, so clearly, we’re going to need to take a deeper look at this.”
It’s just a different way of thinking. Again, this airline pilot uses an example: it’s like I’m driving and I run a stop sign. The first thing I do is I pull into the next police station and I turn myself in for running a stop sign.
The police take a report and they’re asking, “Was there a stop sign there? Was it clear? Was there shrubbery? What was the weather? Were you listening to the radio? Were you talking on your phone?” asking all these other questions.
Then at the end of that, they go, “Thank you for reporting it. By the way, we’ve had six other people run that stop sign in the last two weeks. Thank you for bringing this to our attention.” It’s a radically different way of thinking about mining for these small things that could be indications of more endemic system issues.
Jan: Right, because running a stop sign is seen as the driver’s problem, and the driver should be punished. Hence the police analogy that you’re pulling in there.
In air traffic control, an air traffic controller making a mistake is, in the normal scheme of things, obviously, there are exceptions, but in the normal scheme of things, that’s seen as a system problem. Not a problem with that individual air traffic controller.
Russel: The first assumption is it’s a system problem.
Russel: It’s interesting. If you were going to try and wrap this conversation up, what would you want pipeliners to take away? Everybody should remember this one or two key things about this conversation. How would you wind this up?
Jan: Wow, let me think about that.
Russel: I’ll make a stab at it. I’ll take a stab at it. Fundamentally – and this is one of the things I took away from reading your book and I found really compelling – is fundamentally, we do not have task failures.
When we have a major incident, we have a system failure. That’s the first thing I would take away. The second thing I would take away is that if I’m a pipeline operator, I need to find a way to look at the system and evaluate the system.
That requires I’ve got to define the system, but it also requires that I have a mechanism or a capability to constantly be looking at the system. Those are the two things I took away.
Those things apply to what we’re trying to do with our company and our software, which is to help people create that capacity, because there’s a lot that goes into what data do I collect? How do I collect it? Where do I collect it from? All of that allows me to look at my system.
Jan: I’ve got a couple of others for you. I don’t disagree. Those points that you made are a nice summary from one perspective. I’ve got a couple of others.
First of all, inherent in the conversation we’re having is that safety isn’t just about slips, trips, and falls and the worker hurting themselves. Inherent in this whole conversation is that we’re talking about safety of the general public, and effectively, integrity of the pipeline.
Sometimes, we think about integrity management as a very technical thing rather than linking it through to the potential ultimate consequences. I would ask all of our pipeline organization friends to remember that safety isn’t just about making workers wear safety boots and hard hats, making sure people are not electrocuted. It’s got this broader dimension that’s linked directly to pipeline integrity.
Then, with that in hand, and thinking about how the system is so interconnected, I would ask the pipeline industry workers going to work every day to think about how what they’re doing links to these ultimately good safety outcomes that we’re after.
Because just about everybody who works in the industry, what they do has some link ultimately to the overall integrity of the system and these good outcomes. I would ask people not to forget that.
Russel: Jan, you guys can’t see me. I’m doing shooting fingers at Jan over the video as we’re talking because I think that’s right on point. Sometimes people even in the purchasing office don’t realize how important what they’re doing is for safety.
Jan: Absolutely important. Purchasing is massively important.
Russel: There’s all these kinds of things that we typically think of as back-office functions. They’re critically important to safety.
Russel: For the listeners, we’re wrapping up this first episode. Jan is going to come back next week and we’re going to have another conversation about another concept that I found quite compelling. It goes to the regulatory framework in Australia, how it’s different from the regulatory framework in the US, and why.
Please come back and listen to that one if you listen to this one. I’ll also just remind the listeners that we will link up – obviously, Jan’s contact information will be on the website – but we’re also going to link up some of these resources we’re talking about.
We’ll give you a link so you can find her book because I think anybody who works in pipelining, they ought to read Jan’s book. It’s that good. Jan’s given me a bit of a face over the video feed here but I do believe that. It came highly recommended to me and I mentioned it to some of the other people I work with. They all bought it and they’re reading it. It’s really good. It’s really good.
Jan: Thank you so much, Russel. I’ve enjoyed the conversation. It’s been great. Hopefully, the listeners can take something from it.
Russel: Yes, that’s great. I look forward to picking it up again.
Russel: I hope you enjoyed this week’s episode of “The Pipeliners Podcast” and our conversation with Jan. Just a reminder before you go, you should register to win our customized Pipeliners Podcast YETI tumbler. Simply visit PipelinePodcastNetwork.com/Win and enter yourself in the drawing.
If you’d like to support this podcast, please leave us a review. You can do that wherever you happen to listen. Apple Podcast, Google Play, Stitcher, and many others. You can find instructions at PipelinePodcastNetwork.com.
Russel: If you have ideas, questions, or topics you’d be interested in, please let me know on the Contact Us page at PipelinePodcastNetwork.com or reach out to me on LinkedIn. Thanks for listening. I’ll talk to you next week.
Transcription by CastingWords