How much data is enough? This is a common question when evaluating your data acquisition capabilities and the individual needs of your pipeline operation, particularly when considering the Computational Pipeline Model (CPM) approach to leak detection.
However, there is a better question to ask: what am I trying to do, what are the limitations of my SCADA infrastructure, and how much data do I need to accomplish each critical function?
In this episode of the Pipeliners Podcast, host Russel Treat addresses the importance of knowing what you need the data for, so that you can decide whether to gather a higher volume of data or more detailed, accurate information from your instruments to improve the performance of your leak detection system.
The episode also covers the critical area of timestamping data so that pipeline controllers can read the data correctly, achieve situational awareness, and react appropriately. If you or your team deals directly with SCADA systems, this episode is for you!
SCADA Systems: Show Notes, Links, and Insider Terms
- SCADA breaks down into two key functions: supervisory control and data acquisition. This includes managing the field, communication, and control room technology components that send and receive valuable data, allowing users to respond to it. (To learn more about SCADA implementations, listen to Pipeliners Podcast Episode 11.)
- The PHMSA Pipeline Safety Act (49 CFR Parts 192 and 195) defines the requirements to safely transport natural gas and liquids, respectively, through a pipeline.
- Sarbanes-Oxley (SOX) is a U.S. federal law enacted in 2002, sponsored by Senator Paul Sarbanes and Representative Michael Oxley. SOX governs the measurement of data to ensure accurate financial reporting to stakeholders. (To learn more about SOX and measurement standard operating procedures, listen to Pipeliners Podcast Episode 13.)
- When evaluating the capabilities of your SCADA systems for transporting data, you need to consider which method of communication best fits your operation.
- Satellite communication can have a natural delay from the time a data request is made and the time the data is sent.
- Poll Response only sends data when it is requested by the host. This creates limitations because, in heavily loaded systems, a full polling cycle can take up to 15 minutes.
- Report by Exception sends data to the host when there is a change to the data.
- MQTT (Message Queuing Telemetry Transport) is a publish/subscribe protocol that allows data to move quickly and does not bog down the system with unnecessary requests.
- Modbus is an older protocol that enables communication among many devices connected to the same network. The drawback is delays in the communication, oftentimes creating timestamp discrepancies.
- Flow monitoring is an analysis tool that provides a detailed view of the data moving to and from machines. When flow monitoring is enabled, its output shows which machines are exchanging data and over which application.
- API 1130 defines the requirements for leak detection in pipeline operations.
- API 1130 Figure C-1 (page 31) shows how a CPM (computational pipeline monitoring) system should use instrument data.
- API 1130 Section 5.1.1 (page 12) defines how to match instruments, their ranges, and their specifications to the pipeline operation design. This helps identify the uncertainty in a system to ensure the accuracy and reliability of the instrument delivering data.
SCADA Systems – How Much Data is Enough?: Episode Transcript
Russel Treat: Welcome to the Pipeliners Podcast, episode 15.
[music]
Announcer: The Pipeliners Podcast, where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations. Now your host, Russel Treat.
Russel: Thanks for listening to the Pipeliners Podcast. We appreciate you taking the time. To show our appreciation, we are giving away a customized YETI tumbler to one listener each episode. This week, our winner is Timothy Pasternak with Dominion. Congratulations, Timothy, and your YETI will be heading out in the mail directly. To learn how you can win this signature prize pack, stick around to the end of the episode.
I want to start this episode by doing a shoutout to listeners. I had a question from Randy Herman asking what a Bubba geek is. If you listen to the intro for the show, you’ll hear that term used.
First off, just let me say I consider myself a Bubba geek. This is a term of endearment and affection for those of us that work in oil and gas, drive around in four-wheel drive pickup trucks, like technology — many of us like things that you might think nerds like. At the same time, we probably also like those things that rednecks like, like fishing, and hunting, and being outdoors.
It’s actually one of the things that’s nice about being in the pipeline business. You do get the opportunity to see some beautiful country. That’s what a Bubba geek is, and another way to say that might be a nerd redneck. Randy, thanks for the question, and I hope that clarifies things.
Also, I’d like to shout out to several listeners. I’m not going to call them out by name, but I had several people reach out and ask a question as a follow-up to the last episode with Giancarlo Milano from Atmos International, asking how much data is enough.
I get asked this question frequently so I thought what I’d do for this episode is something maybe a little different. I’m going to dust off a presentation I made at the 2015 ENTELEC Fall Conference that was on this very subject.
Some people would ask the question, how much data do I need from a regulatory standpoint? Or they might ask, what are the implications of operating under the various regulatory requirements, control room management and leak detection among others, that would tell me how much data I need to get?
I think the first thing you need to do is talk a little bit about what kinds of systems you’re getting the data for, because that matters, and it matters in a material way.
The first thing to talk about is a little bit about SCADA. Obviously, that’s an industry term and most of us are familiar with what SCADA is, but if I’m going to break that down, SCADA is two things. SCADA is data acquisition, and that relates to the data in the field and all the telemetry and software and hardware used to get the data from the field back to a computer system.
It’s also supervisory control. Supervisory control — that’s kind of a key term. In SCADA systems, which are used in pipelining amongst many other industries, supervisory control means I’m monitoring and I’m passing set points to the field.
But the control, the actual logic that’s controlling pressure or controlling temperature or controlling flow, that logic actually exists in the programmable logic controllers in the field at the field location. That’s why we call this supervisory control and data acquisition, or SCADA for short.
SCADA is really a stack of technology. There’s a lot of things that go into it. At the very bottom level, it starts with instrumentation — pressure gauges, temperature gauges, flow instruments on the head of meters, and so forth.
Above that, we would have the PLCs or programmable logic controllers and the RTUs, the remote terminal units, and flow computers that take the signals and perform math and logic to determine flow and control points, and so forth.
Behind that, or on top of that in the stack, you’d have communications. That can be field radios, cellular systems, satellite, telephone lines, and many other kinds of modes of communications — basically, any physical hardware that’s moving the data.
Beyond that, we have what historically has been referred to as the SCADA host. That’s getting in today’s world a little bit less clear, because the communications server and the SCADA host are often two separate pieces of software and can exist on separate boxes. I’ll talk a little bit about that a little later in the episode.
Then of course, on top of that, you have the applications. The cost of building out this technology stack starts in the field. I’ve got to put these programmable logic controllers and instrumentation and power for all that at the field. There’s a big investment in that. Then, I have to have my communications network. There’s a big investment there.
As I climb up this stack of components, my incremental cost gets less and my value gets greater, because particularly once I get past the SCADA host and I start looking at applications like leak detection, and analytics, and engineering, and so forth, that can analyze this data and use it to support decision making, that data starts becoming very valuable.
I can’t get there if I don’t first make the spend on the full stack.
You might ask a follow-up question: so all this data that’s coming up through the SCADA system, who all needs it and what do they need it for?
Most commonly, we think about this data being used in the control room. I’m monitoring pressures. I’m monitoring flows. I might be monitoring analysis. I might be monitoring H2S. I’m monitoring what’s going on in the field, and I’m making decisions, and I’m taking action to communicate with others to respond to what’s happening in the field. That’s the first and most common use of SCADA.
We also collect data and pass it to the measurement group. All of the information that measurement needs to perform its accounting and verification processes typically is collected through the same communications infrastructure that’s collecting data for SCADA.
I might also be passing this data to some kind of CPM or computational pipeline model that’s taking the data, and performing calculations, and using those calculations to make an assessment of whether or not I might have a leak in the pipeline.
I might also be gathering this data and passing it to some kind of historian or repository where that data’s used by engineering. It might be used by financial groups for developing models to forecast. I might be sending it to a facilities management system or machine management system in order to gather analytics to determine what level of ongoing support and service of that equipment might be justified.
Obviously, there’s a lot of people using this data. All of this goes into the conversation about how much data I need, because each of these users really needs this data contextualized appropriately for their specific use.
I talked a little bit about how much data I need from a regulatory standpoint. There are two regulatory frameworks that would govern the collection and use of data. One is 49 CFR Parts 192 and 195, which is the Pipeline Safety Act, 192 for gas and 195 for liquids. There are aspects of 195, including leak detection requirements, that call out API 1130 as the standard to be used.
Then likewise, you also have the Control Room Management Rule, and that rule has callouts specific to adequate information and alarm management.
In addition to the Pipeline Safety Act, there’s also Sarbanes Oxley, which Ardis Bartle talked a little bit about in a recent episode, which would primarily govern measurement.
Sarbanes-Oxley is primarily a financial rule. Measurement data is created in the field by connecting up instruments, bringing that data into flow computers, and performing calculations. All that information goes through accounting and ultimately directly impacts the financials, so Sarbanes-Oxley governs it.
Even though we have these regulatory requirements and we have a number of standards that are incorporated by reference, we really don’t have anything that prescribes the amount of data that you need to do any of these tasks. That’s all left up to the judgment of the operator.
When I start looking at how much data do I need, I need to start thinking about what’s the purpose of the data I’m collecting and then what are the types of needs that are related to those purposes. Let me try to unpack that a little bit.
First off, I have prevention through systems. By that, I mean things like leak detection and alarming. Alarming might not be a lot of information, but I need to pick it up in a way that I get the alarm, and the alarm is timely. You have to ask yourself questions like, if I’m monitoring pressures, how quickly can pressures change, and how do I want that to alarm?
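To put rough numbers on that question, here is a minimal Python sketch of checking whether a poll interval is fast enough to catch a pressure excursion before it crosses an alarm limit. Every name and figure in it is a hypothetical illustration, not a value from any particular system:

```python
# A minimal sketch: is the poll interval fast enough to see a pressure
# excursion before it crosses the alarm limit? All numbers hypothetical.

def worst_case_rise(rate_psi_per_s: float, poll_interval_s: float) -> float:
    """Largest pressure change that can occur between two consecutive polls."""
    return rate_psi_per_s * poll_interval_s

current_psi = 650.0        # last polled reading
alarm_limit_psi = 720.0    # high-pressure alarm setpoint
max_rate_psi_per_s = 2.0   # fastest credible pressure ramp on this line

for interval_s in (5, 30, 60, 300):
    rise = worst_case_rise(max_rate_psi_per_s, interval_s)
    ok = current_psi + rise < alarm_limit_psi
    print(f"{interval_s:4d}s poll: up to {rise:6.1f} psi between polls -> "
          f"{'acceptable' if ok else 'could cross the limit unseen'}")
```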
There’s also issues that we’ll talk a little bit about in the communications protocols themselves and how am I actually getting that data.
I’ll also have the issue of vigilance. In the control room, vigilance is the term that goes to the controller monitoring the various screens that they’re looking at and having enough free time and the appropriate data to be able to discern if something doesn’t look right. That’s another purpose of the data: to support vigilance. It could also be called adequate information.
I’m also using this information to coordinate delivery, make sure that I’m going to hit my daily and monthly targets for delivery through all my metering points. Then of course, I’m going to use this for the back office measurement.
Now, within all of this, there are also different needs for this data. Measurement obviously needs it, and marketing and scheduling need to see measurement data, because they’re coordinating with customers to make sure that customers are staying on their targets and meeting their nominations.
There’s also analysis data or analytical data that might be used by engineering. There’s also optimization data that I might be bringing in off pumps and compressors and types of rotating equipment to optimize my use of fuel or power in order to move the product. Then, of course, there’s other types of coordination with other entities.
That kind of gives you a background. When you ask the question, how much data do I need, the first question you have to answer is: data for what purpose? Because that impacts how much data I require.
I also have some system considerations. These issues are kind of all interrelated.
One issue is the actual communications capability that exists within my system. If I’m moving data by satellite, satellite by its nature can have a natural delay between the time I ask for the data and the time I receive the data of several seconds or more. That impacts what I can do with the data. If I’ve got to be able to see things that change more quickly than once per second, maybe satellite’s not the way I want to go.
Likewise, there’s issues in the protocols. Most of the protocols that have been implemented historically are what would be called poll response, meaning, I ask for the data by sending a request through the telecommunications and then I get a response. I don’t get any data unless I ask for it from the host.
This can create some limitations, because depending on the number of data points I’m using, the protocol, and the telecommunications capacity, I might only be able to get data at some maximum rate.
In some systems, I’ve seen that take as long as 15 minutes, because there’s so much data being moved through the communications network, it takes me 15 minutes to make each request and wait for the response. That can drive considerations.
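To see how a poll/response cycle stretches out like that, here is a rough back-of-the-envelope sketch. The RTU count, message size, link speed, and turnaround time are all hypothetical:

```python
# A rough sketch of poll/response cycle time on a serial, narrowband link:
# each RTU is polled in turn, and nothing overlaps. Numbers hypothetical.

def poll_cycle_seconds(num_rtus: int, bytes_per_exchange: int,
                       link_bytes_per_s: float, turnaround_s: float) -> float:
    transfer_s = bytes_per_exchange / link_bytes_per_s
    return num_rtus * (transfer_s + turnaround_s)

# 120 RTUs, ~400 bytes of request+response each, a legacy radio moving
# ~120 bytes/s effective, and 3 s of latency/turnaround per poll:
cycle = poll_cycle_seconds(120, 400, 120.0, 3.0)
print(f"Full polling cycle: {cycle / 60:.1f} minutes")   # ~12.7 minutes
```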
There are other systems that are, sometimes they might call this report by exception, but they basically send data to the host when it changes.
In the Internet of Things, there’s a protocol that’s been around for a while but is gaining popularity, called MQTT, which is a publish/subscribe protocol. What that means is the data only goes to the host when it changes.
That can be a big deal, because I can move a lot more data and move it a lot more quickly, and I don’t bog the system down with requests that just tell me things didn’t change.
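Here is a minimal sketch of that publish-on-change idea. The read_pressure and publish functions are hypothetical stand-ins; a real implementation would read from an instrument and publish through an MQTT client library such as paho-mqtt:

```python
# Report by exception, sketched: publish only when the value moves outside
# a deadband, instead of answering a poll every cycle.
import random
import time

def read_pressure() -> float:
    """Hypothetical stand-in for reading a field instrument."""
    return 650.0 + random.uniform(-1.5, 1.5)

def publish(topic: str, payload: str) -> None:
    """Hypothetical stand-in for an MQTT publish call."""
    print(f"PUBLISH {topic}: {payload}")

DEADBAND_PSI = 1.0
last_sent = None

for _ in range(20):
    value = read_pressure()
    if last_sent is None or abs(value - last_sent) >= DEADBAND_PSI:
        publish("field/station7/pressure", f"{value:.2f}")
        last_sent = value          # stay quiet until the value moves again
    time.sleep(0.1)
```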
Your whole communications capability, the polling, the protocol, the telemetry, all of that’s going to impact what you’re actually able to do.
The other thing that would be impacting what you’re able to do, or should do, is going to be the product type. By that, in general terms, I mean: am I talking about liquids or am I talking about gas?
In liquids, pressures tend to move through the system very quickly. There are tools we use for leak detection. Leaks can have significant environmental impact if they occur. Typically in liquid systems, I’m looking to get the data more frequently, something more like once every five seconds, maybe even more frequently if possible.
Then I have: what do I need for measurement? In measurement systems, particularly in state-of-the-art measurement systems where I’m bringing a full audit trail from the meter in the field, I’m moving very large amounts of data. Typically, I only do that once an hour, because most measurement in the field is accumulating an hourly number, and I’m bringing the data back with that as the resolution of the data. I really can’t poll any more frequently than once an hour, but when I do run that poll, I move a whole lot of data. That can materially impact the constraints of the system.
Then, lastly, there’s the pipeline operating condition. I’ve seen systems where they’re low pressure, they’re low volume, they’re small pipeline sizes, and data is only coming back to the host once every 5 to 15 minutes. I’ve seen other systems where they’re large liquid transportation systems and they’re moving data back to the host once every five minutes.
That’s kind of unpacking system considerations. I’ve got to understand what’s the purpose of gathering this data. Then I’ve got to understand my system constraints and what that means to me and what capabilities I can implement.
Then I have the data consideration itself. Now, I’ll start talking about how many data points do I need, what’s the quantity of data. You can think of that as, do I have 100 pressure points or 1,000 pressure points.
You can also think of that as, what kinds of data? Is it 100 pressure and 100 temperature and 100 flow? That doesn’t necessarily make a lot of difference in the communications infrastructure, but it certainly makes a difference in what am I doing with the data.
Then the other thing is speed, or you could think of this as latency: when I ask for the data, how long do I have to wait to get it? That can have a material impact.
Likewise, I have frequency: how often do I ask for it? Some data — data that I’m using for leak detection — I need that data very frequently and increasing frequency improves the performance of the tools I use for leak detection.
Then, I’ve also got to ask: is there data that I need to put together in sets in order to process it? This is actually quite material. With many of the older protocols, Modbus being a notable one, when I make a request, the timestamp is the time the data comes back to the host. It’s not necessarily the actual time.
Think about it this way. If I make a request at 15 seconds past the top of the hour, it takes five seconds for that request to get to the remote, the remote has to do some processing that takes five seconds, and then it sends the data back and that takes five seconds, then the data when I receive it, versus when it was actually created, can differ by up to 15 seconds.
Now, in many cases, that doesn’t make a difference. But if I’m doing leak detection in a liquid system and I want to be using 5-second data, then I need all those time stamps to be correct to actual time. That can be a material consideration in terms of getting additional accuracy or additional performance out of your leak detection.
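One common mitigation is to back the host-receipt timestamp off by the estimated link delay. Here is a minimal sketch of that idea; the five-second figures echo the example above and would, in practice, be measured or configured per link:

```python
# Estimating when a value was actually current in the field, given the
# host-receipt time. The latency figures are hypothetical.
from datetime import datetime, timedelta, timezone

def estimate_sample_time(host_receipt: datetime,
                         return_transit_s: float,
                         remote_processing_s: float) -> datetime:
    """The value was current before the remote processed and returned it."""
    return host_receipt - timedelta(seconds=return_transit_s + remote_processing_s)

receipt = datetime(2018, 3, 1, 12, 0, 30, tzinfo=timezone.utc)
sample = estimate_sample_time(receipt, return_transit_s=5.0,
                              remote_processing_s=5.0)
print(sample.isoformat())   # 12:00:20+00:00, 10 s earlier than the host stamp
```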
There’s also other issues where I may want to be looking at data like a data set that relates to measurement and I’ve got to make sure that those times are correct and accurate. There’s a lot of considerations around the data itself.
Let’s break this down a little bit and I’m going to talk about several different uses.
When I talk about measurement and I look at these four different considerations, the quantity is all the measurement points that I require for my balancing or flow monitoring. On speed, a lot of times, bringing back measurement data, particularly through legacy, narrowband communications, can take several minutes per request.
The frequency — daily to maybe quarter hourly, depending on operations; most common would be hourly. Then, the set would be, when I’m doing line pack or balance rollup, I’ve got to get all of the meters within that group that allow me to do the line pack or the balance rollup.
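To illustrate why the set matters, here is a minimal sketch of a balance rollup: the imbalance number is only meaningful once every meter in the group is present. The meter names and volumes are hypothetical:

```python
# A balance rollup sketch: receipts minus deliveries minus line pack change.
# The rollup is refused if any meter in the set is missing.

receipts = {"meter_r1": 1520.0, "meter_r2": 480.0}     # bbl this hour
deliveries = {"meter_d1": 1210.0, "meter_d2": 760.0}   # bbl this hour
linepack_change_bbl = 18.0                             # derived from pressures

required = {"meter_r1", "meter_r2", "meter_d1", "meter_d2"}
missing = required - (set(receipts) | set(deliveries))
if missing:
    raise ValueError(f"incomplete set, missing: {missing}")

imbalance = (sum(receipts.values()) - sum(deliveries.values())
             - linepack_change_bbl)
print(f"Hourly imbalance: {imbalance:+.1f} bbl")       # +12.0 bbl
```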
That’s the way to look at measurement.
From a marketing perspective, the quantity of the data is all the measurement points where I have volumes that are nominated. Again, speed can take up to a few minutes to get updates. Frequency, typically, I need that hourly. Then the set is, I need a current-day total, a flow rate, and a time left in the day so I can calculate a number to compare to what the nominated volume is.
When I’m looking at measurement and marketing, the issues are more about the sets and the sets being polled completely. It’s not generally a lot of data; it’s just getting the data together well.
When I start talking about analysis and optimization, I’m talking about lots and lots of data. The more data, the better. On quantity, if I’m getting data every five seconds, then every one second is going to be better, because that’s going to improve the fidelity of the algorithms that I’m using to optimize.
How quickly I get the data doesn’t really matter, because usually, I’m doing this analysis in the back office after the fact. In terms of frequency, you could just call that “often.”
In terms of set data or processing, the key thing here is, if I’m doing compressor analysis, then I need all the data around the compressor. I need pressures by cylinder, and RPM, and firing timing by cylinder, and so forth. It’s getting all that data together in order to be able to do the analysis itself.
From a vigilance standpoint, the quantity is whatever’s necessary for the controllers to retain situational awareness. Then, when they have things that don’t look right, whatever quantity of data is necessary to support their analysis.
In this case, the speed issue, typically, that needs to be quick. If I’m looking at a critical process, if I’m adjusting a flow control at a critical point, I might want to be seeing that update every few seconds.
The frequency is generally never less than once every five minutes. There are exceptions to this, but in the control room, if you have a point that you need in the control room, typically, you’re going to need to see that data at least every five minutes, if not more frequently than that.
From a set processing standpoint, there’s really not any of that; it’s just that I need the data on my screens to be current.
Another purpose of data can be simply coordination. Typically, if we’re coordinating, we’re coordinating around flows or pressures or valve positions, and whatever contract requirements relate to how we’re operating.
Speed, generally quick. If I’m asking for information that I need to communicate, the system needs to respond quickly. Frequency here is generally on request; I only need the data when I ask for it. Then again, from a set processing consideration, there’s little if anything there.
Then, lastly, in terms of purpose, we’ll talk about alarming and emergency response. The quantity is generally only the data at the site. When I have an alarm, the way to think about this, versus just having a high pressure, is that I want to know what’s going on at a location. If I have a high pressure at a mainline valve, then I probably want to know everything that I can get, data-wise, around that mainline valve. Likewise, if I have an issue around a compressor station, around a pump, I want all the data.
I want the data quickly. I want the alarm to annunciate when the data is out of the operating range it should be within.
From a set processing standpoint, I want to be able to tie my alarm annunciation back to my alarm rationalization, meaning my understanding of what would cause the alarm to generate and what things I might need to do in response. That’s really a topic for another episode.
Let’s talk about leak detection, because really, most of the questions that I got were, what do I need to be worried about with regards to leak detection?
From a regulatory requirement standpoint, API 1130 is incorporated in 49 CFR 195 by reference. In particular, what you need to consider is what’s in the appendix, and specifically, where they talk about sensitivity, reliability, accuracy, and robustness.
Sensitivity is the ability to predict a particular leak size and the speed at which you can do that prediction. Sensitivity is driven by the frequency of data — you’re going to have better sensitivity with 5-second data than 1-minute data, as an example.
The other thing that’ll drive sensitivity is the accuracy of the instruments. If you have instruments that are out of calibration or out of service, that’s going to impact sensitivity.
The other issue is reliability, and that’s about alarms that actually indicate a leak versus alarms that annunciate when there’s actually not a leak. Then you have accuracy, which is being able to predict a leak’s location, its flow rate, its volume loss, and the fluid type.
Then, robustness. Robustness is getting accurate prediction through a range of operating conditions. You can think of that as a range of pressures, a range of flows, a range of fluid characteristics like density. More data makes a leak detection system work better across that range, provided the system is properly built and the data is accurate.
Doing a good job of managing a leak detection system is very much about making sure that leak detection system is being fed timely, accurate information.
In API 1130, there’s a figure called C-1, that’s Charlie-1. That figure shows how a CPM system uses instrument data.
Think of it this way: I get instrument information from the field and I feed it to the engine that is performing the leak detection algorithm, doing the math and coming up with a result that tells me whether or not I have a leak. Then it’s comparing the output of its algorithms back to the instrument data it used to perform the calculations. Then it’s doing additional math to determine whether it should alert on what it’s seeing.
The instrument data is actually used twice. It’s used first in the engine that does the processing to create the outputs that tell me whether or not I might have a leak, and then again to compare what’s going on with the instruments with what’s going on with the inference engine, and particularly in real-time transient models, this is critical.
Think of it this way: I feed the data to a math model that tells me what should be happening a minute from now, and then a minute from now, I compare that to what I’m actually reading from the instruments. That means that the instrument data, how much I’m getting, its accuracy, and how it’s organized is even more important, because it impacts two parts of the leak detection process.
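Here is a minimal sketch of that second use of the data: compare what the model predicted against what the instruments now read, and flag a residual that persists. The thresholds and flow values are hypothetical, and a real CPM engine is far more sophisticated:

```python
# Comparing model predictions against instrument readings, minute by minute,
# and alerting on a persistent residual. All values hypothetical.

ALERT_RESIDUAL_BBL_H = 25.0   # difference worth flagging
PERSISTENCE_MIN = 3           # consecutive minutes before declaring

model_predicted = [4980.0, 4975.0, 4970.0, 4968.0, 4965.0]   # bbl/h
measured        = [4978.0, 4945.0, 4940.0, 4938.0, 4935.0]   # bbl/h

streak = 0
for minute, (pred, meas) in enumerate(zip(model_predicted, measured), start=1):
    residual = pred - meas
    streak = streak + 1 if abs(residual) > ALERT_RESIDUAL_BBL_H else 0
    if streak >= PERSISTENCE_MIN:
        print(f"minute {minute}: possible leak, residual {residual:+.0f} bbl/h "
              f"for {streak} consecutive minutes")
```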
Again, let’s come back to data and the question: how much data’s enough? I’m now going to talk a little bit about Section 5.1.1 in API 1130.
What you have to do is you have to match the instruments and their ranges and their specification to the pipeline and its operating design in terms of pressures, flows, temperatures, etc.
I need to be looking at each instrument manufacturer’s stated accuracy and linearity, combine them across all the different instruments, and use all that information to determine an uncertainty.
Uncertainty, you can think of it this way. I’ve got all these instruments, and none of these instruments measures a specific number. They measure a number within a range, within a tolerance.
Each instrument has its own tolerance. If I take three instruments and put them together and each of those instruments has an uncertainty of 0.25 percent, I can very quickly, by combining those instruments, get a system uncertainty up to 1 percent.
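Exactly how those tolerances stack depends on how the errors combine. Here is a minimal sketch showing the usual root-sum-square combination for independent errors, alongside the straight worst-case sum:

```python
# Combining three 0.25% instrument uncertainties into a system figure.
import math

uncertainties_pct = [0.25, 0.25, 0.25]

rss = math.sqrt(sum(u ** 2 for u in uncertainties_pct))   # independent errors
worst_case = sum(uncertainties_pct)                       # everything adverse

print(f"root-sum-square: {rss:.2f}%   worst case: {worst_case:.2f}%")
# root-sum-square: 0.43%   worst case: 0.75%
```

Either way, the combined figure lands well above any single instrument’s 0.25 percent.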
Without breaking that all down, the things that are very important are the accuracy of the instruments and the number of bits that I’m bringing in, whether I’m bringing in 8 bits, 16 bits, 32 bits, or 64 bits off the instrument (bits being the technical jargon, if you will, for an amount of data).
You can think about it, am I reading the number to four digits? Am I reading the number to eight digits? That’s a good analogy for what we’re talking about here.
It’s not only the accuracy of the instrument, but also how much data it can send in each packet for a specific number. The more it can send, the better. All of that goes to overall uncertainty.
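As a concrete illustration of that digits analogy, here is a minimal sketch of how bit depth limits the smallest change a transmitted reading can represent, over a hypothetical 0 to 1,000 psi span:

```python
# Resolution of a value carved into 2^n - 1 steps across a fixed span.
SPAN_PSI = 1000.0

for bits in (8, 12, 16, 32):
    step = SPAN_PSI / (2 ** bits - 1)
    print(f"{bits:2d}-bit value: smallest representable change ~{step:.5f} psi")
# An 8-bit value cannot distinguish changes under ~3.9 psi; 16 bits
# resolves ~0.015 psi.
```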
Really, the question of how much data is enough is the wrong question. The right question is: what kind of data do I need, for what purpose? Then, how do I supply that kind of data? As with many other things, there’s a little bit of engineering and analysis required to get this correct.
I just covered in about 20 minutes what I spent about 35 minutes talking about at ENTELEC back in Fall 2015. I’ll actually load up a copy of that presentation so that you can download it off of the show notes page.
Given the timing of my getting this episode out, I’m actually a little behind the curve this week, so it may not be up there the day the episode drops, but it should be up there within a day or two.
As I often do when I’m interviewing a guest, I want to try and boil this down to three key takeaways. I think the first takeaway is the question, how much data is enough, is a little bit the wrong question. Probably the better question is, what am I trying to do and how much data do I need to accomplish that function.
Where this question comes up the most is with leak detection. With leak detection, certainly, more data is better, but it’s not just more data. You actually have to look at it as a program: you have to look at your instruments, what the accuracy of those instruments is, and how many bits of data you can get when the data comes back.
That may mean that I would better spend my money getting more detailed accurate information from the instruments versus getting more data. There’s a trade-off between those two things. That’s key takeaway number one.
I think the second key takeaway is, as with many other things, you’ve got to look at this as a system. You’ve got to look at the system constraints and make sure that whatever you’re doing to get the right amount of data, you’re addressing it comprehensively.
Then, lastly, when you really start trying to get very high accuracy and very high sensitivity in leak detection, the amount of data, the accuracy of the instruments, and the timestamping of all that data become very critical. That’s not a lot to say in a statement like that, but there’s a lot to unpacking it fully from a system standpoint.
I hope you enjoyed this week’s episode of the Pipeliners Podcast. It’s an honor and a privilege to actually be the guest, if you want to look at it that way, and have the opportunity to talk to you about how much data is enough.
Just a reminder before you go that you should enter to win our customized Pipeliners Podcast YETI tumbler. Simply visit pipelinepodcastnetwork.com/win to enter yourself in the drawing.
If you have ideas, questions, or topics that you’d be interested in, please let us know on the Contact Us page at pipelinepodcastnetwork.com or reach out to me on LinkedIn. My profile is Russel W. Treat.
Then lastly, I’d just like to mention that we have had several listeners ask the question, “How do you get into the pipeline business?”
I’d like to get on the show as a guest a recruiter or somebody internal to an HR department that’s involved in recruiting for pipelines to come on the show and help answer that question. If you’d be interested in doing that, just go to our Contact page and drop me a note.
Thanks again for listening, and I’ll talk to you next week.
[music]
Transcription by CastingWords