This week’s Pipeliners Podcast episode features Jackie Smith of ROSEN discussing how to use GIS to integrate with other available data to support pipeline integrity as operators strive to meet regulatory requirements and safety objectives.
In this episode, you will learn how to support an Integrity Management System, how to create and analyze key data, the differences between structured data and unstructured data, the opportunity to use data more effectively to reduce risk and satisfy PHMSA regulations, and more important topics around pipeline data science.
GIS and Pipeline Data Science: Show Notes, Links, and Insider Terms
- Jackie Smith is the principal of Integrity Management Systems for ROSEN. Connect with Jackie on LinkedIn.
- ROSEN is the current episode sponsor of the Pipeliners Podcast. Learn more about ROSEN — the global leader in cutting-edge solutions across all areas of the integrity process chain.
- Integrity Management (Pipeline Integrity Management) is a systematic approach to operate and manage pipelines in a safe manner that complies with PHMSA regulations.
- Integrity Management Systems bring together a management system (MS) and an Integrity Management Program (IMP) to create a Pipeline Integrity Management System (PIMS) that strives for continuous improvement of pipeline integrity while reducing risk. PIMS utilizes the Plan-Do-Check-Act process to better manage and improve Integrity Management.
- ILI (Inline Inspection) is a method to assess the integrity and condition of a pipe by determining the existence of cracks, deformities, or other structural issues that could cause a leak.
- Magnetic flux leakage (MFL) is a magnetic method of nondestructive testing that is used to detect corrosion and pitting in pipelines.
- GIS (Geographic Information System) is a system designed to capture, store, manipulate, analyze, manage, and present spatial or geographic data.
- PODS (Pipeline Open Data Standard) supports the growing and changing needs of the pipeline industry through ongoing development, maintenance, and advancement of the Data Model and Standards. PODS also serves as a member association to maintain the PODS Data Model.
- Esri is an international supplier of geographic information system software, web GIS, and geodatabase management applications.
- UPDM (Utility and Pipeline Data Model) is a template data model that helps pipeline operators manage natural gas and hazardous liquid pipe system data within an Esri geodatabase. In addition to the data model representing a best practice on how to leverage the geodatabase, the data model also represents a repository of industry knowledge.
- SCADA (Supervisory Control and Data Acquisition) is a system of software and technology that allows pipeliners to control processes locally or at remote locations. SCADA breaks down into two key functions: supervisory control and data acquisition. This includes managing the field, communication, and control room technology components that send and receive valuable data, allowing users to respond to the data.
- Structured data is highly-specific data that is stored in a predefined format. This type of data is typically stored in data warehouses.
- Unstructured data is a collection of various data types that are stored in their original format. This type of data is typically stored in data lakes.
- ROSEN offers an integrity data warehouse service that enables operators to use their data more effectively to address challenges such as corrosion, cracking, and other pipeline threats.
- Corrosion in pipeline inspection refers to a type of metal loss anomaly that could indicate the deterioration of a pipe. Inline inspection techniques are used to evaluate the severity of corrosion.
- Material Property Verification (MPV) is the process of identifying the fundamental make-up — or DNA — of a piece of pipe based on existing information such as as-built drawings, pipe books, mill certificates, hydro test pressure records, etc.
- ROSEN offers a Material Property Verification solution that closes the gap on unknown material information and provides a detailed picture of the pipeline’s DNA.
- TVC (Traceable, Verifiable, and Complete) records support pipeline integrity management by verifying the condition of pipelines.
- HCA (High-Consequence Areas) are defined by PHMSA as a potential impact zone that contains 20 or more structures intended for human occupancy or an identified site. PHMSA identifies how pipeline operators must identify, prioritize, assess, evaluate, repair, and validate the integrity of gas transmission pipelines that could, in the event of a leak or failure, affect HCAs.
GIS and Pipeline Data Science: Full Episode Transcript
Russel Treat: Welcome to the Pipeliners Podcast, episode 198, sponsored by ROSEN, the global leader in cutting-edge solutions across all areas of the integrity process chain, providing operators the data they need to make the best Integrity Management decisions. Find out more about ROSEN at ROSEN-Group.com.
Announcer: The Pipeliners Podcast, where professionals, Bubba geeks, and industry insiders share their knowledge and experience about technology, projects, and pipeline operations. Now your host, Russel Treat.
Russel: Thanks for listening to the Pipeliners Podcast. I appreciate you taking the time. To show the appreciation, we give away a customized YETI tumbler to one listener every episode. This week, our winner is Steve Biagiotti with Dynamic Risk. Congratulations, Steve. Your YETI is on its way. To learn how you can win this signature prize, stick around till the end of the episode.
This week, Jackie Smith returns to the podcast in a new role as Principal for Integrity Management Systems at ROSEN. We’re going to visit about GIS and data management and integrity. Jackie, welcome back to the Pipeliners Podcast.
Jackie Smith: Thank you. Thanks. I’m glad to be here.
Russel: It’s been a while since we talked. You have had a career change. Why don’t you tell me and the listeners where you are now and what caused you to make the move?
Jackie: Sure. I can go into that. I came from Williams. I was with Williams for a while. I transitioned to ROSEN right as remote work set in, so right around March 2020. I had been with an operator for over 10 years, with Williams.
An opportunity came up to work with ROSEN Group and to lead a team and a business line area of integrity management systems, actually. I felt like it was a good time to make a move to broaden my career and to get out there and network more and meet more people and travel more and have all these great opportunities. That was March of 2020.
Russel: I’m sure you’ll have that opportunity with ROSEN for sure. You said, Integrity Management Systems. For those of us that are integrity management novices, like myself, what would Integrity Management Systems be?
Jackie: Integrity Management Systems allow you to augment your integrity planning, in short. I’m going to use GIS. I’m going to use Geographical Information Systems as well. I’m going to use solutions that maybe my vendor provides, maybe ROSEN would provide, to allow you to take your system of record, your business data, and integrate it with the ILI data, the anomalies, the inspection data.
Russel: It’s really the software data pulling it all together.
Jackie: Pull it all together, yeah, technically. Technically allows the engineer to make more decisions on the information in front of them, in an application. That’s really the short of it. Those engineering assessments that need to happen post-ILI, the software has a robust capability to do that and to be configured for pretty much any problem an operator may have or any solution they need to meet regulations.
Russel: Interesting. Wow, that’s pretty cool. That’s perfect because I asked you to come on and talk about data management and GIS and Integrity Management. I’m going to ask the question that I know you love to get asked, which is “Why can’t I just use Google Earth?”
Jackie: Ugh. Then I just have to regain my breath.
Russel: You have to take a deep breath, smile a little bit, settle your spirit, and then answer the question. [laughs]
Jackie: Google Earth is a great commercial product. It’s great for a quick view of a location. I can pull in maybe my .kmz file, and see really quickly with this great satellite imagery they may have or some base imagery they may have gotten from another source. I can see things, visually, on the surface, but that’s all I can do.
If I want to do more, I need to have a GIS system. There are many on the market. Since we mentioned Google, I can mention Esri GIS. There’s a vendor called Esri. There are a lot of other open-source and other vendors out there that augment that with the GIS type of data. Even databases will do GIS analysis.
What that GIS analysis gives you is just a full range of decision-making capabilities, analytics you can do. You integrate with your system of record. If I have my PODS data…When I was on last time, I talked about PODS, Pipeline Open Data Standards. I can use my PODS data and integrate that with my inspection data. I can see areas of interest such as populations. I can do buffers. I can do hotspot analysis…
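The buffer analysis Jackie mentions can be sketched in plain Python. Everything here is illustrative: the coordinates, the 100-meter buffer width, and the helper names are invented for the example, not taken from any GIS product, which would do this far more capably on real geometries.

```python
from math import hypot

# Pipeline centerline as (x, y) points in meters (illustrative values).
centerline = [(0, 0), (100, 0), (200, 50), (300, 50)]

# Nearby structures, e.g. digitized from imagery (illustrative values).
structures = [(150, 30), (150, 300), (290, 60), (10, -200)]

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment, clamping the parameter to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(point, line, distance):
    """True if the point falls inside a buffer of the given width around the line."""
    return any(
        point_segment_distance(point, a, b) <= distance
        for a, b in zip(line, line[1:])
    )

# A 100 m buffer pass, loosely analogous to screening structures near a line.
inside = [s for s in structures if within_buffer(s, centerline, 100)]
```

A real GIS layers population, class location, and HCA data over exactly this kind of proximity computation, just at scale and with true geodetic geometry.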
Russel: Google Earth is a simple way to visualize the pipeline alignment, but it really doesn’t allow you to do anything beyond that. What you’re really talking about in GIS and systems is all the other things that you have to do.
Jackie: To manage your data, to make those decisions about everything that you need to do in your integrity planning. Another way to think of it is Google Earth, I can find the nearest gas station. With GIS, I can find out where to build or develop a gas station in my neighborhood or in my city.
Russel: I want to ask this question. We might get a little geeky here. This starts getting a little bit into my domain and certainly into my interests. I want to talk to you about structured versus unstructured data. First off, what is the difference between structured data and unstructured data?
Jackie: Sure, I can answer that. Unstructured data, let’s start with that first. That’s basically everything that you can think of, like Facebook or social media platforms, videos, video content, which has just grown exponentially over the past few years. Measurement, say from your SCADA system, is unstructured. It can become structured, but it’s unstructured natively, I do believe.
Russel: What would be the definition of unstructured data? For a novice — I’m not an IT guy — what’s the definition of unstructured data?
Jackie: Data that does not have a format for which to be easily consumed. That’s not necessarily always the case. All data can be consumed structured and unstructured. It’s not structured, Russel.
Russel: Maybe a better way to answer that is, what is structured data?
Jackie: Structured data can be put into a template or a format that can more easily be programmed against to read from, say, a database, or to create models against for analytics or machine learning. Structured data can also be used there as well. Most of the data that we’re familiar with in a business professional environment is structured. Our HR systems are in a structured database.
Russel: The easiest thing to think about is an invoice. An invoice, if you look at it, it’s a form. It’s got columns and rows. That, in effect, is a data structure when you see it on a report. It’s always put together the same way.
If you look at a video, a video, in terms of its internals, is not structured, really. There might be metadata about the video that could be structured, but the video itself is not structured.
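Russel’s invoice analogy maps directly onto code: a structured record has named fields a program can rely on, while unstructured content is just text or bytes until something extracts meaning from it. A minimal illustration, with field names and values invented for the example:

```python
# Structured: every invoice carries the same fields, so code queries them directly.
invoices = [
    {"invoice_no": "1001", "vendor": "Acme Pipe Supply", "total": 4200.00},
    {"invoice_no": "1002", "vendor": "Gulf Coatings", "total": 1850.50},
]
grand_total = sum(row["total"] for row in invoices)

# Unstructured: a free-text field note. There is no schema to query;
# pulling the weld ID out requires parsing (here, a crude scan of the words).
field_note = "Dig at MP 42.3 exposed external corrosion near girth weld GW-117."
weld_id = next(
    word.rstrip(".") for word in field_note.split() if word.startswith("GW-")
)
```

The parsing step is the difference: the structured rows needed none, while the note needed custom logic that breaks the moment someone writes the same fact a different way.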
That conversation blows my mind up. When you look at what people like Google are doing with their search engines and the ability to very, very quickly find things…The way that we have managed data for the last 30 years with indexes and all that kind of stuff, you just don’t see that as often in Internet-style data. Have I got that right?
Jackie: Yes. A lot of the oil and gas pipeliners, we’ve been used to structured data for most of our career. We know what that is. Dealing with this unstructured data, or big data, some like to call it, we’re not sure, really, what to do with that.
There are tools and vendors out there that will help you with that. It opens up a new area for us to start really diving into. At ROSEN, we have dived into data science that can consume both structured and unstructured data, primarily structured, more recently.
Russel: This is a really interesting conversation. I’m trying to understand what unstructured data is. I will tell you that I think I conceptually get it, but I don’t think I have any construct in my mind about how to deal with it. It’s too abstract. I can’t get my brain around it.
Jackie: It’s out there. There’s different databases out there that will store this unstructured data. You’re right. It’s hard to grasp. [laughs]
Russel: It is very hard to grasp. It’s the difference between a whole bunch of pieces in a jigsaw puzzle in a box versus a whole bunch of pieces of a jigsaw puzzle all put together, that type of thing. Anyways, I could go on and talk about this for the entire time we have together, but we probably ought to move on to a new subject.
You mentioned all these different kinds of data. I’ve got map data. I’ve got population data. I’ve got alignment data. I’ve got tool run data and on and on and on. I’ve got all these different kinds of data. How do I go about pulling that all together in order to have a system?
Jackie: I’ll tell you what. What we like to do is just start modeling the data as far as having a home for it, so a storage mechanism in a database. You have to add some structure to it, some common structure. Then you can load it into a central repository or database. Some call that a data warehouse, though it’s not technically a data warehouse. Some call it a data lake.
You can take those data elements and structure them. Then you can start to make relationships and devise analytics and models on that data. For example, you may take ILI data from your Excel files. You need to take that and perform a translation on that, and put that into a structured database format.
The Excel is structured somewhat, but you want to be able to put a thousand different Excel files into one structure. You’ve got to take all those different structures and feed them into this database structure. Then, you can take soil data and put that into the database. Usually, it is provided in some kind of tabular format from the public sources. You can take your pipeline construction and design data, if you have it.
Most operators have it in some type of structure, whether it’s a database, or it’s PODS, or it’s UPDM, the Utility and Pipeline Data Model. They can take that and feed that into that database system. This could be called a GIS. [laughs] It could also be called just a data warehouse where I have all my data in a centralized place, or I could have them in separate locations and have routines that integrate them, aggregate them together for a common operating picture.
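The translation step Jackie describes — a thousand differently shaped ILI spreadsheets fed into one database structure — usually comes down to a column-mapping layer. A toy sketch, where the vendor column names and the target schema are invented for illustration:

```python
# Each ILI vendor names its columns differently; map each to one target schema.
COLUMN_MAPS = {
    "vendor_a": {
        "Log Dist (ft)": "odometer_ft",
        "Depth %": "depth_pct",
        "Feature": "anomaly_type",
    },
    "vendor_b": {
        "ODOMETER": "odometer_ft",
        "PEAK_DEPTH_PCT": "depth_pct",
        "TYPE": "anomaly_type",
    },
}

def normalize(rows, vendor):
    """Translate one vendor's rows into the common schema."""
    mapping = COLUMN_MAPS[vendor]
    return [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in rows
    ]

raw_a = [{"Log Dist (ft)": 1234.5, "Depth %": 38, "Feature": "ext corrosion"}]
raw_b = [{"ODOMETER": 7890.1, "PEAK_DEPTH_PCT": 12, "TYPE": "dent"}]

# Both feeds now share one structure and can land in a single table.
warehouse = normalize(raw_a, "vendor_a") + normalize(raw_b, "vendor_b")
```

The soil data, construction data, and PODS or UPDM exports Jackie lists each get their own mapping into the same central structure; the pattern is identical, only the mappings differ.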
Russel: I’m going to try and play this back a little bit. I may have it completely wrong. What I hear you saying is, I’ve got to have some level of structure for storing all this stuff.
Jackie: In order to do some of the things we’re doing at ROSEN, we’ve had to have a structured database and structured data models.
Russel: PODS would be a vehicle for pulling all this data together.
Jackie: Yes. You could use PODS. PODS has several areas of storage. I call it storage. It’s a data model, but a lot of people don’t know what that means. It’s a template for your data.
Russel: It’s a Dewey Decimal system for storing data, for the old folks.
Jackie: It’s a standard. [laughs]
Russel: [laughs] Young people don’t get this. Until well after I got out of college, if you wanted to do research, you had to go to the card catalog. You’d go in. There’d be all these cards. You could search them by subject matter or by author, primarily. You’d pull this out. There’d be this code. That code would tell you where to go in the library to find that book. A data structure is a similar kind of thing. It’s like you want to find this kind of data, here’s where you go to find that.
Jackie: The structure lets you store it in that way, so then you can retrieve it. You know where to look to find that data. Otherwise, it’s unstructured, and you don’t know where to look.
Russel: Other than PODS, what other standards are out there for structuring integrity data?
Jackie: Structuring integrity data in a database, the one that I know of, in addition to PODS, is Esri’s UPDM, the Utility and Pipeline Data Model. As far as open-source or others, I don’t really know of any others that are “standards,” or not standards, for that matter. A lot of the software providers that provide GIS software and tools for the operators may have their own, as well.
Russel: They’ve built something that they started with, and they’ve continued to add onto it and add onto it. Trying to make any change to that data structure conform to a model is just unmanageable.
Jackie: To have a common model and common standard is really critical and imperative for us, going forward, as an industry so we use the same terms when we talk about things. If I talk about a pipe segment, it means different things to different people, if they’re using a different data model or if they work for an operator versus a service provider.
Just having that common vernacular is also important, as well as when you try to go and share data. If I want to sell my pipeline or if I want to buy this asset, if I know that their data is in this common format, it’s going to be more efficient and cheaper for me to integrate with my other data and my other systems.
Russel: The comments you’re making about the semantics or the vocabulary, if you will, that is so critically important. A big thing that we do when we go off to university is we learn a very broad and deep vocabulary and a way to talk about things that might be hard to otherwise conceptualize.
Then we go off to work. Now, we’ve got to get even more vertical and more specific. A lot of times we don’t have the vocabulary nailed down. I can’t tell you how many times I’ve been in a meeting where everybody’s using the same word, but they’re all meaning something different.
Jackie: That’s a good point. Then the data models and the databases can bring together all this data that we’ve been talking about since you’ve been interviewing ROSEN. You have Material Properties Verification. You have risk. You have these advanced capabilities or inspections, and geohazards.
All this data can be stored in one system instead of maybe separate assessments, PDFs located somewhere in vault share. You put it away and you don’t ever look at it again. To have that data in one system is really important.
Russel: You just drove right by that. We need to stop and talk about that a little bit, because the term I use, and I think you used this term as well, is the system of record. When I want to get the correct, accurate, unmodified version of something, where do I go?
That is a big deal in integrity management. We get these files. We get these data sets. Then analysts look at them, and they manipulate them to look for things, and to do analysis, and do their job, but that evaluated set is something different than the data set I started with. How do you address that?
Jackie: How do you trace that? How do you deal with that? You have to have some kind of governance or rules that you place on your employees, on your workflows, on your software even. Having a system of record that tracks that and has software or application logic that will have an auditability or traceability to data is important, tracking whether somebody made a change.
That governance would just basically be we’ve got this source of record. It’s a database. There’s tons of other things that you can add to that governance. Also, this deals with data literacy. Am I qualified to understand the data that I’m looking at, and am I qualified to change it? Those types of things have to be thought about…
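The traceability Jackie calls for — knowing who changed a record, when, and what it said before — can be modeled as an append-only change log kept alongside the record itself. A minimal sketch; the field names are illustrative, not any particular product's schema:

```python
from datetime import datetime, timezone

def apply_change(record, audit_log, field, new_value, user):
    """Update one field on a record, preserving the prior value in an audit log."""
    audit_log.append({
        "field": field,
        "old": record.get(field),
        "new": new_value,
        "user": user,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value

record = {"segment_id": "SEG-001", "wall_thickness_in": 0.250}
audit_log = []

# An analyst corrects a wall thickness; the original value survives in the log.
apply_change(record, audit_log, "wall_thickness_in", 0.281, "jsmith")
```

This is the distinction Russel raises between the system of record and the evaluated data set: the current value answers today's question, and the log lets an auditor walk back to what the record said before anyone touched it.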
Russel: I’ve never heard that term before, but I’m going to try and remember it, the idea of data literacy. That’s really material in this domain. There are so many different data types. I have to be literate in that particular data type to know what to do with it.
There’s a big difference between looking at an MFL tool run data set versus looking at geotechnical data for soil type versus looking at corrosion data. All of those require the ability to understand and interpret those different data sets.
Jackie: If I just start pulling them together — say I have them all in a database — and I start trying to do analytics on it and I don’t have any context for what I’m doing, I could maybe incorrectly make assumptions on the data. It’s really important to have some kind of background in the business itself as well.
Russel: That’s very important, particularly if you’re going to do anything meaningful with it in terms of value creation. It’s one thing to store it and have it. It’s another thing to use it for a purpose to create value.
Jackie: That’s at the core of what data science is. I don’t know if we’ve talked about that yet. Bringing value from all this data, put in simple terms, you could think of it as data science. There’s a lot more. It’s multi-disciplinary.
You have to add the different skill sets, such as statistics and programming skills and maybe data management skills with that particular business line or domain. For example, whether I’m dealing with health records or pipeline data, you have to have some kind of context for the data. I don’t want to call it a buzzword. Some people say, “Data science,” and they don’t know what it means. They think, “How can data have a science?”
Russel: I’ve done some podcasts on this subject. Being a guy who’s a bit of a math geek anyways, I find this pretty fascinating. If you boil data science down, it’s advanced statistics, except that data science is knowing all the things you need to do to be able to get to a data set where you can do advanced statistics. That is often a much bigger job than doing the advanced statistics.
Jackie: Getting the data, structuring it, understanding the structure — the literacy ties in there, data literacy, understanding it, and doing analysis on it to make sure that you have all the different data sets together in one place.
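Russel's point — that getting to an analyzable data set is usually the bigger job than the statistics — shows up even at toy scale. The raw values below are invented, but the shape of the problem (blanks, stray characters, mixed types) is typical of field data:

```python
from statistics import mean

# Raw depth-of-corrosion readings as they might arrive: mixed types,
# blanks, stray whitespace, and percent signs (illustrative values).
raw = ["38", " 12% ", None, "27", "", "41%"]

def clean(values):
    """The unglamorous majority of the job: coerce raw entries into numbers."""
    out = []
    for value in values:
        if value is None:
            continue
        text = str(value).strip().rstrip("%").strip()
        if text:
            out.append(float(text))
    return out

depths = clean(raw)
average_depth = mean(depths)  # only now does the "advanced statistics" start
```

One line of statistics, a dozen lines of cleanup, and that ratio only gets worse as the data sets Jackie describes grow in variety.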
Russel: I want to ask a question to wrap up this part of our conversation. I need to build a little context for this. If you’re a student of the software industry, what you know is that there’ll be a problem that we’re trying to solve, and then there’ll be an industry grow up around that problem, and then it’ll mature. As that matures, it starts to surface another problem, and then we do the same thing over again.
Those cycles tend to run in 5 to 15-year cycles from the time you find the problem until it’s matured and you’re moving on to the next one. As it relates to getting a handle on integrity data from a total system standpoint, where do you think we are in that maturity as an industry?
Jackie: What is the rating scale? 0 to 15 years. [laughs]
Russel: A baby, an infant, an adolescent. Where are we in the maturity?
Jackie: Oh, gosh. I’m always going to say we’re just beginning because I feel like the sky’s the limit on data and understanding. Awareness of data as an asset is just emerging in oil and gas, at least in midstream and downstream. Other sectors of our economy, such as social media, understand data as an asset and are very much reaping the benefits and profits.
Russel: The marketing guys and the financial guys, they are all over this.
Jackie: For oil and gas, I really think we’re toddlers. [laughs]
Russel: You’re right. The reason I ask that question, the reason it’s a useful one, is that for us as an industry, we’ve spent a lot of time, energy, and effort really getting the ability to understand the metal, the tools, and the data capture, and the better electronics, and all of those kinds of things.
What we really haven’t done a lot of as an industry is look at these processes from a systems standpoint. There’s good, valid reasons. It’s a big, big undertaking, really big undertaking. These are very large data sets, very, very technical. The skill sets to evaluate these data sets exist in only a few people. It’s a big deal.
Jackie: I feel like we’ve gotten a good start with one of the products we have at ROSEN. It’s an integrity data warehouse. We’ve developed and processed about 10,000 pipeline records ranging from construction to attributes about those pipelines, geography, even rainfall, and soil, and those type of things along with inline inspection data.
Gathering those data sets has allowed us to do some really interesting trending. We’re able to do some benchmarking. We’re even able to predict — successfully predict — corrosion on uninspected pipelines as well. I think that that’s a big opportunity for the industry.
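For a rough feel of what predicting corrosion on uninspected lines involves, here is a deliberately toy regression. This is not ROSEN's model — their approach is proprietary and draws on thousands of attributes — just an ordinary least-squares fit of one invented predictor (age) against invented ILI observations, applied to a line with no inspection data:

```python
# Toy illustration only: all numbers invented, one predictor, no validation.
# Fit corrosion depth against pipeline age on "inspected" lines, then apply
# the fit to an "uninspected" line.

inspected = [  # (age_years, max_depth_pct observed by ILI)
    (10, 8.0), (20, 15.0), (30, 24.0), (40, 31.0),
]

# Ordinary least squares for y = a + b*x, computed by hand.
n = len(inspected)
sx = sum(x for x, _ in inspected)
sy = sum(y for _, y in inspected)
sxx = sum(x * x for x, _ in inspected)
sxy = sum(x * y for x, y in inspected)
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n

def predict_depth(age_years):
    """Predicted max corrosion depth (%) for an uninspected pipeline."""
    return a + b * age_years

estimate = predict_depth(25)
```

A production model would fold in the soil, rainfall, coating, and construction attributes Jackie lists, and would need held-out inspected lines to check itself against, but the workflow — learn from inspected assets, extrapolate to uninspected ones — is the same.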
Russel: Wow, that is fascinating. I’m trying to think about “what’s the process?” I would think that most pipeline operators have something in place to capture their data and bring it together. I would also think that for a lot of those operators it’s incomplete or very specific to their needs.
A lot of internally developed software tends to be hard to change, modify, and improve. You can make small changes, but big changes get tough. If you were going to be advising the operators, what would you say, here’s the first two or three things you got to get really clear on?
Jackie: With the goal being systems, like common system?
Russel: Yeah. I’d say the goal being a comprehensive system for integrity management.
Jackie: It’s the understanding of the culture as well. Having the understanding that this data is important, what data is important, what problem am I really trying to solve, and then going backwards from there. If your goals are to adhere to regulation or to provide safety, and, of course, those are most operators’ goals, then you do need to have that system of record or that repository.
Many managers may hear GIS, not know what it is, and think that we can drop that from the budget or company. That’s not important. But, usually, that’s where the data is stored and tracked. The survey data — the alignment sheets — can be linked to those records, all the characteristics, the class location, and HCAs…
Russel: You just said a mouthful because from a regulatory perspective oftentimes it is I just need my records in a place I can find them when I get audited, right?
Jackie: Right. Like TVC.
Russel: From a safety standpoint, I need a way to evaluate those records, and perform analysis, and capture that analysis. From a performance standpoint, I’ve got to have workflow and the ability to review and improve my workflow. Each of those adds a whole nother order of magnitude of complexity into building a tool set like this.
Look, Jackie, it’s great to have you back. It always makes my head hurt a little bit to talk to you, but that’s cool. I like that.
Jackie: That is a compliment maybe, maybe not. I’ll have to look into that. [laughs]
Russel: No, it’s in a good way. I would really get a kick out of having the opportunity to take a deep dive into some of the stuff you’re doing because it’s — while I’m not an integrity guy, I am a software guy, and I am a data guy — I haven’t really got my teeth into something like that in a while. It’s like I feel a little rusty. I feel like I need to go have a look.
Jackie: Right. Any time, my calendar’s open.
Russel: All right. I’ll look forward. I’ll take you up on it. Thanks for coming back, and we look forward to having you again.
Jackie: Thank you.
Russel: I hope you enjoyed this week’s episode of the Pipeliners Podcast and our conversation with Jackie. Just a reminder before you go, you should register to win our customized Pipeliners Podcast YETI tumbler. Simply visit pipelinepodcastnetwork.com/win to enter yourself in the drawing.
Russel: If you have ideas, questions, or topics you’d be interested in, please let me know on the Contact Us page at pipelinepodcastnetwork.com or reach out to me on LinkedIn. Thanks for listening. I’ll talk to you next week.
Transcription by CastingWords