Using machine learning techniques to improve the accuracy of flow meter predictions and enhance production optimization

April 28, 2022
For flow measurement, there are typically three key areas of interest to end users: data analytics, condition-based monitoring and predictive analytics.

Digitalization strategies are now commonplace throughout the process manufacturing and engineering sectors. A major driver for this has been the fact that end users now have a wealth of diagnostic data available to them from digital transmitters associated with a wide variety of devices installed throughout facilities. The data can be accessed in real time, for example through Open Platform Communications servers, or stored in a database for future analysis. The vast amounts of data now being collected require intelligent software to deliver analytics solutions and allow end users to make better use of the data they own.

The technical detail behind such strategies will vary between applications and organizations due to priorities and business needs, as well as differing interpretations of the word “digital.” However, for metrology purposes and in particular flow measurement, there are typically three key areas of interest to end users:

  1. Data analytics
  2. Condition-based monitoring (CBM)
  3. Predictive analytics

In data analytics, both historical and real-time data can be analyzed to uncover complex patterns and trends in primary and secondary flow measurement instrumentation and relate them back to physical processes and events. Through data-driven modeling, a facility’s data can be used to replace inefficient “time-based” calibration and maintenance schedules with CBM systems. These can remotely determine facility process conditions, as well as detect fraudulent activity in custody transfer scenarios and meter calibration drift, without manual intervention, which is costly and time-consuming for operators. In addition, specialized flow visualization devices, such as X-ray tomography-based systems, can output data to which models can be applied, giving end users detailed insights into the flow conditions within their multi-sensor systems.

Predictive analytics embraces and expands upon the machine learning algorithms developed for CBM. When fully realized and developed through the use of high-resolution data sets, it allows end users to forecast meter calibration requirements, erosion and corrosion impact on flow meter functionality, and even flow pattern development.

However, any flow application wishing to embrace such concepts will be different with respect to physical layout, sensor availability and data resolution. This means that to develop CBM and predictive models that are useful and reflective of reality, it is vital that experienced flow measurement engineers consult on model development and commissioning. In doing so, one effectively programs into the model the collective experience of a facility’s operators and its individual components.

Data acquisition in multi-sensor systems has become a significant resource across research and industrial areas where non-invasive, non-destructive examination is required. An automated pipeline instrumented with several sensors can output measurements simultaneously for industrial applications. These sensors record key process factors as both structured and semi-structured time series data. Data-driven models fed with these complex temporal data can then unlock different levels of information and aid production optimization.

In addition, a rich record of faulty events allows the development of a reliable predictive model that can detect deviations from normal conditions and, where possible, identify their root cause. However, there are practical limitations to generating real-world faulty events in flow measurement laboratories. Training data-driven models with insufficient data leads to poor predictive performance, known as the data sparsity problem.

To overcome these limitations, synthetic data are widely employed to represent events that have not been observed in reality. Synthetic data preserve the inter-relationships and statistical properties of real data, providing sufficient training data for data-driven algorithms.
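
The article does not specify the generation technique used. As one common approach, the following is a minimal sketch that fits a multivariate Gaussian to a handful of real faulty records and samples new ones, preserving the means and inter-feature covariance of the originals (all names and data below are illustrative):

```python
import numpy as np

# A minimal sketch of statistics-preserving synthetic data generation.
# Assumption: a small set of real faulty records is available as a
# (samples x features) array; we fit a multivariate Gaussian to it and
# sample new records that keep the means and inter-feature covariance.

rng = np.random.default_rng(seed=42)

def generate_synthetic(real_faulty: np.ndarray, n_samples: int) -> np.ndarray:
    """Sample synthetic records from a Gaussian fitted to real faulty data."""
    mean = real_faulty.mean(axis=0)
    cov = np.cov(real_faulty, rowvar=False)      # inter-feature covariance
    return rng.multivariate_normal(mean, cov, size=n_samples)

# Example: 20 real faulty records with 5 sensor features -> 2,000 synthetic ones
real_faulty = rng.normal(size=(20, 5))           # placeholder for logged data
synthetic_faulty = generate_synthetic(real_faulty, n_samples=2000)
```

A Gaussian reproduces only first- and second-order statistics; where the inter-relationships are nonlinear, more expressive generators (for example, copulas or generative adversarial networks) may be needed.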

Although many machine learning approaches have been used in multi-sensor systems, this field still faces many technical challenges. Some challenges originate from measurement operations in continuous and uncontrolled environments, while some are unique to different sensors. Challenges faced in data-driven modeling of flow measurement multi-sensor systems include:

  • Data quality — Poor-quality data, appearing as missing values, highly correlated time series, duplicate records, numerous non-informative parameters and outliers, can cause significant challenges in the deployment of machine learning models in big data applications. Statistical methods, including imputation, outlier detection, dimensionality reduction, signal processing and data transformations, enhance data quality (see the cleaning sketch after this list).
  • Lack of standardized benchmarks for model evaluation — Standardized benchmarks are crucial when building a foundation for unsupervised/supervised data-driven diagnostics, yet they are often unavailable.
  • Feature extraction — Multi-sensor systems regularly generate a large amount of heterogeneous data in structured or unstructured formats. These data can contain complex inter-sensor relationships, time-dependent patterns and/or spatial correlations. Given the complexities in such multivariate data structures, it is hard to distinguish deviations from these relationships. Different conditions may have similar characteristics, making it challenging to build unique connections between features and conditions.
  • Computational costs — Training complex artificial intelligence/deep-learning models for multi-dimensional data is resource-intensive and requires scaled-up server configurations. This problem escalates in online learning systems, where models are updated continuously as more data streams become available.
  • Infrastructure — The data-driven approach from raw input data to predictive outputs requires a robust hardware infrastructure throughout the entire system’s pipeline to perform high-speed processing of large volumes of information. This is because intensive convolution operations and fully connected layers require efficient memory communication between graphics processing units (GPUs) and central processing units (CPUs). In addition, optimization strategies should be in place to distribute the workload of a program among different hardware resources.
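
As an illustration of the data quality steps named above, here is a brief pandas sketch covering duplicate removal, imputation, outlier masking and the dropping of highly correlated series (the data, thresholds and column names are placeholders, not values from the study):

```python
import numpy as np
import pandas as pd

# Placeholder sensor time series: rows = timestamps, columns = sensors.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 6)),
                  columns=[f"sensor_{i}" for i in range(6)])
df.iloc[5, 2] = np.nan                       # simulate a missing value

df = df.drop_duplicates()                    # remove duplicate records
df = df.interpolate(limit_direction="both")  # impute missing values

# Outlier masking: drop readings more than 3 standard deviations from the mean
z = (df - df.mean()) / df.std()
df = df.mask(z.abs() > 3).interpolate(limit_direction="both")

# Drop one of each pair of highly correlated (non-informative) time series
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.95).any()])
```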

Consequently, building such systems requires diverse skill sets, domain-specific knowledge and powerful processing hardware to produce reliable analytics. Multivariate time series models not only need to learn the temporal dependency within each variable but must also encode the inter-correlations among pairs of time series.

TÜV SÜD National Engineering Laboratory has therefore undertaken research to develop application- and device-specific data-driven models that can provide accurate predictions of the state of a given system. This research addressed two significant problems in real-time monitoring for the process industry: multivariate time-series feature encoding and data sparsity.

To illustrate these issues, two case studies are discussed. The first details multiphase flow regime prediction using probabilistic modeling of X-ray tomography data. The second focuses on the development of a data-driven CBM system for a Coriolis-based metering system designed to detect undesirable operating conditions using real and synthetic data.

Case study 1: Flow regime prediction

In this study, TÜV SÜD National Engineering Laboratory extensively studied the use of 3D convolutional neural networks (Conv3D) along with multi-scale signature matrices, in which convolutions infer both temporal and spatial characteristics from the data. The learned features were then used to classify fluid flow regimes.

The main objectives of this case study were to:

  • Demonstrate the potential of Conv3D against current state-of-the-art approaches.
  • Highlight the importance of signature matrices in multivariate time series prediction.
  • Compare the impact of feature encoding by temporal guidance, spatio-temporal guidance or no guidance.

In this experiment, TÜV SÜD National Engineering Laboratory examined four configurations in terms of temporal and spatial guidance:

  • No guidance: TÜV SÜD National Engineering Laboratory used a random forest (RF) algorithm that ignored the temporal and spatial structures of the multivariate time series, i.e., the resulting model remained the same regardless of the order of features in the temporal and spatial domains. The baseline RF model used four statistical properties (mean, standard deviation, skewness and kurtosis) computed over the time domain for each time series.
  • Only temporal guidance: Conv1D and temporal convolutional network (TCN) models were used to apply convolution only in the temporal domain. These models require 3D input of size M × S × N, where M is the number of samples in each batch, S is the length of the sequences in each sample and N is the number of features.
  • Spatio-temporal guidance using input signals: The models had to provide guidance in both the temporal and spatial dimensions. TÜV SÜD National Engineering Laboratory used ConvLSTM and Conv3D models fed with 5D input of size M × R × L × N × C, where R is the number of subsequences, L is the length of each subsequence and C is the number of channels (C = 1 in this case).
  • Spatio-temporal guidance using signature matrices: The models used signature matrices to encode spatio-temporal correlation. TÜV SÜD National Engineering Laboratory named these models TCN_sign, ConvLSTM_sign and Conv3D_sign, as they use TCN, ConvLSTM and Conv3D encoders, respectively. The models were fed with 5D input of size M × F × N × N × C, where F is the number of frames and C is the number of channels (C = 3 in this case, as signature matrices are calculated at three different scales). A sketch of the signature-matrix computation follows this list.
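
The exact formulation of the signature matrices is not given in this article. A common construction in the multivariate time-series literature (e.g., MSCRED) takes pairwise inner products of the series over trailing windows at several scales, with each scale forming one channel. Below is a sketch under that assumption, with illustrative window lengths and stride:

```python
import numpy as np

# Sketch of multi-scale signature matrices for one multivariate sample.
# Assumption: entry (i, j) at frame t is the inner product of series i and j
# over a trailing window, averaged by the window length; each window length
# (scale) becomes one channel. Scales and stride here are illustrative.

def signature_matrices(x: np.ndarray, scales=(10, 30, 60), stride=10) -> np.ndarray:
    """x: (S, N) multivariate series -> (F, N, N, C) signature-matrix frames."""
    S, N = x.shape
    w_max = max(scales)
    frames = []
    for t in range(w_max, S + 1, stride):              # F frame positions
        channels = []
        for w in scales:                               # C = len(scales) scales
            window = x[t - w:t]                        # (w, N) trailing window
            channels.append(window.T @ window / w)     # (N, N) pairwise products
        frames.append(np.stack(channels, axis=-1))     # (N, N, C)
    return np.stack(frames)                            # (F, N, N, C)

# Example: S=600 time steps from N=8 sensors -> 55 frames of shape 8 x 8 x 3
x = np.random.default_rng(1).normal(size=(600, 8))
sig = signature_matrices(x)
print(sig.shape)                                       # (55, 8, 8, 3)
```

Stacking this output across M samples yields the M × F × N × N × C input described in the list above.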

In summary, TÜV SÜD National Engineering Laboratory’s findings in this case study were that, in multivariate time-series modeling, multi-scale signature matrices provided a substantial amount of information for both temporal and spatial feature learning. 3D convolutional neural networks outperformed current state-of-the-art approaches in multivariate time series classification. By comparing the impact of feature learning with temporal guidance, spatio-temporal guidance or no guidance, TÜV SÜD National Engineering Laboratory demonstrated the contribution of spatio-temporal characteristics in multi-sensor systems.
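
The published architecture details are not reproduced here. As an illustration only, the following is a minimal Keras sketch of a Conv3D classifier consuming the (F, N, N, C) signature-matrix frames described above; the layer sizes and the number of flow regimes are assumptions, not the architecture used in the study:

```python
import tensorflow as tf

# Minimal Conv3D flow-regime classifier over signature-matrix input of shape
# (F, N, N, C). All sizes below are illustrative placeholders.
F, N, C, NUM_REGIMES = 55, 8, 3, 4

model = tf.keras.Sequential([
    tf.keras.Input(shape=(F, N, N, C)),
    tf.keras.layers.Conv3D(16, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling3D(pool_size=(2, 1, 1)),   # pool along frames only
    tf.keras.layers.Conv3D(32, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(NUM_REGIMES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```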

Case study 2: Data-driven modeling to detect an error in a Coriolis flow meter

This case study aimed to demonstrate the necessity of synthetic data generation when faulty occurrences are expensive or impossible to obtain in real life. TÜV SÜD National Engineering Laboratory attempted to anticipate real-world failures using synthetic faulty data to help data-driven models learn the system’s normal behavior and distinguish anomalous events.

In order to find the optimal model for detecting a known error in a Coriolis flow meter, six classification models were trained and built using real and synthetically generated faulty data: random forest (RF), stochastic gradient boosting (GBM), decision tree (Tree), naïve Bayes classifier (NB), neural network (NN) and k-nearest neighbors (KNN).

The data was split into 70% training data, through which the models learned the patterns, trends and correlations within variables associated with “normal” and “faulty” conditions, with the remainder held out for testing. Overall, the best-performing model on this test data was RF, with high scores in all metrics, followed closely by GBM, while the worst was NB.
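
The article does not detail the implementations. The following scikit-learn sketch shows how such a six-model comparison might be set up, with placeholder data standing in for the real and synthetic records (model settings are illustrative, not those used in the study):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Placeholder feature table X and labels y (0 = "normal", 1 = "faulty").
rng = np.random.default_rng(7)
X = rng.normal(size=(3000, 10))
y = (X[:, 0] + 0.5 * X[:, 3] > 1.0).astype(int)       # synthetic fault label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7)  # 70/30 split as above

models = {
    "RF":   RandomForestClassifier(random_state=7),
    "GBM":  GradientBoostingClassifier(random_state=7),
    "Tree": DecisionTreeClassifier(random_state=7),
    "NB":   GaussianNB(),
    "NN":   make_pipeline(StandardScaler(),
                          MLPClassifier(max_iter=500, random_state=7)),
    "KNN":  make_pipeline(StandardScaler(), KNeighborsClassifier()),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: F1 = {f1_score(y_test, model.predict(X_test)):.3f}")
```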

The trained classification models were based on synthetically generated faulty conditions. To ensure that the models could be extended and applied in real-life situations to detect real faulty conditions, they were further tested on a set of data logged from a Coriolis meter known to have been exposed to the faulty condition for a period of time. Each of the models was then used to detect the error.

GBM performed best in detecting the error within the Coriolis meter and was therefore chosen as the optimal model, performing well across all metrics on both synthetically generated faulty data and real faulty data. The results are promising: although the model was built using synthetically generated faulty data, it demonstrated the capability to extend its predictive power to real faulty data output from a Coriolis flow meter during operation.

Conclusion

In summary, TÜV SÜD National Engineering Laboratory demonstrated the importance of spatio-temporal feature encoding to the predictive accuracy of multi-sensor systems. The encoded information was then related to historical system conditions to predict the system's future state. The learned features could better represent complicated nonlinearities, inter-sensor relationships and system dynamics. Such prediction outcomes can help domain experts in different sectors ensure optimal equipment uptime. This includes real-time flow measurement verification, where CBM software processes data streams in real time, detects likely deviations from normal patterns and alerts operators to the probable cause of a failure.

Behzad Nobakht is a data scientist at TÜV SÜD National Engineering Laboratory. TÜV SÜD National Engineering Laboratory is a world-class provider of technical consultancy, research, testing and program management services. Part of the TÜV SÜD Group, the organization is also a global center of excellence for flow measurement and fluid flow systems and is the UK’s Designated Institute for Flow Measurement.
