Big Measurement Data: Efficient and Effective Processing Methods and Tools for Knowledge Extraction from High Reporting Rate Datasets
The Data Deluge, in which data is generated faster than it can be efficiently managed, analyzed and used to make informed decisions, is a common mantra of the ICT world. From an engineering and science perspective, the term Big Analog Data was coined by National Instruments to characterize high-sample-rate, digitized measurements from sensors, which can eventually produce high-fidelity and (almost) infinitely complex digital twin representations of the physical world.
In practice, distributed measurement systems generate large quantities of online and streaming data that need to be processed in real time for decision support and/or control purposes. In many situations, the resulting data cannot be used directly by intelligent algorithms, and suitable data preprocessing pipelines need to be defined and implemented. Furthermore, for many tasks the heterogeneous reporting rates, embedded measurement models and spatial scales at which the measurements are collected need to be aligned in a robust manner. In particular, for (Industrial) Internet of Things systems, the dynamic trade-off between high spatial and temporal resolution and the data quality achievable with large numbers of low-cost distributed sensors has to be accounted for, in conjunction with the (metrological) requirements of the application. These inherent compromises can be mitigated, albeit to a limited extent, through advanced data processing methods that lead to an improved reconstruction of the original signal.
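As an illustration of the alignment step, the following minimal sketch brings two streams with different reporting rates onto a common time grid by averaging samples into fixed-width bins. It is only one simple option among many and not necessarily the method presented in the talk; the function names (`resample_mean`, `align`), the stream names and the synthetic values are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def resample_mean(samples, period_s):
    """Average (timestamp_s, value) samples into bins of period_s seconds.

    Returns a dict mapping each bin start time to the mean of its samples.
    """
    bins = defaultdict(list)
    for t, v in samples:
        bins[int(t // period_s) * period_s].append(v)
    return {start: mean(vals) for start, vals in sorted(bins.items())}


def align(streams, period_s):
    """Resample every stream and keep only bins present in all of them."""
    resampled = {name: resample_mean(s, period_s) for name, s in streams.items()}
    common = set.intersection(*(set(r) for r in resampled.values()))
    return [(t, {name: r[t] for name, r in resampled.items()})
            for t in sorted(common)]


# Hypothetical data: a 1 Hz voltage sensor and a meter reporting every 5 s.
voltage = [(float(t), 230.0 + (t % 5)) for t in range(20)]
power = [(float(t), 1000.0 + t) for t in range(0, 20, 5)]

aligned = align({"voltage": voltage, "power": power}, period_s=5)
for row in aligned:
    print(row)
```

Bin averaging acts as a crude low-pass filter on the faster stream; where the metrological requirements are stricter, interpolation or model-based reconstruction may be more appropriate.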
The talk thus focuses on how best to exploit data sources of increasing availability and quality within a rigorous and robust instrumentation and measurement context, while leveraging cross-domain interactions with the computing and control technical communities.
The talk will also introduce well-established programming and scientific computing libraries and frameworks that can be used to extract information and accurately characterize the underlying dynamic processes, with replicable and computationally efficient results. Relevant case studies will focus on smart meter data in real-world scenarios, where effective labeling and classification of microscale features can lead to improved energy management and to an environmentally friendly and resilient electrical grid of the future. We focus on methods and techniques that first detect such features and label them as anomalies in a data processing and learning pipeline. Subsequently, the labeled datasets are used in a forecasting framework that serves as an early-warning system for potential imbalances in the local energy network. One key novelty is the combination of features extracted using time series data mining methods, such as the matrix profile, with state-of-the-art machine learning algorithms, including deep learning, to optimize classification metrics in real time across various model and algorithm structures and hyperparameter settings.
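To make the matrix profile idea concrete, the sketch below computes a naive matrix profile: for every window of length m, the z-normalized Euclidean distance to its nearest non-overlapping neighbor. The window with the largest value (the discord) is a candidate anomaly. This is a brute-force illustration under stated assumptions, not the optimized algorithms used in practice and not necessarily the pipeline used in the case studies; the synthetic load curve, the injected disturbance, the exclusion zone of m // 2 and the handling of constant windows are all choices made for this example.

```python
import math


def znorm(window):
    """Z-normalize a window; constant windows map to all zeros (a simplification)."""
    m = len(window)
    mu = sum(window) / m
    sd = math.sqrt(sum((x - mu) ** 2 for x in window) / m)
    if sd == 0:
        return [0.0] * m
    return [(x - mu) / sd for x in window]


def matrix_profile(series, m):
    """Naive matrix profile with O(n^2 * m) cost.

    Entry i is the distance from window i to its nearest neighbor,
    ignoring windows whose start lies within m // 2 of i (trivial matches).
    """
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    excl = m // 2
    profile = []
    for i in range(n):
        best = math.inf
        for j in range(n):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(windows[i], windows[j])))
            best = min(best, d)
        profile.append(best)
    return profile


# Hypothetical smart meter signal: a clean daily cycle (24 samples per period)
# with a short disturbance injected at samples 120..125.
series = [math.sin(2 * math.pi * t / 24) for t in range(240)]
for t in range(120, 126):
    series[t] += 3.0

m = 24
mp = matrix_profile(series, m)
discord = max(range(len(mp)), key=mp.__getitem__)
print("most anomalous window starts at sample", discord)
```

In a pipeline of the kind described above, windows with high profile values could be labeled as anomalies and the profile itself passed on as a feature to a downstream classifier.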