What question did this study set out to answer?

The study aims to develop a software architecture that integrates constantly changing sensor data for machine learning tasks.

March 25, 2026 · Open Access

An Event-Streaming Architecture for Machine Learning in Dynamic Sensor Landscapes

Key Points

The study aims to develop a software architecture that integrates constantly changing sensor data for machine learning tasks.
The architecture uses Apache Kafka to manage event streaming.
Apache Flink handles real-time data processing.
Trino is used to query large datasets efficiently.
Different machine learning techniques are compared on artificially generated data.
New sensor data is integrated dynamically into machine learning workflows.
Consolidating the data highlights potential improvements in machine learning accuracy.
The advantages of the architecture are shown on a regression task.
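
To make the idea of dynamically integrating new sensors more concrete, the following is a minimal plain-Python sketch of the consolidation step only. It is not the study's implementation and does not use Kafka, Flink, or Trino. The event keys `ts`, `sensor`, and `value`, the function name `consolidate`, and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict


def consolidate(events):
    """Pivot long-format sensor events into wide rows keyed by timestamp.

    Each event is a dict with the hypothetical keys 'ts', 'sensor', 'value'.
    A sensor that first appears later simply becomes a new column; rows
    without a reading for that sensor get None in that column.
    """
    rows = defaultdict(dict)  # ts -> {sensor name: value}
    columns = []              # sensor names in order of first appearance
    for event in events:
        sensor = event["sensor"]
        if sensor not in columns:
            columns.append(sensor)  # the feature set grows here
        rows[event["ts"]][sensor] = event["value"]
    table = [
        {"ts": ts, **{c: rows[ts].get(c) for c in columns}}
        for ts in sorted(rows)
    ]
    return columns, table


# Illustrative events: a second sensor joins at ts=2.
events = [
    {"ts": 1, "sensor": "temperature", "value": 21.5},
    {"ts": 2, "sensor": "temperature", "value": 21.7},
    {"ts": 2, "sensor": "vibration", "value": 0.03},
]
columns, table = consolidate(events)
```

In this sketch the row for `ts=1` keeps a `None` entry for `vibration`, so a downstream model has to decide how to treat missing features, for example by imputing them or by training only on rows where the feature exists.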

Abstract

Since the advent of Industry 4.0 and the resulting increased digitalisation of manufacturing, an ever-increasing amount of data has become available. Machine learning experts aim to exploit this data for different regression and classification tasks. One major challenge arises when newly gathered sensor data becomes available, be it over time or via the addition of completely new sensors and hence features. This paper provides a software architecture utilising Apache Kafka, Flink, and Trino to show how a dynamically growing set of samples and features can be automatically integrated and consolidated as a basis for future machine learning tasks. The approach is validated with an artificial data generation setup that compares different machine learning techniques on a regression task and highlights potential improvements.
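
As a rough illustration of validating on generated data, the sketch below creates artificial samples and compares two simple regression techniques on a held-out split. It is an assumption-laden toy, not the study's setup: the data-generating function, the noise level, the train/test split, and the two models (a mean predictor and one-feature ordinary least squares) are all chosen here for illustration.

```python
import random


def generate(n, seed=0):
    """Generate artificial samples y = 2x + 1 + noise (hypothetical setup)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on the given samples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs, ys = generate(200)
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

mse_mean = mse(fit_mean(train_x, train_y), test_x, test_y)
mse_linear = mse(fit_linear(train_x, train_y), test_x, test_y)
```

Comparing `mse_mean` with `mse_linear` on the held-out samples gives a simple measure of how much a technique gains from the available feature. The same comparison could be repeated whenever the consolidated feature set grows.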
