FIWARE R&D Projects: CyclOps

Automated end-to-end data life cycle management for FAIR data integration, processing and re-use.

Description

The ability to integrate data from multiple sources is now a major competitive advantage for organizations. Data-driven applications using AI techniques are reshaping industries such as manufacturing, tourism, and mobility. The European Strategy for Data aims to create a single market for data while ensuring Europe's global competitiveness and data sovereignty. This has led to the development of Common European Data Spaces. Yet the governance of the data life cycle in organizations has not kept pace with this rapid technological evolution and remains largely manual. This is especially evident where tens or hundreds of continuously evolving sources produce semi-structured data: governing them manually creates significant challenges and leaves organizations with data silos. A systematic, standardized mechanism is needed to ingest, integrate, and process data, thereby boosting organizations' ability to develop new data-centric business models. However, current research and development efforts typically target a single aspect of the end-to-end data life cycle, such as scalable data management, ML performance, AI explainability, or data sharing, while neglecting its governance.

Objective

To overcome this limitation, CyclOps proposes a new framework for governing and maintaining the complete life cycle of large-scale data generated by heterogeneous, distributed sources, in order to enable data sharing and exchange. CyclOps uses knowledge graphs (KGs) and a human-in-the-loop approach to intelligently automate the generation and execution of data processing pipelines. KGs are established formal models for representing data and metadata: they provide context and guarantee interoperability with other systems that adhere to the FAIR Guiding Principles. CyclOps will enable organizations to seamlessly provide, combine, and analyze machine- and human-generated data from and for data spaces, thus facilitating the provision of added-value services on top of them.