This use case focuses on developing and demonstrating an automated workflow that reduces the data management burden for Earth observation (EO) scientists by enabling on-demand access to pre-configured, cloud-optimised data cubes tailored to specific user access patterns. By automating common remote sensing operations such as NDVI trend analysis and SAR coherence estimation using tools like Zarr, Dask, and xarray, the workflow allows users to perform time-series analyses efficiently without handling individual files, freeing them to focus on scientific interpretation rather than data preparation. The solution supports both recent EO datasets (e.g., Sentinel-2 L2A from 2024) and older, non-optimised datasets from previous years by providing reusable workflows that convert legacy data into the same cloud-native, time-series-ready format through automated preprocessing, chunking, and metadata harmonisation. Both the data hosted in the EODC data repository and the resulting data cubes will be made discoverable via the EOSC Matchmaker, and EODC's processing services will be offered in the Catalogue of Tools, allowing users to process data directly on the EODC cloud infrastructure.
EODC Data Repository

About

Use Case Status Before Joining EOSC Data Commons
Running scientific analyses based on Earth Observation (EO) data at scale involves a significant data management burden for researchers. Large volumes of data from multiple sources must be located, downloaded, and prepared before analysis can begin. This preparation involves numerous pre-processing steps, including data format conversions, reprojection, atmospheric correction, cloud masking, and structuring data into consistent spatiotemporal formats to facilitate multi-sensor fusion and time-series analysis. Metadata is often incomplete or non-harmonised, limiting discoverability and interoperability. While the EODC repository already provides petabyte-scale storage alongside cloud computing and standardised access to EO datasets via a STAC (SpatioTemporal Asset Catalog) interface, users must still perform several resource-intensive steps manually. As EO data volumes continue to grow, this workflow becomes increasingly inefficient, diverting researchers' time and resources away from scientific interpretation and towards repetitive technical tasks.
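The efficiency gain of a cloud-optimised cube comes from chunked storage: a spatiotemporal query touches only the chunks it overlaps, rather than whole scene files. A minimal sketch of that chunk-index arithmetic, with illustrative chunk sizes and query windows (not the project's actual cube layout):

```python
# Hypothetical sketch: for a cube chunked along (time, y, x), compute which
# chunk indices along one axis a query window [start, stop) overlaps, so only
# those stored objects need to be fetched. All sizes here are illustrative.

def chunks_for_query(query_start, query_stop, chunk_size):
    """Return the chunk indices along one axis that overlap [start, stop)."""
    first = query_start // chunk_size
    last = (query_stop - 1) // chunk_size
    return list(range(first, last + 1))

# Cube of 365 time steps and 10980 x 10980 pixels, chunked as (50, 1024, 1024):
time_chunks = chunks_for_query(100, 160, 50)    # time steps 100..159
row_chunks = chunks_for_query(2048, 3072, 1024)  # pixel rows 2048..3071
```

For this toy query only two time chunks and one row chunk are read; formats like Zarr apply the same idea per dimension so that a small subset of a petabyte-scale archive can be streamed on demand.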
Objectives
- Develop automated workflows that generate on-demand, cloud-optimised EO data cubes tailored to user-defined spatial and temporal access patterns.
- Enable scalable and efficient time-series analysis of EO data using modern open-source tools such as xarray, Dask, and Zarr, integrated into user-preferred environments.
- Provide reusable workflows to convert legacy, non-optimised EO datasets into harmonised, cloud-native formats suitable for integration into ongoing analyses.
- Ensure the discoverability of both EO datasets and resulting data cubes through the EOSC Matchmaker and facilitate their processing on the EODC cloud infrastructure.
Integration with EOSC Data Commons Services and Components
EOSC Matchmaker
The use case focuses on making the EO datasets and derived data cubes in the EODC repository discoverable and accessible through the AI-based search engine and metadata warehouse by developing a dedicated STAC metadata crawler. In addition, the use case will contribute to the Catalogue of Tools by publishing the developed workflows for pre-processing and cube generation as reusable packages, so that researchers can pair data discovered through the EOSC Matchmaker with ready-to-use processing tools.
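At its core, a STAC metadata crawler follows `child` and `item` links from a catalog root and collects item records for indexing. A hedged sketch of that traversal, using a hard-coded in-memory catalog as a stand-in; a real crawler would resolve the link `href`s over HTTP and handle pagination:

```python
# Sketch of STAC catalog traversal: recurse through "child" links and yield
# every reachable item. "resolve" maps an href to a parsed STAC document;
# here it is a dict lookup, in a real crawler an HTTP GET + JSON parse.

def crawl(node, resolve):
    """Yield every STAC item reachable from a catalog/collection node."""
    for link in node.get("links", []):
        if link["rel"] == "item":
            yield resolve(link["href"])
        elif link["rel"] == "child":
            yield from crawl(resolve(link["href"]), resolve)

# Minimal in-memory catalog: root -> one collection -> two items.
docs = {
    "root": {"links": [{"rel": "child", "href": "s2-l2a"}]},
    "s2-l2a": {"links": [{"rel": "item", "href": "item-a"},
                         {"rel": "item", "href": "item-b"}]},
    "item-a": {"id": "item-a", "type": "Feature"},
    "item-b": {"id": "item-b", "type": "Feature"},
}
items = list(crawl(docs["root"], docs.__getitem__))
```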
EOSC Data Player
The use case will integrate with the EOSC Data Player by providing cloud-based computing services through the EODC infrastructure. Once data and workflows are discoverable via the EOSC Matchmaker, the EOSC Data Player will enable their execution directly on EODC’s compute resources, avoiding the need for users to transfer large volumes of Earth observation data.
FAIR Assessment Toolkit
The use case will integrate with the FAIR Assessment Toolkit to evaluate and improve the FAIRness of both existing datasets in the EODC repository and the newly generated data cubes. FAIR metrics will be applied to assess discoverability, accessibility, interoperability, and reusability. The insights gained will guide improvements in metadata quality, harmonisation, and publication practices, directly contributing to the creation of metadata-rich, standardised, and reproducible data products.
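As an illustration only, automated FAIRness checks of the kind described can be thought of as predicates over a metadata record. The actual FAIR Assessment Toolkit defines its own metrics; the check names and field names below (following common STAC conventions such as `id`, `license`, `assets`) are assumptions for the sketch:

```python
# Hypothetical FAIR-style checks on one metadata record. Each check maps a
# FAIR principle to a simple predicate over the record's fields.

CHECKS = {
    "findable: persistent identifier": lambda m: bool(m.get("id")),
    "accessible: resolvable assets":   lambda m: bool(m.get("assets")),
    "interoperable: standard schema":  lambda m: m.get("stac_version") is not None,
    "reusable: explicit license":      lambda m: bool(m.get("license")),
}

def assess(metadata):
    """Return {check name: passed?} for one metadata record."""
    return {name: check(metadata) for name, check in CHECKS.items()}

record = {"id": "S2A_tile_0001", "stac_version": "1.0.0",
          "assets": {"B04": {"href": "s3://bucket/B04.tif"}}}
report = assess(record)  # the license check fails for this record
```

A report like this would then feed the feedback loop the text describes: failed checks point directly at the metadata fields to harmonise or complete.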
Package for processing datasets
The use case will contribute by generating packages that bundle common EO analyses (e.g. NDVI trends, SAR coherence) with the necessary protocols, URIs, permissions, and tokens to access the required datasets as well as the matching tools. These self-contained packages will ensure that both data access and execution code are included, allowing users to reproduce analyses directly on EODC infrastructure.
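One way to picture such a self-contained package is a serialisable manifest that names the analysis, the tool, and the data endpoints with their access details. This is a hedged sketch only: the field names, the registry path, the example URI, and the token placeholder are illustrative assumptions, not a specification from the project.

```python
# Illustrative package manifest bundling an analysis, its tool image, and
# the data access details (protocol, URI, auth). All values are placeholders.
import json

manifest = {
    "analysis": "ndvi-trend",
    "tool": {"image": "registry.example/eo/ndvi-trend:1.0"},  # hypothetical
    "data": [{
        "protocol": "s3",
        "uri": "s3://eodc-example/sentinel-2-l2a/",  # illustrative URI
        "auth": {"type": "token", "ref": "TOKEN_PLACEHOLDER"},
    }],
}

serialized = json.dumps(manifest, indent=2)  # what the package would ship
restored = json.loads(serialized)            # what the execution side reads
```

Because the manifest round-trips through JSON, the same document can be published in the Catalogue of Tools and consumed unchanged by the execution environment on EODC infrastructure.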
