Type: Data Repository + Virtual Research Environment (VRE)

DaSCH Service Platform

About

The DaSCH Service Platform (DSP) is an open-source, standards-based infrastructure developed by the Swiss National Data and Service Centre for the Humanities (DaSCH). It is designed for the long-term preservation, management, and reuse of complex humanities research data, particularly data with intricate internal structures. DSP-APP is the web-based user interface of the VRE; it enables researchers to create and manage data models, to search, browse, and edit data, and to annotate and link resources. DSP-API is a RESTful API that manages data storage and retrieval. Non-binary data is stored as Resource Description Framework (RDF) statements in a dedicated triplestore, which supports complex data structures and relationships. Binary media files (e.g., images, audio, video, documents) are converted to specialised archival formats and served by the IIIF-compatible media server SIPI, with their metadata maintained in the triplestore.
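
To illustrate the split between RDF metadata and IIIF-served media described above, the following is a minimal sketch rather than DSP's actual data model. The IRIs, the `hasStillImage` predicate, the host name, and the file identifier are hypothetical placeholders; only the N-Triples line syntax and the IIIF Image API URL layout (`{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`) are taken from the respective standards.

```python
from urllib.parse import quote


def ntriples_line(subject, predicate, obj, is_literal=False):
    """Serialise one RDF statement as an N-Triples line."""
    if is_literal:
        # Escape backslashes, quotes and newlines as N-Triples requires.
        escaped = (
            obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL for a stored image."""
    # The identifier is a single path segment, so reserved characters
    # such as "/" or spaces must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return (
        f"{base_url.rstrip('/')}/{encoded_id}/"
        f"{region}/{size}/{rotation}/{quality}.{fmt}"
    )


# Hypothetical resource: a manuscript page with a label and an image file.
RESOURCE = "http://example.org/resource/page-1"
triples = [
    ntriples_line(RESOURCE, "http://www.w3.org/2000/01/rdf-schema#label",
                  "Folio 1 recto", is_literal=True),
    ntriples_line(RESOURCE, "http://example.org/ontology#hasStillImage",
                  iiif_image_url("https://iiif.example.org/0001", "page 1.jp2")),
]

if __name__ == "__main__":
    print("\n".join(triples))
```

In this sketch the triplestore holds only descriptive statements plus a pointer to the image, while the image itself is requested from the media server in whatever region, size, and format the client needs.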

Use Case Status Before Joining EOSC Data Commons

Data quality is a significant hurdle: humanities research data arrives in highly heterogeneous formats with substantial quality issues and demands extensive manual intervention. Each project typically requires 2 to 6 weeks of dedicated data specialist time for cleansing and transformation. Data cleansing relies heavily on manual processes, particularly for digital scholarly editions delivered as Word documents, unstructured bibliographies, inconsistent naming conventions, and free-text fields that require a controlled vocabulary. Furthermore, there is no integrated tooling to assess data FAIRness and no OAI-PMH API for metadata harvesting. Finally, the Virtual Research Environment (VRE) and repository functionalities are merged into a single system, which limits specialised optimisation.

Objective in the Project

  • Implement AI-powered tools for controlled vocabulary creation from free-text fields, disambiguation and Named Entity Recognition, consistency checking, fuzzy matching for linking tables and fixing spelling variations, and pattern recognition in semi-structured strings such as bibliographies and separated files.
  • Implement the FAIR Assessment Toolkit and ensure continuous FAIRness evaluation and improvement.
  • Extend data and metadata API coverage by implementing the OAI-PMH API for metadata harvesting and enhancing data access APIs for broader interoperability.
  • Integrate fully with EOSC Matchmaker for data discovery, integrate with EOSC Data Player for processing workloads, and support Packages for Processing Datasets.
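
The fuzzy matching named in the first objective can be sketched with plain string similarity. This is a minimal illustration using Python's standard `difflib`, not the AI-powered tooling the project plans to build; the vocabulary, the sample values, and the 0.85 cutoff are assumptions chosen for the example.

```python
import difflib
import unicodedata


def normalise(text):
    """Fold case, strip diacritics and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def match_to_vocabulary(value, vocabulary, cutoff=0.85):
    """Return the controlled term closest to a free-text value, or None.

    The cutoff is an assumed similarity threshold; values below it are
    left unmatched so that a specialist can review them.
    """
    lookup = {normalise(term): term for term in vocabulary}
    candidates = difflib.get_close_matches(
        normalise(value), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[candidates[0]] if candidates else None


# Hypothetical controlled vocabulary and free-text variants.
PLACES = ["Zürich", "Basel", "Genève"]

if __name__ == "__main__":
    for raw in ["zurich ", "Zuerich", "Geneve", "Lugano"]:
        print(repr(raw), "->", match_to_vocabulary(raw, PLACES))
```

Returning `None` for weak matches keeps the automated step conservative: only near-identical variants are linked, and everything else stays visible for manual curation.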

Integration with EOSC Data Commons Services and Components

Expected Results
  • The EOSC Matchmaker aims to make DaSCH repository data findable by exposing metadata through standardised APIs for harvesting, thereby enhancing the discoverability of humanities research data across EOSC.
  • The EOSC Data Player will pair DaSCH data with tools from EOSC Matchmaker and execute Packages for Processing Datasets. This integration involves implementing the capability to send workloads for remote processing, enabling advanced analytics on DaSCH data without local processing constraints.
  • The FAIR Assessment Toolkit is designed to evaluate and improve the FAIRness of hosted data by embedding assessment tools within the data lifecycle, leading to continuous FAIRness monitoring and improvement recommendations.
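
Metadata harvesting of the kind the first result refers to could look as follows from a harvester's side. This is a sketch under stated assumptions: the endpoint `https://repo.example.org/oai`, the record identifiers, and the sample response are hypothetical, since the OAI-PMH interface is still to be implemented. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    When a resumption token is supplied it must be the only argument
    besides the verb, as OAI-PMH requires.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Extract record identifiers and the next resumption token, if any."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical response body, shortened to headers only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:resource-1</identifier></header></record>
    <record><header><identifier>oai:example.org:resource-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    ids, next_token = parse_list_records(SAMPLE_RESPONSE)
    print(ids)
    print(list_records_url("https://repo.example.org/oai",
                           resumption_token=next_token))
```

A harvester would repeat the request with each returned token until the response carries an empty or absent `resumptionToken`, which signals the end of the list.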
