The DaSCH Service Platform (DSP) is an open-source, standards-based infrastructure developed by the Swiss National Data and Service Centre for the Humanities (DaSCH). It is designed for the long-term preservation, management, and reuse of complex humanities research data, particularly data with intricate internal structures. DSP-APP is the web-based user interface of the Virtual Research Environment (VRE); it enables researchers to create and manage data models, to search, browse and edit data, and to annotate and link resources. DSP-API is a RESTful API that manages data storage and retrieval. Non-binary data is stored as Resource Description Framework (RDF) triples in a dedicated triplestore, which supports complex data structures and relationships. Binary media files (e.g., images, audio, video, documents) are converted to specialised archival formats and stored using the IIIF-compatible media server SIPI, with their metadata maintained in the triplestore.
Data quality poses a significant hurdle: humanities research data arrives in highly heterogeneous formats with substantial quality issues and demands extensive manual intervention. Each project typically requires 2-6 weeks of dedicated data specialist time for cleansing and transformation. Manual cleansing is particularly heavy for digital scholarly editions delivered as Word documents, unstructured bibliographies, inconsistent naming conventions, and free-text fields that require a controlled vocabulary. Furthermore, there is a lack of integrated tooling to assess data FAIRness, and there is no OAI-PMH API for metadata harvesting. Finally, the Virtual Research Environment (VRE) and repository functionalities are merged into a single system, which limits specialised optimisation.
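To make the controlled-vocabulary problem concrete, the following minimal sketch maps free-text values onto a small controlled list using Python's standard `difflib` module. The vocabulary, the similarity cutoff, and the function name `to_controlled_term` are assumptions made purely for illustration; they do not describe existing DSP tooling.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical controlled vocabulary; a real project would load its own list.
LANGUAGES = ["german", "french", "italian", "latin"]


def to_controlled_term(
    raw: str, vocabulary: list[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the closest vocabulary term for a free-text value, or None."""
    cleaned = raw.strip().lower()
    # Exact match after trimming and lowercasing.
    if cleaned in vocabulary:
        return cleaned
    # Otherwise fall back to the single best fuzzy match above the cutoff.
    matches = get_close_matches(cleaned, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Values that return `None` would still need review by a data specialist; a sketch like this only reduces the volume of manual work and does not replace it.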
The initial phase will comprise a comprehensive analysis of metadata standards relevant to humanities data. This includes an evaluation of OAI-PMH implementation strategies and metadata crosswalks, an assessment of existing FAIR evaluation frameworks tailored to humanities repositories, and an exploration of AI-based tools for data cleansing in cultural heritage contexts.
This release will introduce a foundational OAI-PMH endpoint implementation and initiate EOSC Matchmaker metadata exposure. In addition, the FAIR Assessment Toolkit will be applied to 3-5 sample projects, and a prototype AI-based data cleansing tool will be developed for a specific use case, such as bibliography parsing.
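For orientation, a harvester addresses an OAI-PMH endpoint through plain HTTP requests carrying a `verb` argument. The sketch below builds such request URLs with the standard library. The base URL is a placeholder and the helper name `build_oai_url` is hypothetical; only the six verb names and the `metadataPrefix` argument come from the OAI-PMH 2.0 protocol itself.

```python
from urllib.parse import urlencode

# Placeholder base URL; the location of the planned endpoint is not specified.
BASE_URL = "https://example.org/oai"

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb comes first, followed by the remaining arguments in call order.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"
```

For example, `build_oai_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")` yields a request for all records in unqualified Dublin Core, the one format every OAI-PMH repository is required to support.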
This will mark the deployment of a production-ready OAI-PMH API capable of supporting multiple metadata formats. It will also see the full integration of EOSC Matchmaker and the FAIR Assessment Toolkit into the data curation workflow. Furthermore, AI-powered cleansing tools will be available to address 2-3 common data quality issues. Initial integration of the EOSC Data Player for basic processing tasks is planned, alongside a preliminary separation of the VRE and the repository, if deemed feasible.
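One possible way to support multiple metadata formats is a registry that maps each OAI-PMH `metadataPrefix` to a serialiser function. The sketch below is an assumption about how this could be structured, not a description of the planned implementation: the record fields and the names `to_oai_dc`, `SERIALIZERS`, and `disseminate` are hypothetical. The namespaces and the `cannotDisseminateFormat` error code are taken from the OAI-PMH 2.0 protocol.

```python
from typing import Callable
from xml.etree.ElementTree import Element, SubElement

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Hypothetical flat record; real DSP resources are considerably richer.
Record = dict[str, str]


def to_oai_dc(record: Record) -> Element:
    """Serialise a record as an oai_dc:dc element (unqualified Dublin Core)."""
    root = Element(f"{{{OAI_DC_NS}}}dc")
    for field in ("title", "creator", "date"):
        if field in record:
            SubElement(root, f"{{{DC_NS}}}{field}").text = record[field]
    return root


# Registry of supported formats; further prefixes would be added here.
SERIALIZERS: dict[str, Callable[[Record], Element]] = {"oai_dc": to_oai_dc}


def disseminate(record: Record, metadata_prefix: str) -> Element:
    """Serialise a record in the requested format, if it is supported."""
    try:
        serializer = SERIALIZERS[metadata_prefix]
    except KeyError:
        # OAI-PMH error code for an unsupported metadataPrefix.
        raise ValueError("cannotDisseminateFormat") from None
    return serializer(record)
```

Adding a further format then amounts to writing one crosswalk function and registering it under its prefix, which keeps the request handling independent of the individual metadata schemas.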