Dataspaces for Collaborative Research

Abstract

Data-driven collaborative research is a key driver of innovation. However, effective data exchange across institutions remains difficult in practice, often relying on ad hoc mechanisms that provide little support for discoverability, sovereignty, or the dynamic sharing of datasets during ongoing collaboration. Dataspaces have been proposed as a way to address these shortcomings, but their suitability for collaborative research remains largely unexplored with respect to the needs that academic collaborations impose, practical deployment experiences, and the potential for concrete use cases. Addressing this gap, we derive requirements for infrastructures in data-driven collaborative research, providing a basis for assessing dataspaces. We further report on the deployment of a pilot dataspace for a real-world, large-scale research project, focusing on the onboarding of four institutes with diverse data types, interests, and disciplinary backgrounds. The deployment highlights the practical steps required to select and prepare a dataspace technology stack and to establish connectors, and it lets us assess the challenges posed by heterogeneous environments and the level of effort involved in integration. Beyond deployment, we explore the dual role of research dataspaces as both a generic data-sharing infrastructure and a testbed for practical research on data-sharing technologies. A federated process mining use case for data-driven production, in which distributed process data are analyzed collaboratively, demonstrates the latter. Our findings indicate that dataspaces are indeed a viable option for collaborative research if supported by adequate expertise. By deriving requirements, reporting deployment experiences, and demonstrating use cases, we contribute guidance for research practitioners. Future work should focus on sustainability and scalability needs, such as lowering entry barriers, developing trust mechanisms, and extending use case scenarios.

Publication
Proceedings of the 2025 IEEE International Conference on Big Data (BigData '25)
Event
2025 IEEE International Conference on Big Data, Dec 8 - Dec 11, 2025, Macau, China
Soo-Yon Kim
Liam Tirpitz
Max Wagels
Benedikt T. Arnold
Christian Rennert
István Koren
Jannik Rapp
Mario Moser
Wil Van Der Aalst
Bernhard Rumpe
Robert H. Schmitt
Dr. rer. nat. Jan Pennekamp
Postdoctoral Researcher
Sandra Geisler