15 May 2025
The authors developed a metaphor of distance between data creator and reuser along six dimensions: domain, methods, collaboration, curation, purposes, and time and temporality, which leads to recommendations for data creators, data reusers, archivists and funders. Their aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data, and to identify criteria for investments at each stage of data and research life cycles.
Sharing research data is necessary, but not sufficient, for data reuse. Open science policies focus more heavily on data sharing than on reuse, yet both are complex, labor-intensive, expensive, and require infrastructure investments by multiple stakeholders. The value of data reuse lies in relationships between creators and reusers. By addressing knowledge exchange rather than mere transactions between stakeholders, investments in data management and knowledge infrastructures can be made more wisely. Drawing upon empirical studies of data sharing and reuse, we develop the metaphor of distance between data creator and data reuser, identifying six dimensions of distance that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. We explore how social and socio-technical aspects of these dimensions may decrease – or increase – distances to be traversed between creators and reusers. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies. ‘It takes a village’ to share research data – and a village to reuse data. Our aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data, and to identify criteria for investments at each stage of data and research life cycles.
A series of expert commentaries were also published in HDSR responding to the article by scholars from disciplines as diverse as archives, astronomy, information science, data science, philosophy, physics, communication, classics, computer science, and social studies of science reflects the article’s timeliness and broad relevance.
The range of perspectives highlights the growing recognition that successful data reuse is inherently collaborative, and that reducing “distance” is a shared responsibility across disciplines and roles.
This has been an incredibly rewarding intellectual project for me and I'm thankful to HDSR, our colleagues and Christine for making it possible.Paul Groth is scientific director of the UvA Data Science Centre (DSC) and group leader of the Intelligent Data Engineering Lab (INDElab) of the UvA Informatics Institute
Harvard Data Science Review (HDSR) is an open access platform of the Harvard Data Science Initiative.
Paul Groth is scientific director of the UvA Data Science Centre (DSC) and group leader of the Intelligent Data Engineering Lab (INDElab) of the UvA Informatics Institute.