For best experience please turn on javascript and use a modern browser!
You are using a browser that is no longer supported by Microsoft. Please upgrade your browser. The site may not present itself correctly if you continue browsing.
Why is the reuse of research data often more difficult than anticipated and how can different stakeholders respond? In a newly published paper in the Harvard Data Science Review (HDSR) Paul Groth, University of Amsterdam (UvA), director of the UvA Data Science Centre, and Christine L. Borgman, University of California, Los Angeles (UCLA), introduce a conceptual framework that helps explain the social and technical frictions that occur in the reuse of research data.

The authors developed a metaphor of distance between data creator and reuser along six dimensions: domain, methods, collaboration, curation, purposes, and time and temporality, which leads to recommendations for data creators, data reusers, archivists and funders. Their aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data, and to identify criteria for investments at each stage of data and research life cycles.

The reusability of research data depends on bridging six types of “distance” between data creators and reusers, such as differences in disciplinary domain.

Abstract

Sharing research data is necessary, but not sufficient, for data reuse. Open science policies focus more heavily on data sharing than on reuse, yet both are complex, labor-intensive, expensive, and require infrastructure investments by multiple stakeholders. The value of data reuse lies in relationships between creators and reusers. By addressing knowledge exchange rather than mere transactions between stakeholders, investments in data management and knowledge infrastructures can be made more wisely. Drawing upon empirical studies of data sharing and reuse, we develop the metaphor of distance between data creator and data reuser, identifying six dimensions of distance that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. We explore how social and socio-technical aspects of these dimensions may decrease – or increase – distances to be traversed between creators and reusers. Our theoretical framing of the distance between data creators and prospective reusers leads to recommendations to four categories of stakeholders on how to make data sharing and reuse more effective: data creators, data reusers, data archivists, and funding agencies. ‘It takes a village’ to share research data – and a village to reuse data. Our aim is to provoke new research questions, new research, and new investments in effective and efficient circulation of research data, and to identify criteria for investments at each stage of data and research life cycles.

Broad expert response

A series of expert commentaries were also published in HDSR responding to the article by scholars from disciplines as diverse as archives, astronomy, information science, data science, philosophy, physics, communication, classics, computer science, and social studies of science reflects the article’s timeliness and broad relevance.

The range of perspectives highlights the growing recognition that successful data reuse is inherently collaborative, and that reducing “distance” is a shared responsibility across disciplines and roles.

Copyright: UvA
This has been an incredibly rewarding intellectual project for me and I'm thankful to HDSR, our colleagues and Christine for making it possible. Paul Groth is scientific director of the UvA Data Science Centre (DSC) and group leader of the Intelligent Data Engineering Lab (INDElab) of the UvA Informatics Institute

More information

Harvard Data Science Review (HDSR) is an open access platform of the Harvard Data Science Initiative.

Paul Groth is scientific director of the UvA Data Science Centre (DSC) and group leader of the Intelligent Data Engineering Lab (INDElab) of the UvA Informatics Institute.

Prof. P.T. (Paul) Groth

Faculty of Science

Informatics Institute