Data Management and FAIR Principles
Making data findable and reusable
Responsible data management covers planning (set out in a data management plan), documentation, secure storage, long-term retention, and sharing. The FAIR principles state that data should be Findable, Accessible, Interoperable, and Reusable, which is achieved through persistent identifiers, rich metadata, standard formats, and clear licences. Together, these practices enhance scientific transparency, facilitate reproducibility and secondary use of research data, and help ensure that privacy and ethical obligations are respected throughout the research lifecycle.
Defining the Concept
Data management refers to the systematic handling of data throughout its lifecycle — from planning and collection through processing, storage, archiving, and sharing. The FAIR principles, conceptualised by Wilkinson and colleagues in 2016, assert that data should be findable, accessible, interoperable, and reusable not only by humans but also by computational systems. These principles form a cornerstone of the open science movement and are increasingly required by funders and academic journals as a condition of publication or grant award.
How the FAIR Principles Work
Findability requires assigning a persistent identifier (such as a DOI) and publishing the data with rich metadata in a searchable repository. Accessibility means the data can be retrieved via standard protocols, with authentication where necessary, while the metadata remains openly available even for restricted datasets. Interoperability demands the use of shared data formats, standard metadata schemas (e.g., Dublin Core, Schema.org), and controlled vocabularies. Reusability is achieved through clear, ideally machine-readable licences (such as CC BY), thorough data documentation, and adherence to community-agreed standards.
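To make the four principles more tangible, the following is a minimal sketch of a machine-readable metadata record that uses the Schema.org `Dataset` type. The function name `build_dataset_metadata`, the example title, the creator name, and the DOI `10.1234/example.5678` are hypothetical placeholders, not taken from a real dataset; the property names (`identifier`, `license`, `distribution`, `encodingFormat`) are standard Schema.org terms.

```python
import json


def build_dataset_metadata(title, description, doi, licence_url, keywords, creators):
    """Return a Schema.org 'Dataset' description as a JSON-LD style dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        # Findable: a persistent identifier, written as a resolvable DOI URL.
        "identifier": f"https://doi.org/{doi}",
        "keywords": list(keywords),
        "creator": [{"@type": "Person", "name": name} for name in creators],
        # Reusable: an explicit licence that both people and machines can read.
        "license": licence_url,
        # Interoperable: declare the open file format of the distributed data.
        "distribution": [{"@type": "DataDownload", "encodingFormat": "text/csv"}],
    }


if __name__ == "__main__":
    record = build_dataset_metadata(
        title="Commuting habits survey (hypothetical example)",
        description="Anonymised responses to a survey on commuting habits.",
        doi="10.1234/example.5678",  # placeholder DOI, assumed for illustration
        licence_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["survey", "commuting"],
        creators=["A. Researcher"],
    )
    print(json.dumps(record, indent=2))
```

Accessibility is not visible in the record itself: it depends on the repository serving the data and this metadata over a standard protocol such as HTTPS.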
A Concrete Application Example
A social scientist uploading survey data to an open repository such as Zenodo might follow these steps: (1) participant identifiers are anonymised before upload to protect privacy; (2) a detailed README file is included with the data collection protocol, variable definitions, and scales used; (3) data are saved in CSV format for broad software compatibility; (4) the dataset is released under a CC BY 4.0 licence; (5) on publication, the dataset receives a unique DOI. This workflow ensures that other researchers can find, download, and lawfully reuse the data in their own studies, thereby extending the value of the original research.
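The local preparation steps in this example (replacing participant identifiers, writing a README, and saving CSV) could be sketched as follows. This is an illustration under stated assumptions rather than a prescribed procedure: the function names, the `participant_id` column, and the sample rows are hypothetical. Replacing direct identifiers with random codes is only one part of anonymisation, since combinations of other variables can still identify a person.

```python
import csv
import io
import secrets


def pseudonymise(rows, id_field="participant_id"):
    """Return copies of rows with each participant identifier replaced by a random code.

    The identifier-to-code mapping lives only inside this function and is
    discarded on return, so it never reaches the published file.
    """
    codes = {}
    cleaned = []
    for row in rows:
        original = row[id_field]
        if original not in codes:
            code = "P" + secrets.token_hex(4)
            while code in codes.values():  # regenerate on the rare collision
                code = "P" + secrets.token_hex(4)
            codes[original] = code
        new_row = dict(row)  # copy, so the input rows are left unchanged
        new_row[id_field] = codes[original]
        cleaned.append(new_row)
    return cleaned


def to_csv(rows):
    """Serialise a list of dicts (all with the same keys) to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_readme(title, protocol, variables):
    """Build a plain-text README from a protocol summary and variable definitions."""
    lines = [title, "", "Data collection protocol", protocol, "", "Variables"]
    for name, definition in variables.items():
        lines.append(f"- {name}: {definition}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    raw = [  # hypothetical survey responses
        {"participant_id": "alice@example.org", "age": "34", "q1": "4"},
        {"participant_id": "bob@example.org", "age": "51", "q1": "2"},
    ]
    print(to_csv(pseudonymise(raw)))
    print(build_readme(
        "Commuting habits survey (hypothetical example)",
        "Online questionnaire.",
        {"age": "Age in years", "q1": "Satisfaction, 1 (low) to 5 (high)"},
    ))
```

The licence and DOI are not handled here, because both are set through the repository when the dataset is deposited and published.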
Common Pitfalls and Good Practice Recommendations
Common pitfalls include leaving the data management plan until the end of the project, providing sparse metadata, citing data through non-persistent links (ordinary URLs that break when files move), and omitting licence information. Good practice means writing the data management plan at the outset, selecting a repository recommended by the institution or funder, and keeping documentation up to date throughout the project. For restricted data, the "closed data, open metadata" approach is advisable: even when the data themselves cannot be shared, their metadata is published so that other researchers know the data exist and how access may be requested.
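Several of these pitfalls can be caught with a simple pre-deposit check. The sketch below is a heuristic illustration, not a formal FAIR assessment: the field names, the 20-word threshold for a "sparse" description, and the function name `check_record` are assumptions made for this example. It recognises only DOIs, although other persistent identifier schemes exist.

```python
import re

# Heuristic DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^(https://doi\.org/)?10\.\d{4,9}/\S+$")

# Hypothetical minimum set of fields for this sketch.
REQUIRED_FIELDS = ("title", "description", "creators", "identifier", "licence")


def check_record(record, min_description_words=20):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []

    # Missing or empty fields, including the licence.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")

    # Non-persistent link: the identifier is present but is not a DOI.
    identifier = record.get("identifier", "")
    if identifier and not DOI_PATTERN.match(identifier):
        problems.append("identifier is not a DOI")

    # Sparse metadata: a very short description (the threshold is arbitrary).
    description = record.get("description", "")
    if description and len(description.split()) < min_description_words:
        problems.append("description is sparse")

    return problems


if __name__ == "__main__":
    draft = {  # hypothetical draft record showing three of the pitfalls
        "title": "Survey data",
        "description": "Survey data.",
        "creators": ["A. Researcher"],
        "identifier": "http://example.org/files/data.zip",
    }
    for problem in check_record(draft):
        print(problem)
```

A check of this kind complements, but does not replace, the repository's own validation and a review of the documentation by someone outside the project.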
Key terms
- Data Management Plan: A formal document describing how data will be collected, stored, and shared during and after a project.
- Persistent Identifier: A stable digital reference, such as a DOI, that uniquely identifies a dataset and continues to resolve to it even if its location changes.
- Metadata: Structured information describing the content, context, and structure of a dataset.
- Interoperability: The ability of different systems to exchange and use data through shared standards.
- Open Licence: A licence permitting others to freely use, share, and adapt data under specified conditions, such as attribution.
Further reading
- Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. DOI: 10.1038/sdata.2016.18