Reproducibility and Open Science Practices

Data/code sharing, preregistration, registered reports

Open science practices make research verifiable and cumulative. Core practices include sharing data and analysis code, preregistering hypotheses and analysis plans before examining data, and submitting registered reports that undergo peer review before results exist. These measures directly address the replication crisis by curbing questionable research practices and enabling others to reproduce and build upon published findings.

Defining the Concept

Reproducibility refers to the ability of an independent researcher to obtain the same results using the same data and methods. It is distinct from replicability, which asks whether a new study with newly collected data reaches consistent conclusions. Open science aims to make all components of the research process (data, code, materials, and pre-publication documents) transparently accessible. Openness and reproducibility are complementary: without openness, reproducibility cannot be verified; without reproducibility, openness has limited scientific value. The replication crisis, in which many published findings across disciplines failed to hold up in independent replication attempts, brought these issues to the forefront and motivated systematic reform of research infrastructure.

Core Practices: How They Work

Open science rests on three principal tools. First, data and code sharing: raw datasets and analysis scripts are deposited in repositories such as OSF or Zenodo, which can assign persistent identifiers (DOIs). Code hosted on GitHub does not receive a persistent identifier on its own, so a tagged release can be archived in one of these repositories to obtain one. Second, preregistration: before examining any data, researchers log their research questions, hypotheses, and analysis steps in a time-stamped public record, thereby clearly separating confirmatory from exploratory analyses. Third, registered reports: a journal peer-reviews the introduction and methods before data collection begins and grants in-principle acceptance, committing to publish regardless of the results provided the approved protocol is followed. This reduces publication bias and outcome-switching.
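The idea of a time-stamped, tamper-evident record can be illustrated locally. The following is a minimal sketch, not how OSF or any registry implements registration: the function names (plan_digest, freeze_plan, matches_frozen) and the plan fields are assumptions made for illustration. It serializes an analysis plan in a canonical form, hashes it, and records a UTC timestamp, so that any later change to the plan is detectable.

```python
import hashlib
import json
from datetime import datetime, timezone


def plan_digest(plan: dict) -> str:
    """SHA-256 of the plan in a canonical JSON form."""
    # sort_keys makes the digest independent of key insertion order
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def freeze_plan(plan: dict) -> dict:
    """Return a record holding the plan, its digest, and a UTC timestamp."""
    return {
        "sha256": plan_digest(plan),
        "frozen_at": datetime.now(timezone.utc).isoformat(),
        "plan": plan,
    }


def matches_frozen(plan: dict, record: dict) -> bool:
    """True if the plan is identical in content to the one that was frozen."""
    return plan_digest(plan) == record["sha256"]
```

A locally generated timestamp proves nothing to third parties; the value of a public registry is that the record is held and dated by someone other than the researcher. The digest is still useful for checking that the plan followed at analysis time is the one that was registered.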

Applied Example

Consider an education researcher examining the effect of an intervention program on student achievement. Following open science practices, the researcher creates a preregistration on OSF before any data collection: the primary hypothesis, the planned statistical test, the target sample size, and exclusion criteria are all documented. After data collection, the raw dataset and R or Python scripts are added to the same repository. An independent researcher can access this repository, rerun the analyses, and verify consistency of results. When full data sharing is infeasible due to privacy constraints, sharing the analysis code and a synthetic or aggregated dataset is recommended as a minimum standard.
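The workflow in this example can be sketched in code. Everything specific below is a hypothetical stand-in: the plan fields, the attendance-based exclusion rule, and the choice of Welch's t statistic as the planned test are assumptions, since the example does not name them. The sketch shows the exclusion criterion and test being fixed in a plan, applied mechanically to the data, and then checked by a second party who reruns the analysis.

```python
import math
from statistics import mean, variance

# Hypothetical preregistered plan; field names and values are illustrative only.
PLAN = {
    "hypothesis": "Intervention group scores higher than control",
    "test": "Welch t statistic",
    "min_attendance": 0.8,  # exclusion criterion fixed before data collection
}


def apply_exclusions(rows, min_attendance):
    """Keep only participants who meet the preregistered attendance threshold."""
    return [r for r in rows if r["attendance"] >= min_attendance]


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def confirmatory_analysis(rows, plan):
    """Run the planned analysis exactly as specified in the plan."""
    kept = apply_exclusions(rows, plan["min_attendance"])
    treated = [r["score"] for r in kept if r["group"] == "intervention"]
    control = [r["score"] for r in kept if r["group"] == "control"]
    return {
        "n_treated": len(treated),
        "n_control": len(control),
        "t": welch_t(treated, control),
    }


def reproduces(reported, rerun, rel_tol=1e-6):
    """Check that a rerun matches the reported result within a tolerance."""
    return (
        reported["n_treated"] == rerun["n_treated"]
        and reported["n_control"] == rerun["n_control"]
        and math.isclose(reported["t"], rerun["t"], rel_tol=rel_tol)
    )
```

Comparing with a tolerance rather than exact equality is a deliberate choice: floating-point results can differ slightly across platforms and library versions even when the analysis is the same. A real analysis would also report degrees of freedom and a p-value, which this sketch omits.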

Common Pitfalls and Good Practice Tips

A common misconception is that preregistration prevents exploratory analysis. In fact, preregistration only commits the researcher to the confirmatory analyses specified; exploratory findings can still be reported, provided they are clearly labeled as such. A second pitfall is deferring data sharing until after publication, a delay that often means sharing never occurs. Preparing data for sharing during the collection phase — including a codebook and README file — makes the process manageable. Finally, writing overly vague preregistrations undermines their value; hypotheses must be stated in measurable, testable terms to provide a meaningful constraint against analytic flexibility.
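Producing a codebook alongside the data can be partly automated. The sketch below is one possible approach under stated assumptions: the data is a list of dictionaries, missing values are coded as None or an empty string, and the function names and column layout are illustrative rather than any standard. It lists every variable with its observed type, its missing-value count, and a description, leaving a visible placeholder wherever a description has not yet been written.

```python
import csv
import io

FIELDS = ["variable", "type", "n_missing", "description"]


def build_codebook(rows, descriptions):
    """Summarize each variable: observed type(s), missing count, description."""
    variables = sorted({key for row in rows for key in row})
    codebook = []
    for name in variables:
        values = [row.get(name) for row in rows]
        # Assumption: None and "" are the only missing-value codes in use
        present = [v for v in values if v is not None and v != ""]
        types = sorted({type(v).__name__ for v in present})
        codebook.append({
            "variable": name,
            "type": "/".join(types) if types else "unknown",
            "n_missing": len(values) - len(present),
            "description": descriptions.get(name, "TODO: describe"),
        })
    return codebook


def codebook_to_csv(codebook):
    """Render the codebook as CSV text, ready to save next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(codebook)
    return buffer.getvalue()
```

Running this during collection rather than at publication time keeps undocumented variables visible as placeholders, which is easier to fix while the meaning of each column is still fresh.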

Key Terms

Preregistration: Documenting hypotheses and analysis plans in a time-stamped record before examining any data.
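Registered Report: A publication format in which the introduction and methods are peer-reviewed before data collection begins and the journal commits in principle to publishing the results, whatever they turn out to be.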
Replication Crisis: The widespread failure of published findings to hold up under independent replication attempts.
Open Data: Making raw research data publicly accessible in a repository, ideally with a persistent identifier.
Publication Bias: The tendency for statistically significant results to be published more readily than null findings.