ScholarGate

Quality Control and Error Correction in Sequencing

Every sequencing run produces base calls of varying reliability. Quality control and error correction quantify per-base accuracy, filter or trim low-quality data, and correct systematic artefacts before reads are assembled or used for variant calling. Without these steps, downstream genomic conclusions can reflect technical noise rather than biology.

Definition

Quality control in sequencing is the assessment and improvement of read reliability, using per-base quality scores, trimming and filtering, and error-correction methods to remove or correct technical artefacts so that assembly and variant calling reflect the underlying sequence rather than measurement error.

Scope

The entry covers per-base quality (Phred) scoring, the kinds of errors and biases that affect sequencing reads, read trimming and filtering, and the role of redundant coverage in distinguishing true signal from error. It is a methodological topic about data reliability and does not provide clinical or laboratory protocols.

Core questions

  • How is the reliability of an individual base call quantified?
  • What kinds of errors and biases affect sequencing reads?
  • How do trimming, filtering, and redundant coverage reduce the impact of errors?

Key concepts

  • Phred quality score
  • Base-calling accuracy
  • Read trimming and filtering
  • Sequencing error profiles
  • Coverage and consensus error reduction
  • Adapter and quality trimming
  • False-positive variant control

Mechanisms

Sequencing platforms assign each base call a Phred quality score, a logarithmic estimate of the probability that the call is wrong, so that low-confidence bases can be flagged. Quality-control tools then trim adapters and low-quality read ends and filter out unreliable reads before analysis. Errors are partly random and partly systematic. Sequencing each position many times allows a consensus to be taken, so that isolated random errors are outvoted; systematic errors can recur across reads and survive a consensus, which is why characterising error profiles is needed to distinguish recurrent artefacts from genuine low-frequency variants. Together, these steps reduce false positives in downstream variant calling and improve assembly accuracy.
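
The scoring, trimming, and filtering steps can be illustrated with a minimal Python sketch. It is not the method of any particular tool: the function names, the Phred+33 quality encoding, the trimming threshold of 20, and the filter limits are all assumptions chosen for illustration, and adapter removal is left out.

```python
from typing import List, Tuple

# Assumption: qualities use the Phred+33 ASCII encoding common in FASTQ files.
PHRED_OFFSET = 33


def decode_qualities(quality_string: str) -> List[int]:
    """Convert a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality_string]


def error_probability(q: int) -> float:
    """Convert a Phred score into the estimated probability of a wrong call."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Drop trailing bases until the last remaining base has quality >= min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_filter(quals: List[int], min_len: int = 5, max_mean_error: float = 0.01) -> bool:
    """Keep a read only if it is long enough and its mean error estimate is low."""
    if len(quals) < min_len:
        return False
    mean_error = sum(error_probability(q) for q in quals) / len(quals)
    return mean_error <= max_mean_error


# Hypothetical read: eight high-quality bases followed by two poor ones.
quals = decode_qualities("IIIIIIII#!")
trimmed_seq, trimmed_quals = trim_3prime("ACGTACGTAC", quals)
print(trimmed_seq)                   # ACGTACGT
print(passes_filter(trimmed_quals))  # True
```

Real trimmers typically use more robust rules, such as sliding-window averages, so the simple end-trimming loop here should be read only as the basic idea.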

Clinical relevance

Quality control and error correction determine whether genomic findings reflect true sequence or technical noise, which is critical wherever sequencing informs research or clinical interpretation. This entry is educational reference material on data reliability and does not constitute guidance for any specific test or clinical decision.

Evidence & guidelines

The methods are documented in primary tool and analysis papers rather than clinical guidelines. Ewing et al. (1998) established the Phred per-base quality score; Bolger et al. (2014) introduced Trimmomatic, a widely used read-trimming tool; and Ma et al. (2019) characterised error profiles in deep sequencing data. Reviews such as Sims et al. (2014) connect coverage to error control.

History

Per-base quality scoring was formalised with the Phred program in 1998, giving sequencing data a standardised, interpretable measure of base-call confidence that was widely adopted across platforms. As high-throughput platforms produced ever larger read volumes, dedicated trimming and filtering tools emerged in the 2010s, and detailed analyses of error profiles refined how genuine low-frequency variants are separated from systematic sequencing artefacts.

Key figures

  • Phil Green
  • Brent Ewing
  • Björn Usadel

Seminal works

  • ewing-1998
  • bolger-2014
  • ma-2019

Frequently asked questions

What is a Phred quality score?
It is a logarithmic measure of the estimated probability that a base call is incorrect; for example, a Phred score of 30 corresponds to about a 1-in-1000 chance of error, so higher scores indicate more reliable base calls.
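
The relationship can be written out explicitly. The symbols are notation introduced here for illustration: Q is the Phred score and P is the estimated probability that the base call is wrong.

```latex
Q = -10 \log_{10} P
\qquad\Longleftrightarrow\qquad
P = 10^{-Q/10}

% Worked values:
% Q = 20 \Rightarrow P = 10^{-2} = 0.01   (1 error in 100 calls)
% Q = 30 \Rightarrow P = 10^{-3} = 0.001  (1 error in 1000 calls)
% Q = 40 \Rightarrow P = 10^{-4} = 0.0001 (1 error in 10000 calls)
```

Each increase of 10 in the score therefore corresponds to a tenfold drop in the estimated error probability.
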
How does sequencing the same position many times reduce errors?
When a position is covered by many independent reads, random errors in individual reads can be outvoted by the majority, so taking a consensus across the reads yields a more accurate base call than any single read.
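
The outvoting argument can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not part of any specific pipeline: errors are independent between reads, every read has the same error rate, and the consensus fails only when more than half of the reads are wrong. The function names are hypothetical, and systematic errors that recur across reads are not captured by this model.

```python
from collections import Counter
from math import comb


def consensus_base(pileup: str) -> str:
    """Return the most common base among the reads covering one position."""
    return Counter(pileup).most_common(1)[0][0]


def majority_error_probability(depth: int, per_read_error: float) -> float:
    """Probability that more than half of `depth` independent reads are wrong.

    This is a binomial tail sum. It overstates the risk somewhat, because the
    erroneous reads would also need to agree on the same wrong base.
    """
    first_majority = depth // 2 + 1
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(first_majority, depth + 1)
    )


# Ten hypothetical reads at one position, one of them carrying an error.
print(consensus_base("AAAAAGAAAA"))  # A

# Compare a single read with a five-read consensus at a 1% per-read error rate.
print(majority_error_probability(1, 0.01))
print(majority_error_probability(5, 0.01))
```

Under these assumptions, a five-read consensus at a 1% per-read error rate fails with a probability on the order of 1 in 100,000, compared with 1 in 100 for a single read.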
