"Sequencing depth and coverage: key considerations in genomic analyses"
The theoretical or expected coverage is the average number of times that each nucleotide is expected to be sequenced given a certain number of reads of a given length and the assumption that reads are randomly distributed across an idealized genome.
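The definition above reduces to simple arithmetic: the total number of sequenced bases divided by the genome size. The following is a minimal sketch of that calculation; the function name and the example figures (600 million reads of 150 bp on a 3-Gb genome) are illustrative assumptions and are not taken from the Review.

```python
def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected coverage = (number of reads * read length) / genome size.

    Assumes reads are randomly distributed across an idealized genome,
    as in the definition of theoretical coverage.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical example values, chosen only for illustration.
coverage = expected_coverage(num_reads=600_000_000,
                             read_length=150,
                             genome_size=3_000_000_000)
print(f"Expected coverage: {coverage:.1f}x")
```

Because this is an expectation under random sampling, the empirical per-base coverage of a real experiment will scatter around this value.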
Actual empirical per-base coverage represents the exact number of times
that a base in the reference is covered by a high-quality aligned read
from a given sequencing experiment.
Redundancy of coverage is also called the depth or the depth of coverage.
Although the terms depth and coverage can be used interchangeably (as they are in this Review), coverage has also been used to denote the breadth of coverage of a target genome, which is defined as the percentage of target bases that are sequenced a given number of times. For example, a genome sequencing study may sequence a genome to 30× average depth and achieve a 95% breadth of coverage of the reference genome at a minimum depth of ten reads.
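The distinction between average depth and breadth of coverage can be made concrete with a short calculation. The sketch below assumes that per-base depths are already available as a list of integers (one value per target base); the function names and the toy depth values are hypothetical and serve only to illustrate the two definitions.

```python
from statistics import mean


def mean_depth(depths):
    """Average depth: mean number of reads covering each target base."""
    if not depths:
        raise ValueError("depths must not be empty")
    return mean(depths)


def breadth_of_coverage(depths, min_depth):
    """Percentage of target bases covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Toy per-base depths for a ten-base target (illustrative values only).
toy_depths = [0, 5, 12, 30, 31, 28, 45, 9, 33, 40]

print(f"Mean depth: {mean_depth(toy_depths):.1f}x")
print(f"Breadth at >=10 reads: {breadth_of_coverage(toy_depths, 10):.0f}%")
```

In this toy example the two numbers clearly diverge: a respectable mean depth can coexist with a sizeable fraction of bases that fall below the minimum depth.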
GC-rich regions, such as CpG islands, are particularly prone to low
depth of coverage partly because these regions remain annealed during
amplification. Consequently, it is important to assess the uniformity of coverage, and
thus data quality, by calculating the variance in sequencing depth
across the genome.
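One way to summarize uniformity is to compute the variance of per-base depth. The sketch below also reports the coefficient of variation (standard deviation divided by mean); that is an assumption added here so that samples sequenced to different average depths can be compared, and it is not a metric prescribed by the Review. The function name and the input values are hypothetical.

```python
from statistics import mean, pvariance


def depth_uniformity(depths):
    """Return mean, variance and coefficient of variation of per-base depth.

    A lower variance (or coefficient of variation) indicates more uniform
    coverage across the positions supplied.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    mu = mean(depths)
    var = pvariance(depths, mu)  # population variance over all positions
    cv = (var ** 0.5) / mu if mu > 0 else float("inf")
    return {"mean": mu, "variance": var, "cv": cv}


# Two toy samples with the same mean depth but different uniformity.
uniform_sample = [30, 30, 30, 30]
uneven_sample = [10, 50, 10, 50]

print(depth_uniformity(uniform_sample))
print(depth_uniformity(uneven_sample))
```

Both toy samples have the same average depth, so the mean alone would hide the difference that the variance exposes.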
In a sequencing experiment only some of these fragments are sampled. The
number of these distinct fragments sequenced is positively correlated
with the depth of the true biological variation that has been sampled.
The first human genome that was sequenced using Illumina short-read technology showed that, although almost all homozygous SNVs are detected at a 15× average depth, an average depth of 33× is required to detect the same proportion of heterozygous SNVs.
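The gap between homozygous and heterozygous detection can be illustrated with a simple binomial model. The sketch below is not the method used in the cited study: it assumes that each read at a site samples the variant allele independently with a fixed probability (0.5 for a heterozygous site, 1.0 for a homozygous site), ignores sequencing and mapping errors, and uses a hypothetical threshold `min_alt_reads` for the number of variant-supporting reads needed to make a call.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, min_alt_reads, alt_fraction=0.5):
    """Probability that at least `min_alt_reads` of `depth` reads carry the
    variant allele, under a binomial model with success rate `alt_fraction`.
    """
    if depth < min_alt_reads:
        return 0.0
    # Sum the binomial probabilities of seeing fewer than the required reads.
    p_fewer = sum(
        comb(depth, i) * alt_fraction ** i * (1 - alt_fraction) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer


# Illustrative comparison at a few per-site depths (threshold is hypothetical).
for depth in (5, 10, 15, 30):
    het = prob_at_least_k_alt_reads(depth, min_alt_reads=3, alt_fraction=0.5)
    hom = prob_at_least_k_alt_reads(depth, min_alt_reads=3, alt_fraction=1.0)
    print(f"depth {depth:>2}x  heterozygous: {het:.4f}  homozygous: {hom:.4f}")
```

Under these simplifying assumptions, a heterozygous site needs more reads than a homozygous site to reach the same chance of being observed, which is qualitatively consistent with the higher average depth reported for heterozygous SNVs.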
Consequently, an average depth that exceeds 30× rapidly became the de facto standard [13, 14]. In 2011, one study [15]
suggested that an average mapped depth of 50× would be required to
allow reliable calling of SNVs and small indels across 95% of the
genome. However, improvements in sequencing chemistry reduced GC bias and thus yielded a more uniform coverage of the genome, which later reduced the required average mapped depth to 35×.
The power to detect variants is reduced by low base quality and by
non-uniformity of coverage. Increasing sequencing depth can both mitigate
these issues and reduce the false-discovery rate for variant calling.
Although read quality is mostly governed by sequencing technology, the
uniformity of depth of coverage can also be affected by sample
preparation. A GC bias that is introduced during DNA amplification by
PCR has been identified as a major source of variation in coverage.
Elimination of PCR amplification results in improved coverage of high-GC
regions of the genome and in fewer duplicate reads [16].