Friday 9 January 2015

Sequencing Depth and Coverage

"Sequencing depth and coverage: key considerations in genomic analyses"

The theoretical or expected coverage is the average number of times that each nucleotide is expected to be sequenced given a certain number of reads of a given length and the assumption that reads are randomly distributed across an idealized genome.
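Here is a minimal Python sketch of that idealized model. It assumes the Lander-Waterman formula C = N × L / G (N reads, read length L, genome size G) and assumes that per-base depth follows a Poisson distribution when reads fall uniformly at random. The function names and the example run (900 million 100 bp reads on a 3 Gb genome) are hypothetical and are not taken from the Review.

```python
import math

def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman expected (average) coverage: C = N * L / G."""
    return num_reads * read_length / genome_size

def poisson_fraction_at_least(mean_depth: float, k: int) -> float:
    """Fraction of bases expected to be covered by >= k reads, assuming
    per-base depth ~ Poisson(mean_depth) (reads placed uniformly at random)."""
    term = math.exp(-mean_depth)  # P(depth == 0)
    below = 0.0
    for i in range(k):
        below += term
        term *= mean_depth / (i + 1)  # P(depth == i + 1) from P(depth == i)
    return 1.0 - below

# Hypothetical run: 900 million 100 bp reads on a 3 Gb genome.
c = expected_coverage(900_000_000, 100, 3_000_000_000)
print(c, poisson_fraction_at_least(c, 10))
```

In this idealized model almost every base reaches ten reads at 30×. Real data fall short of that, which is why the empirical measures that follow matter.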

Actual empirical per-base coverage represents the exact number of times that a base in the reference is covered by a high-quality aligned read from a given sequencing experiment.
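Empirical per-base depth can be sketched by counting how many aligned reads overlap each position. This minimal example assumes the reads have already passed quality filtering. They are supplied as hypothetical 0-based, half-open (start, end) intervals on a single contig. In practice, tools such as samtools depth report these values directly from BAM files.

```python
from itertools import accumulate

def per_base_depth(alignments, contig_length):
    """Per-base depth from (start, end) half-open, 0-based read intervals.

    Difference-array trick: +1 where a read starts, -1 just past where it
    ends; a running sum then gives the depth at every position.
    """
    diff = [0] * (contig_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    return list(accumulate(diff[:contig_length]))

reads = [(0, 5), (2, 7), (4, 10)]  # hypothetical aligned reads
print(per_base_depth(reads, 10))   # [1, 1, 2, 2, 3, 2, 2, 1, 1, 1]
```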

Redundancy of coverage is also called the depth or the depth of coverage.

Although the terms depth and coverage can be used interchangeably (as they are in this Review), coverage has also been used to denote the breadth of coverage of a target genome, which is defined as the percentage of target bases that are sequenced a given number of times. For example, a genome sequencing study may sequence a genome to 30× average depth and achieve a 95% breadth of coverage of the reference genome at a minimum depth of ten reads.
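To show how average depth and breadth differ, the small sketch below takes a list of per-base depths and reports two values: the mean depth, and the fraction of bases at or above a minimum depth. The ten-base `depths` array is made up for illustration.

```python
def mean_depth(depths):
    """Average depth across all target bases."""
    return sum(depths) / len(depths)

def breadth_of_coverage(depths, min_depth):
    """Fraction of target bases covered by at least `min_depth` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)

depths = [0, 12, 30, 45, 9, 31, 28, 10, 33, 2]  # hypothetical per-base depths
print(mean_depth(depths), breadth_of_coverage(depths, 10))  # 20.0 0.7
```

A 20× mean here still leaves 30% of the bases below ten reads, because depth is concentrated in a few positions.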

GC-rich regions, such as CpG islands, are particularly prone to low depth of coverage partly because these regions remain annealed during amplification. Consequently, it is important to assess the uniformity of coverage, and thus data quality, by calculating the variance in sequencing depth across the genome.
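One way to quantify uniformity is to report the variance of per-base depth together with its coefficient of variation (standard deviation divided by mean). The sketch below assumes a plain list of per-base depths. The two example runs are made-up numbers that share a 30× mean but differ greatly in spread.

```python
import math
import statistics

def depth_uniformity(depths):
    """Return (mean, population variance, coefficient of variation).

    A perfectly even run has variance 0; pure Poisson sampling gives a
    variance close to the mean, and biases such as GC bias push it higher.
    """
    mu = statistics.fmean(depths)
    var = statistics.pvariance(depths, mu)
    cv = math.sqrt(var) / mu if mu > 0 else float("nan")
    return mu, var, cv

even = [28, 30, 32, 30, 29, 31]    # hypothetical, fairly uniform
uneven = [5, 55, 30, 10, 50, 30]   # hypothetical, same mean, uneven
print(depth_uniformity(even))
print(depth_uniformity(uneven))
```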

In a sequencing experiment, only some of the fragments in the sequencing library are sampled. The number of distinct fragments that are sequenced is positively correlated with the amount of true biological variation that has been sampled.
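Assume that reads are drawn uniformly at random, with replacement, from a library of M distinct fragments. Under that simplification, the expected number of distinct fragments seen after N reads is M(1 - e^(-N/M)). This is a Lander-Waterman-style relation, and duplicate-metrics tools such as Picard use a similar one to estimate library size. The library size and read counts below are hypothetical.

```python
import math

def expected_distinct(library_size, reads_sequenced):
    """Expected distinct fragments observed when `reads_sequenced` reads are
    drawn uniformly at random, with replacement, from `library_size` distinct
    fragments: M * (1 - exp(-N / M))."""
    return library_size * (1.0 - math.exp(-reads_sequenced / library_size))

# Hypothetical library of 100 million distinct fragments.
for n in (10_000_000, 100_000_000, 300_000_000):
    d = expected_distinct(100_000_000, n)
    print(n, round(d), f"duplicate fraction {1 - d / n:.2f}")
```

Sequencing deeper into a fixed library yields progressively fewer new fragments. This is why library complexity limits how much additional depth actually adds information.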

The first human genome that was sequenced using Illumina short-read technology showed that, although almost all homozygous SNVs are detected at a 15× average depth, an average depth of 33× is required to detect the same proportion of heterozygous SNVs.
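A toy model helps explain why heterozygous SNVs need more depth: at a heterozygous site, only about half of the reads carry the variant allele. The sketch below makes several assumptions. Depth is Poisson-distributed, sequencing errors and mapping bias are ignored, and a hypothetical caller requires at least three variant-supporting reads. It illustrates the trend only and is not meant to reproduce the 15× and 33× figures.

```python
import math

def poisson_pmf(k, mean):
    """P(depth == k) for depth ~ Poisson(mean); log space avoids overflow."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def binom_at_least(d, m, p):
    """P(X >= m) for X ~ Binomial(d, p)."""
    return sum(math.comb(d, i) * p**i * (1 - p) ** (d - i) for i in range(m, d + 1))

def detection_power(mean_depth, min_alt_reads, alt_fraction, max_depth=200):
    """Probability that a variant site gets >= min_alt_reads supporting reads,
    assuming depth ~ Poisson(mean_depth) and that each read shows the variant
    with probability alt_fraction (1.0 homozygous, 0.5 heterozygous)."""
    return sum(
        poisson_pmf(d, mean_depth) * binom_at_least(d, min_alt_reads, alt_fraction)
        for d in range(max_depth + 1)
    )

# Hypothetical caller threshold: 3 variant-supporting reads.
for depth in (15, 33):
    hom = detection_power(depth, 3, 1.0)
    het = detection_power(depth, 3, 0.5)
    print(f"{depth}x  hom {hom:.5f}  het {het:.5f}")
```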

Consequently, an average depth exceeding 30× rapidly became the de facto standard [13, 14]. In 2011, one study [15] suggested that an average mapped depth of 50× would be required for reliable calling of SNVs and small indels across 95% of the genome. However, improvements in sequencing chemistry reduced GC bias and thus yielded more uniform coverage of the genome, which later lowered the required average mapped depth to 35×.
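The link between uniformity and required depth can be illustrated by asking what mean depth is needed for 95% of bases to reach a minimum depth. This sketch compares idealized Poisson coverage with an overdispersed negative binomial model (variance μ + μ²/r). The negative binomial is used here as an assumed stand-in for GC-biased, non-uniform coverage. The 10-read threshold and dispersion r = 5 are hypothetical, so the results show the direction of the effect rather than the 50× and 35× figures.

```python
import math
from functools import partial

def poisson_pmf(k, mu):
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def negbin_pmf(k, mu, r):
    """Negative binomial with mean mu and variance mu + mu**2 / r."""
    p = r / (r + mu)
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log(1 - p))

def fraction_at_least(t, pmf):
    """Fraction of bases with depth >= t under the given depth distribution."""
    return 1.0 - sum(pmf(k) for k in range(t))

def min_mean_depth(t, target=0.95, r=None, max_mean=500):
    """Smallest integer mean depth at which >= `target` of bases reach depth t.
    r=None uses Poisson (uniform) coverage; otherwise negative binomial."""
    for mu in range(1, max_mean + 1):
        pmf = partial(poisson_pmf, mu=mu) if r is None else partial(negbin_pmf, mu=mu, r=r)
        if fraction_at_least(t, pmf) >= target:
            return mu
    return None

print("uniform (Poisson):", min_mean_depth(10))
print("overdispersed (r=5):", min_mean_depth(10, r=5.0))
```

The more uneven the coverage, the higher the average depth needed to bring the same share of the genome above a calling threshold.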

The power to detect variants is reduced by low base quality and by non-uniform coverage. Increasing sequencing depth can compensate for both problems and can also reduce the false-discovery rate of variant calling. Although read quality is mostly governed by the sequencing technology, the uniformity of coverage can also be affected by sample preparation. GC bias introduced during PCR amplification of DNA has been identified as a major source of variation in coverage. Eliminating PCR amplification improves coverage of GC-rich regions of the genome and yields fewer duplicate reads [16].
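PCR duplicates are commonly identified as reads that share the same alignment start. The sketch below is a simplified, hypothetical version of that idea for single-end reads. A read is flagged as a duplicate if an earlier read has the same contig, 5' position and strand. The `Read` type is an illustrative assumption. Production tools such as Picard MarkDuplicates and samtools markdup apply more detailed criteria, for example unclipped positions and mate information, which this example ignores.

```python
from typing import NamedTuple

class Read(NamedTuple):
    """Hypothetical minimal aligned-read record."""
    contig: str
    start: int   # 0-based leftmost aligned position
    end: int     # 0-based, exclusive
    strand: str  # '+' or '-'

def five_prime(read):
    """5' end of the read on the reference: start for '+', last base for '-'."""
    return read.start if read.strand == "+" else read.end - 1

def duplicate_fraction(reads):
    """Fraction of reads whose (contig, 5' position, strand) was already seen."""
    seen = set()
    dups = 0
    for r in reads:
        key = (r.contig, five_prime(r), r.strand)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups / len(reads) if reads else 0.0

reads = [
    Read("chr1", 100, 200, "+"),
    Read("chr1", 100, 180, "+"),  # same 5' end and strand: duplicate
    Read("chr1", 120, 220, "-"),
    Read("chr1", 150, 220, "-"),  # same 5' end (219) and strand: duplicate
    Read("chr1", 100, 200, "-"),
]
print(duplicate_fraction(reads))  # 0.4
```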
