Tuesday 30 September 2014

GATK: DepthOfCoverage and DiagnoseTargets

DepthOfCoverage

DiagnoseTargets

Monday 22 September 2014

Reply to Reviewer Comments

Reply to Reviewer Comments (Example)

Papers to Read

Predicting the human epigenome from DNA motifs

SeqControl: process control for DNA sequencing

Accurate de novo and transmitted indel detection in exome-capture data using microassembly

Phen-Gen: combining phenotype and genotype to analyze rare disorders

Detecting and correcting systematic variation in large-scale RNA sequencing data

Normalization of RNA-seq data using factor analysis of control genes or samples

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

A community effort to assess and improve drug sensitivity prediction algorithms

A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data

A formal perturbation equation between genotype and phenotype determines the evolutionary action of protein coding variations on fitness

Decoding ChIP-seq with a double-binding signal refines binding peaks to single-nucleotides and predicts cooperative interaction

Genome-wide signals of positive selection in human evolution

Natural variation of histone modification and its impact on gene expression in the rat genome

Global translational reprogramming is a fundamental layer of immune regulation in plants

Architecture of the human interactome defines protein communities and disease networks

Haematopoietic stem and progenitor cells from human pluripotent stem cells

Chromatin states define tumour-specific T cell dysfunction and reprogramming







Friday 19 September 2014

Interplay among different forces of evolution

Migration drift equilibrium

Mutation drift equilibrium: The equilibrium value of diversity is known as the population mutation parameter (theta).

Recombination drift equilibrium: Drift generates LD in finite populations while recombination breaks it down, so LD can reach an equilibrium value. This equilibrium value of LD is determined by the population recombination parameter (rho), which combines information on effective population size (Ne) and recombination rate (c) using the equation rho = 4*Ne*c. The precise relationship between measures of LD and rho is complex, but under certain conditions r^2 = 1/(1 + rho), so that when rho is large, r^2 is approximately 1/rho.
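A minimal Python sketch of the two relationships above, rho = 4*Ne*c and r^2 = 1/(1 + rho). The function names and the example values of Ne and c are my own assumptions for illustration; they are not taken from any particular data set.

```python
def population_recombination_parameter(ne: float, c: float) -> float:
    """rho = 4 * Ne * c, where c is the recombination rate between two markers."""
    return 4.0 * ne * c


def expected_r2(rho: float) -> float:
    """Approximate equilibrium r^2 = 1 / (1 + rho) (holds only under certain conditions)."""
    return 1.0 / (1.0 + rho)


if __name__ == "__main__":
    ne = 10_000  # assumed effective population size
    for c in (1e-6, 1e-5, 1e-4, 1e-3):  # assumed recombination rates
        rho = population_recombination_parameter(ne, c)
        # For large rho the last two columns should converge.
        print(f"c={c:g}  rho={rho:g}  r^2={expected_r2(rho):.4f}  1/rho={1 / rho:.4f}")
```

Printing 1/rho next to r^2 makes it easy to see how quickly the large-rho approximation becomes reasonable.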

If we are examining LD over a large genomic region containing many polymorphic markers, it is unclear how best to combine the information from measures of LD based on comparisons between individual pairs of markers (i.e., D, |D'| or r^2). Therefore, recent attention has focused on estimating rho itself for these kinds of data, as this gives a single measure of LD for the entire region.

Mutation selection equilibrium:
Some deleterious mutational events have sufficiently high mutation rates that within a large population they occur several times within a single generation, and can be considered recurrent mutations.

The rate at which new mutations are generated can be balanced by the eventual elimination of each mutant by selection so that the average number of a given mutation reaches an equilibrium value within the population.
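To put a number on this balance, here is a small Python sketch using the standard textbook approximations for the equilibrium frequency of a deleterious allele. These formulas are not in my notes above, so treat them as assumptions: q is about sqrt(mu/s) for a fully recessive allele, and about mu/(h*s) for a partially dominant one, where mu is the mutation rate, s the selection coefficient against homozygotes and h the dominance coefficient.

```python
import math


def equilibrium_frequency(mu: float, s: float, h: float = 0.0) -> float:
    """Approximate mutation-selection equilibrium frequency of a deleterious allele.

    mu: mutation rate per generation towards the deleterious allele
    s:  selection coefficient against the mutant homozygote
    h:  dominance coefficient (0 = fully recessive)
    These are textbook approximations, assumed here rather than derived.
    """
    if mu < 0 or s <= 0 or h < 0:
        raise ValueError("need mu >= 0, s > 0 and h >= 0")
    if h == 0:
        return math.sqrt(mu / s)  # recessive case
    return mu / (h * s)           # partially dominant case


if __name__ == "__main__":
    # Illustrative values only.
    print(equilibrium_frequency(1e-6, 0.01))         # recessive
    print(equilibrium_frequency(1e-6, 0.01, h=0.5))  # co-dominant
```

The recessive allele sits at a much higher equilibrium frequency because selection only sees it in homozygotes.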

If we consider all deleterious alleles together, a balance between mutation and selection may operate over the genome as a whole, such that at equilibrium each genome contains a certain number of deleterious alleles.

A general rule for diploid loci is that, for selection to be operating effectively rather than being swamped by drift, the following relationship should hold:
s > 1/(2Ne)
For haploid loci, which have one quarter the effective population size of diploid loci, the relevant rule is s > 2/Ne, where Ne is still the diploid effective population size.
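The two rules can be checked with a tiny Python helper. This is only a sketch: the function name is mine, and I am assuming that Ne always means the diploid effective population size, so the haploid threshold is 1/(2 * Ne/4) = 2/Ne.

```python
def selection_is_effective(s: float, ne: float, haploid: bool = False) -> bool:
    """Return True if selection coefficient s exceeds the drift threshold.

    ne is the diploid effective population size (assumption).
    Diploid loci:  s > 1 / (2 * ne)
    Haploid loci:  s > 2 / ne   (one quarter of the diploid Ne)
    """
    threshold = 2.0 / ne if haploid else 1.0 / (2.0 * ne)
    return s > threshold


if __name__ == "__main__":
    ne = 10_000  # illustrative value
    s = 1e-4
    print("diploid:", selection_is_effective(s, ne))
    print("haploid:", selection_is_effective(s, ne, haploid=True))
```

With these example values the same s is above the diploid threshold but below the haploid one, which shows why drift matters more at haploid loci.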


Migration from the Evolutionary Perspective

Colonization is the process of movement into previously unoccupied land, thus entailing a founder effect. By contrast, migration is the movement from one occupied area to another. Gene flow is the outcome when a migrant contributes to the next generation in their new location.




Selection from the Evolutionary Perspective

Balancing selection: A selective regime that favours more than one allele and thus prevents the fixation of any allele.

Genetic Drift

Effective population size (Ne) measures the magnitude of genetic drift: the smaller the effective population size, the greater the drift.

There are two ways of defining effective population sizes: one is based on the sampling variance of allele frequencies (i.e., how an allele's frequency might vary from one generation to the next), and the other utilizes the concept of inbreeding (i.e., the probability that the two alleles within an individual are identical by descent from a common ancestor).

Under most simple population models, these two definitions give identical values for Ne; in more complex situations this is not the case.

It is not easy to relate the effective population size (Ne) to the census size of a population (N), as there are many parameters that can affect this relationship, only some of which are relevant to humans.

With no favouring of either outcome, the fixation probability of an allele in the absence of selection is equal to its current frequency, which is 1/(2N) for a new mutation in a diploid population of size N.

The average time to fixation (t), in generations, of a new neutral mutation that does go on to fix has been shown to be:
t = 4Ne
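Both results can be checked roughly by simulation. The Python sketch below assumes an idealized Wright-Fisher population (constant size, random mating, no selection, so Ne = N), which is my choice of model rather than something stated in the notes. The function names, seed and parameter values are also just for illustration.

```python
import random


def simulate_new_mutation(n_diploid: int, rng: random.Random):
    """Follow one new neutral mutation until it is lost or fixed.

    Returns (fixed, generations elapsed).
    """
    total = 2 * n_diploid  # number of gene copies in a diploid population
    count = 1              # a new mutation starts as a single copy
    generations = 0
    while 0 < count < total:
        p = count / total
        # Binomial sampling of the next generation's 2N gene copies.
        count = sum(1 for _ in range(total) if rng.random() < p)
        generations += 1
    return count == total, generations


def estimate_fixation(n_diploid: int, replicates: int, seed: int = 1):
    """Estimate the fixation probability and the mean time to fixation."""
    rng = random.Random(seed)
    fixed_times = []
    for _ in range(replicates):
        fixed, gens = simulate_new_mutation(n_diploid, rng)
        if fixed:
            fixed_times.append(gens)
    prob = len(fixed_times) / replicates
    mean_time = sum(fixed_times) / len(fixed_times) if fixed_times else float("nan")
    return prob, mean_time


if __name__ == "__main__":
    n = 10
    prob, mean_time = estimate_fixation(n, replicates=5000)
    print(f"estimated P(fix) = {prob:.3f}, expected 1/(2N) = {1 / (2 * n):.3f}")
    print(f"mean fixation time = {mean_time:.1f}, approximation 4N = {4 * n}")
```

The t = 4Ne result is a large-population approximation, so with a population this small the simulated mean time should only be in the right neighbourhood, not an exact match.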

The long-term effective population size has been shown to be approximately equal to the harmonic mean, rather than the arithmetic mean, of the population sizes over time. The harmonic mean is the reciprocal of the mean of the reciprocals: 1/Ne = (1/t) * sum_{i=1}^{t} (1/Ni) for t generations.
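A short Python sketch of the harmonic-mean calculation, to see how strongly a single small generation pulls the long-term Ne down. The population sizes are made-up illustrative numbers and the function name is mine.

```python
from statistics import harmonic_mean, mean


def long_term_ne(sizes):
    """Harmonic mean of per-generation sizes: t / sum(1 / Ni)."""
    if not sizes or any(n <= 0 for n in sizes):
        raise ValueError("sizes must be a non-empty sequence of positive numbers")
    return len(sizes) / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    sizes = [1000, 1000, 10, 1000]  # one bottleneck generation (illustrative)
    print("arithmetic mean:", mean(sizes))
    print("harmonic mean:  ", long_term_ne(sizes))
    # Cross-check against the standard library implementation.
    print("statistics.harmonic_mean:", harmonic_mean(sizes))
```

The bottleneck generation dominates the result, which is why the long-term Ne can be far below the typical census size.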

Founder effects relate to the process of colonization and the genetic separation of a subset of the diversity present within the source population. In contrast, bottlenecks refer to the reduction in size of a single, previously large, population and a loss of prior diversity.

In general, the higher the reproductive variance, the lower the effective population size, because parental contributions become more and more unequal. It is worth noting that when reproductive variance is less than expected under a Poisson distribution, Ne can be greater than N.
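One way to see this numerically is a commonly quoted large-population approximation, Ne = 4N/(Vk + 2), where Vk is the variance in offspring number in a population of constant size (mean of two offspring per individual). This formula is not in my notes above, so it is an assumption of this sketch, as are the function name and the example values.

```python
def ne_from_reproductive_variance(n: float, vk: float) -> float:
    """Approximate Ne = 4N / (Vk + 2) for a constant-size population.

    n:  census size
    vk: variance in offspring number (Poisson expectation is 2 when the mean is 2)
    Assumed textbook approximation, not taken from the notes.
    """
    if n <= 0 or vk < 0:
        raise ValueError("need n > 0 and vk >= 0")
    return 4.0 * n / (vk + 2.0)


if __name__ == "__main__":
    n = 1000  # illustrative census size
    for vk in (0.0, 2.0, 6.0):
        print(f"Vk={vk}: Ne = {ne_from_reproductive_variance(n, vk):.0f}")
```

With Vk = 2 (the Poisson case) Ne equals N, with Vk below 2 it exceeds N, and with Vk above 2 it falls below N.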

Reproductive variance: The variance in number of offspring produced by a group of individuals.

Subpopulation: A randomly mating population that exchanges migrants with other populations to form a meta-population.

Meta-population: A group of populations connected by migration.

Fst: the fixation index is a measure of the deviation of observed heterozygote frequencies from those expected under the Hardy-Weinberg theorem. Fst compares the mean amount of genetic diversity found within subpopulations (Hs: the expected heterozygosity) to the genetic diversity of the meta-population (Ht).
Fst = (Ht - Hs) / Ht
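A minimal Python sketch of the Fst calculation for a single locus. It assumes a biallelic locus, equally sized subpopulations, and Hardy-Weinberg expected heterozygosity 2p(1 - p) within each one; those simplifications and the function names are mine.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def fst(allele_freqs):
    """Fst = (Ht - Hs) / Ht from per-subpopulation allele frequencies.

    Assumes equally sized subpopulations (unweighted means).
    """
    if not allele_freqs:
        raise ValueError("need at least one subpopulation")
    k = len(allele_freqs)
    hs = sum(expected_heterozygosity(p) for p in allele_freqs) / k  # within subpopulations
    p_bar = sum(allele_freqs) / k
    ht = expected_heterozygosity(p_bar)                             # meta-population
    if ht == 0:
        raise ValueError("locus is monomorphic in the meta-population; Fst is undefined")
    return (ht - hs) / ht


if __name__ == "__main__":
    print(fst([0.5, 0.5]))  # identical subpopulations
    print(fst([0.2, 0.8]))  # diverged subpopulations
```

Identical allele frequencies give Fst = 0, and the more the subpopulations diverge, the closer Fst gets to 1.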





 

Recombination: the evolutionary perspective

The benefits of recombination:

Recombination makes combinations of alleles across two or more loci that may be advantageous. This is especially important under epistasis (interactions between loci), where a specific combination of alleles at the two loci is favoured.

Recombination helps get rid of bad mutations to create mutation-free offspring. 

========================================

Recombination is capable of breaking up advantageous allelic combinations. This results in the theoretical possibility that by increasing the likelihood of disrupting a beneficial haplotype, outbreeding can result in a drop in fitness known as outbreeding depression.

========================================

An allele that rises to high frequency through positive selection at a linked locus is said to be "hitchhiking". The reduction in diversity at loci linked to a recently fixed allele is dubbed a selective sweep. Conversely, negative selection at a locus also reduces diversity at linked loci, albeit at a slow rate, by a process known as background selection.

========================================

If we know the recombination rate per generation (r) between the newly mutated locus and a given locus, after a certain number of generations (t) we can track the decay of LD over time, by relating the present value of D (Dt) to the initial value of D (D0) using the equation:
Dt = (1 - r)^t * D0
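The decay equation is easy to explore in Python. This sketch also derives the number of generations needed for D to halve, which follows from solving (1 - r)^t = 0.5 for t. The function names and example values are my own.

```python
import math


def ld_after(d0: float, r: float, t: int) -> float:
    """Dt = (1 - r)^t * D0."""
    return (1.0 - r) ** t * d0


def ld_half_life(r: float) -> float:
    """Generations for D to fall to half its initial value: ln(0.5) / ln(1 - r)."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be between 0 and 1 (exclusive)")
    return math.log(0.5) / math.log(1.0 - r)


if __name__ == "__main__":
    d0 = 0.25  # illustrative starting value of D
    for r in (0.5, 0.01, 0.0001):
        print(f"r={r}: D after 100 generations = {ld_after(d0, r, 100):.6f}, "
              f"half-life = {ld_half_life(r):.1f} generations")
```

For unlinked loci (r = 0.5) D halves every generation, while tightly linked loci keep their LD for thousands of generations.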


The Glossary of Evolution

Back mutation: A mutation from the derived state back to the ancestral state.

Recurrent mutation: A mutation that independently generates a derived state previously observed within the population. 

Thursday 18 September 2014

Publications on Selection in Non-coding Sequences

"The second hypothesis proposes that natural selection operates differently on mutations in cis-regulatory sequences [6, 7, 8, 11]. This hypothesis is based on two properties of the organization and function of cis-regulatory regions. First, allele-specific measures of transcript abundance indicate that each allele in a diploid organism is transcribed largely independently [11, 12, 13, 14], suggesting that mutations in cis-regulatory regions are often co-dominant."

Natural selection operates far more efficiently on co-dominant mutations because they can have fitness consequences as heterozygotes: a new variant is visible to selection immediately rather than requiring drift to raise allele frequencies to the point at which homozygotes begin to appear in the population [11].

Review: the evolutionary significance of cis-regulatory mutations

Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences

Genome-wide inference of natural selection on human transcription factor binding sites

The high degree of protein sequence similarity between phenotypically diverged species has led some to propose that regulatory evolution may be of considerably more importance than protein evolution [4, 5].

Most of our direct knowledge regarding the evolution of regulatory elements comes from a handful of direct functional studies [5, 6]. A second, indirect approach is based on comparative genomics [7]. The rationale for this second approach is that if newly arising mutations are typically detrimental to gene function, functionally important parts of the genome are expected to evolve more slowly than those lacking function [8, 9, 10, 11].

There are some limitations to the comparative genomics approach. First, a given genomic region might be conserved owing simply to a lower mutation rate [12]. Second, known regulatory elements do not seem to be particularly well conserved as a class, at least in Drosophila [10]. This finding suggests that taking an approach based on sequence conservation alone may lead to a biased view of regulatory evolution. Functionality of DNA sequences implies that they can be subject to both negative and positive selection. If a significant fraction of divergence between species observed in non-coding DNA is positively selected rather than selectively neutral or constrained, this could lead to underestimates of the functional importance of non-coding DNA and cause researchers to overlook the contribution of arguably the most interesting class of mutations in genome evolution: those reflecting adaptive differences between populations and species.

Adaptive evolution of non-coding DNA in Drosophila