Thursday, 30 June 2016

Hi-C Terminology

Cited from Paper "A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping"

'We define the “matrix resolution” of a Hi-C map as the locus size used to construct a particular contact matrix and the “map resolution” as the smallest locus size such that 80% of loci have at least 1,000 contacts. The map resolution is meant to reflect the finest scale at which one can reliably discern local features.'
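The map-resolution definition above can be turned into a small calculation. The following is only a sketch in base R, not code from the paper: the helper names `passes_threshold` and `map_resolution` and the toy contact counts are invented for illustration, and it assumes per-locus contact totals have already been computed at each candidate bin size.

```r
# Hypothetical helper: TRUE if at least `frac` of loci have >= `min_contacts` contacts.
passes_threshold <- function(contacts_per_locus, min_contacts = 1000, frac = 0.8) {
  mean(contacts_per_locus >= min_contacts) >= frac
}

# Hypothetical helper: `coverage_by_bin_size` is a list whose names are bin sizes
# in bp and whose elements are per-locus contact totals at that bin size.
# Returns the smallest bin size that meets the criterion, or NA if none does.
map_resolution <- function(coverage_by_bin_size, min_contacts = 1000, frac = 0.8) {
  sizes <- as.numeric(names(coverage_by_bin_size))
  ok <- vapply(coverage_by_bin_size, passes_threshold, logical(1),
               min_contacts = min_contacts, frac = frac)
  if (!any(ok)) return(NA_real_)
  min(sizes[ok])
}

# Toy, made-up contact counts for five loci at three bin sizes.
coverage <- list(
  "1000"  = c(200, 1500, 300, 900, 1200),       # 2 of 5 loci reach 1,000 contacts
  "5000"  = c(1100, 4000, 1500, 950, 3000),     # 4 of 5 loci reach 1,000 contacts
  "25000" = c(6000, 20000, 8000, 5000, 15000)   # all loci reach 1,000 contacts
)

resolution <- map_resolution(coverage)
print(resolution)
```

With these toy numbers the 5 kb bins are the finest ones at which 80% of loci reach 1,000 contacts, so that is the value the sketch reports as the map resolution.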

'We began by probing the 3D partitioning of the genome. In our earlier experiments at 1 Mb map resolution (Lieberman-Aiden et al., 2009), we saw large squares of enhanced contact frequency tiling the diagonal of the contact matrices. These squares partitioned the genome into 5–20 Mb intervals, which we call “megadomains.” We also found that individual 1 Mb loci could be assigned to one of two long-range contact patterns, which we called compartments A and B, with loci in the same compartment showing more frequent interaction. Megadomains—and the associated squares along the diagonal—arise when all of the 1 Mb loci in an interval exhibit the same genome-wide contact pattern. Compartment A is highly enriched for open chromatin; compartment B is enriched for closed chromatin.'

'Two of the five interaction patterns are correlated with loci in compartment A (Figure S4E). We label the loci exhibiting these patterns as belonging to subcompartments A1 and A2. Both A1 and A2 are gene dense, have highly expressed genes, harbor activating chromatin marks such as H3K36me3, H3K79me2, H3K27ac, and H3K4me1 and are depleted at the nuclear lamina and at nucleolus-associated domains (NADs) (Figures 2D, 2E, and S4I; Table S3). While both A1 and A2 exhibit early replication times, A1 finishes replicating at the beginning of S phase, whereas A2 continues replicating into the middle of S phase. A2 is more strongly associated with the presence of H3K9me3 than A1, has lower GC content, and contains longer genes (2.4-fold).'

'The other three interaction patterns (labeled B1, B2, and B3) are correlated with loci in compartment B (Figure S4E) and show very different properties. Subcompartment B1 correlates positively with H3K27me3 and negatively with H3K36me3, suggestive of facultative heterochromatin (Figures 2D and 2E). Replication of this subcompartment peaks during the middle of S phase. Subcompartments B2 and B3 tend to lack all of the above-noted marks and do not replicate until the end of S phase (see Figure 2D). Subcompartment B2 includes 62% of pericentromeric heterochromatin (3.8-fold enrichment) and is enriched at the nuclear lamina (1.8-fold) and at NADs (4.6-fold). Subcompartment B3 is enriched at the nuclear lamina (1.6-fold), but strongly depleted at NADs (76-fold).'

'Upon closer visual examination, we noticed the presence of a sixth pattern on chromosome 19 (Figure 2F). Our genome-wide clustering algorithm missed this pattern because it spans only 11 Mb, or 0.3% of the genome. When we repeated the algorithm on chromosome 19 alone, the additional pattern was detected. Because this sixth pattern correlates with the Compartment B pattern, we labeled it B4. Subcompartment B4 comprises a handful of regions, each of which contains many KRAB-ZNF superfamily genes. (B4 contains 130 of the 278 KRAB-ZNF genes in the genome, a 65-fold enrichment). As noted in previous studies (Vogel et al., 2006; Hahn et al., 2011), these regions exhibit a highly distinctive chromatin pattern, with strong enrichment for both activating chromatin marks, such as H3K36me3, and heterochromatin-associated marks, such as H3K9me3 and H4K20me3.'

Definition of In situ Hi-C

Cited from Paper "A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping"

In situ Hi-C: DNA-DNA proximity ligation is performed in intact nuclei.

Monday, 20 June 2016

Multiscale analysis of genome-wide replication timing profiles using a wavelet-based signal-processing algorithm

"Replication starts from a set of initiation loci, called replication origins, where two replication forks are assembled and begin replicating DNA while proceeding in opposite directions, away from the loci; fork progression continues until two converging forks 'collide' at a terminus of replication."

"The DNA replication program in a cell is defined as the temporal sequence of locus replication events during the S phase. The program depends on the locations of the replication origins, their activation times and the speed at which replication forks move along the DNA double helix."




DNA Replication Timing

  1. Replication of eukaryotic chromosomes takes place in segments.
  2. The rate of elongation of replication forks varies little throughout S phase.
  3. It is the temporal order of replication, not the sites of initiation, that is conserved among species.
  4. In multicellular but not unicellular organisms, early replication is correlated with transcriptional activity and is developmentally regulated.
  5. Large-scale chromatin folding is important in the regulation of replication timing in both yeasts and mammals.

Friday, 3 June 2016

hESNet: Human Embryonic Stem Cell Transcription Network

http://wanglab.ucsd.edu/star/hESnet/

Differences Between Epiblast and Embryonic Stem Cells

Cited from Collection: Naive Pluripotency

"Ground-state naive pluripotency is established in the epiblast of the mature blastocyst and may be captured in vitro in the form of embryonic stem cells. Although rodent cells can exist in both primed and naive pluripotent states, establishing a naive state in human cells has been difficult to obtain."

Cell Identity Markers

Gata4, a primitive endoderm marker (Paper: Control of ground-state pluripotency by allelic regulation of Nanog)
Fgf4, a pluripotency-associated gene (Paper: Control of ground-state pluripotency by allelic regulation of Nanog)
Pecam1, a non-pluripotency transmembrane protein on the cell surface, expressed in mouse embryonic stem cells.
Bmp4, a non-pluripotency factor expressed in mouse embryonic stem cells; a member of the bone morphogenetic protein family, which is part of the transforming growth factor-beta superfamily.



Naive Epiblast Explanation

Cited from the paper "Nanog Is the Gateway to the Pluripotent Ground State"

" After fertilization, mammalian zygotes follow a program of cleavage divisions and elaborate two extraembryonic lineages, trophoblast and hypoblast (Selwood and Johnson, 2006). This preparatory phase of development culminates in creation of the embryo founder tissue, a population of unrestricted pluripotent cells known as the epiblast (Gardner and Beddington, 1988 and Nichols and Smith, 2009). The epiblast proliferates to provide the substrate for axis formation, germlayer specification, and gastrulation. Naive early epiblast cells can be immortalized in culture in the form of embryonic stem (ES) cells (Brook and Gardner, 1997, Evans and Kaufman, 1981 and Martin, 1981). Pluripotent cells can also be created outside the embryo by reprogramming somatic cells, either by fusion with pre-existing pluripotent cells (Miller and Ruddle, 1976, Tada et al., 1997, Tada et al., 2001 and Takagi et al., 1983) or, more compellingly, by transfection with regulatory transcription factors (Takahashi and Yamanaka, 2006)."

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cited from the paper "Control of ground-state pluripotency by allelic regulation of Nanog"

"The ICM (inner cell mass) of the late blastocyst contains two lineages: the extra-embryonic primitive endoderm, and the ‘ground-state’ pluripotent epiblast6, 8, which gives rise to the embryo. Inner cells expressing Nanog biallelically also express Oct4 but not Gata4, a primitive endoderm marker9, and therefore are epiblast cells."

Wednesday, 1 June 2016

How to Import narrowPeak into R

How to import narrowPeak files
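One common way to do this is sketched below, assuming the Bioconductor package rtracklayer is installed. narrowPeak is BED6 plus four extra columns, so the file can be read as BED with those columns declared through `extraCols`. The two peak records are made up and written to a temporary file only so the example is self-contained; the variable names are illustrative.

```r
library(rtracklayer)

# The four narrowPeak columns that follow the six standard BED columns.
extra_cols <- c(signalValue = "numeric", pValue = "numeric",
                qValue = "numeric", peak = "integer")

# Write a tiny, made-up narrowPeak file so the example can run on its own.
peak_file <- tempfile(fileext = ".narrowPeak")
writeLines(c(
  "chr1\t100\t500\tpeak_1\t850\t.\t12.5\t8.2\t6.1\t150",
  "chr1\t900\t1400\tpeak_2\t400\t.\t5.3\t4.0\t2.7\t220"
), peak_file)

# Import as BED with the extra columns; the result is a GRanges object.
peaks <- import(peak_file, format = "BED", extraCols = extra_cols)
print(peaks)
```

Note that BED coordinates are 0-based and half-open while GRanges are 1-based and closed, so the imported start positions are one larger than the values in the file.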

GenomicRanges: 'queryHits' and 'subjectHits' in findOverlaps

Cited from IRanges - minoverlap

> hits = findOverlaps(ir, minoverlap=100L)

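It returns a Hits object that records which query ranges overlap which subject ranges. Because no subject is given, ir is compared against itself, so the query and subject are in effect the same ranges. queryHits and subjectHits extract the two index vectors from that object.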
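A small runnable sketch of the call above, assuming the Bioconductor package IRanges is installed. The three ranges in `ir` are made up for illustration; only `findOverlaps`, `queryHits` and `subjectHits` come from the notes.

```r
library(IRanges)

# Three made-up ranges: the first two share 151 positions, the third stands alone.
ir <- IRanges(start = c(1, 50, 400), end = c(200, 300, 500))

# Self-overlap: no subject is supplied, so `ir` is both query and subject.
# Only pairs sharing at least 100 positions are reported.
hits <- findOverlaps(ir, minoverlap = 100L)

# Each hit is a (query index, subject index) pair into `ir`.
pairs <- data.frame(query = queryHits(hits), subject = subjectHits(hits))
print(pairs)
```

Every range here is at least 100 positions wide, so each one is reported as overlapping itself, and ranges 1 and 2 are reported in both directions. Range 3 pairs only with itself.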