Wednesday 18 December 2019

Git LFS Installation Bug


https://github.com/git-lfs/git-lfs/issues/3018

Monday 16 September 2019

Passing Values Between Rules

Cited from I want to pass variables between rules. Is that possible?

from pytools.persistent_dict import PersistentDict

storage = PersistentDict("mystorage")

rule a:
    input: "test.in"
    output: "test.out"
    run:
        myvar = storage.fetch("myvar")
        # do stuff

rule b:
    output: temp("test.in")
    run:
        storage.store("myvar", 3.14)
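
The same idea, writing a value in one step and reading it back in a later one, can be sketched with the standard library alone. This is only a stand-in that assumes a key-value store on disk is all that is needed; it uses shelve rather than pytools, and the helper names store and fetch and the temporary path are hypothetical.

```python
import os
import shelve
import tempfile


def store(path, key, value):
    """Persist a value on disk under the given key."""
    with shelve.open(path) as db:
        db[key] = value


def fetch(path, key):
    """Read a previously stored value back from disk."""
    with shelve.open(path) as db:
        return db[key]


# Hypothetical storage location; a real workflow would use a fixed path.
store_path = os.path.join(tempfile.mkdtemp(), "mystorage")

store(store_path, "myvar", 3.14)    # what rule b does
print(fetch(store_path, "myvar"))   # what rule a does later
```

Because the value lives on disk, the reading step only works after the writing step has run, which is why the rules above are linked through the test.in file.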

Wednesday 21 August 2019

R make.names

make.names {base}, R Documentation

Make Syntactically Valid Names

Description

Make syntactically valid names out of character vectors.

Read Excel Using the 'readxl' Package

read_excel("...", sheet = string, range = 'B1:C7')

OR

read_excel("...", sheet = string, range = cell_cols('B:C'))

read_excel("...", sheet = string, skip = 1)

Thursday 18 July 2019

Git Reset

Cited from Undoing Staged Changes (before committing)

The reset command resets the staging area to be whatever is in HEAD. This clears the staging area of the change we just staged.

The reset command (by default) doesn’t change the working directory. So the working directory still has the unwanted comment in it. We can use the checkout command of the previous lab to remove the unwanted change from the working directory.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cited from Removing Commits from a Branch

When given a commit reference (i.e. a hash, branch or tag name), the reset command will …

  1. Rewrite the current branch to point to the specified commit
  2. Optionally reset the staging area to match the specified commit
  3. Optionally reset the working directory to match the specified commit

git reset --hard v1

The --hard parameter indicates that the working directory should be updated to be consistent with the new branch head.




Branch

Cited from Getting Old Versions

‘master’ is the name of the default branch. By checking out a branch by name, you go to the latest version of that branch.

Git Log

Cited from Git History

git log --pretty=format:'%h %ad | %s%d [%an]' --graph --date=short

--pretty="..." defines the format of the output.
%h is the abbreviated hash of the commit
%d are any decorations on that commit (e.g. branch heads or tags)
%ad is the author date
%s is the comment
%an is the author name
--graph informs git to display the commit tree in an ASCII graph layout
--date=short keeps the date format nice and short

Wednesday 3 July 2019

Epigenetics Notes

Cited from the book "Epigenetics"

H3K9 methylation is catalyzed by Suv39H1, and the methylated residue is bound by heterochromatin protein HP1.

Ezh2 of the PRC2 complex methylates H3K27, a mark associated with heterochromatin. Eed of the PRC2 complex interacts with methylated H3K27, and stimulates the methyltransferase activity of Ezh2.

Histone H3 is incorporated into chromatin only during DNA replication. In contrast, the histone variant H3.3, which differs from H3 by four amino acids, is incorporated into nucleosomes in a replication-independent manner, and it tends to accumulate in active chromatin, in which it is enriched in the "active" histone modifications.

The presence of H3.3 is sufficient to maintain the active state: although it would be diluted twofold, enough H3.3 remains. The consequent transcription would result in replacement of H3-containing nucleosomes with H3.3, thus perpetuating the active state in the next generation.

The variant macroH2A (mH2A) is associated with the irreversible inactivation of the mouse X chromosome. Incorporation of this variant helps confer resistance to reprogramming of the inactive X in nuclear transfer experiments.

Most eukaryotic transcription factors do not have long residence times at their binding sites, but turn over rapidly.

In Schizosaccharomyces pombe, formation of heterochromatin involves the production of RNA transcripts, particularly from repeated sequences, that are processed into small RNAs through the action of proteins such as Dicer, Argonaute, and RNA-dependent RNA polymerase. These RNAs are subsequently recruited to the homologous DNA sites as part of complexes that will eventually include enzymes that deliver "silencing" histone modifications, thus initiating the formation of heterochromatin.

In Drosophila, the Polycomb group genes are responsible for establishing a silenced chromatin domain that is maintained through all subsequent cell divisions.

More than 20 demethylases identified belong to either the LSD family or the JmjC family, demonstrating the reversibility of all methylation states at almost all major histone lysine methylation sites.

The LSD family of demethylases can only demethylate mono- and dimethylated (me1, me2), but not the trimethylated (me3) lysine residues.

Beyond the LSD family of demethylases (containing LSD1 and LSD2), the discovery of the JmjC domain-containing demethylase family (more than 20 HDMs) has further shown the reversibility of all histone methylation states (mono-, di-, and trimethylation) at almost all major histone methylation modification sites.

H3K79 is the only residue methylated by a non-SET domain histone methyltransferase, Dot1/Dot1L.

eRNAs function in controlling mRNA transcription, challenging the idea that enhancers are merely sites of transcription factor assembly.

Enhancers recruit general coactivators, such as p300/CBP.

Enhancer RNAs are clearly distinguishable from the canonical long noncoding RNAs (lncRNAs) whose functions have been better characterized. A first distinction is that although lncRNAs are broadly defined based on the presence of H3K4me3 at their promoters, eRNAs can be produced from enhancers without detectable levels of H3K4me3. The difference may stem from the 10- to 100-fold lower expression levels of eRNAs relative to lncRNAs, as H3K4me3 levels generally correlate with gene expression level. Second, unlike the promoters of lncRNAs and protein-coding genes, enhancers show little bias in the direction of transcription initiation. Third, whereas lncRNAs undergo maturation processes such as splicing and polyadenylation, eRNAs are shorter (<2 kb), with little evidence of being consistently spliced or polyadenylated.

A relatively small number of genomic loci have been observed that cannot be easily classified as either enhancers or promoters of lncRNAs because of the presence of both H3K4me3 and H3K4me1 marks. Indeed, protein-coding promoters have been reported to act as enhancers, regulating other nearby promoters.

In general, active marks include acetylation, arginine methylation, and some lysine methylation, such as H3K4 and H3K36. H3K79me, found in the globular domain, has an antisilencing function. Repressive marks include methylation at H3K9, H3K27, and H4K20.

Histone modifications are established by writers and removed by erasers.

HATs acetylate specific lysine residues in histone substrates, and this acetylation is reversed by the action of opposing histone deacetylases (HDACs).

The histone kinase family of enzymes phosphorylates specific serine, threonine, or tyrosine residues, whereas the phosphatases remove phosphorylation marks.

Particularly well known are mitotic kinases, such as cyclin-dependent kinase or aurora kinases, which catalyze the phosphorylation of core (H3) and linker (H1) histones. Less clear in each case are the opposing phosphatases that act to reverse these phosphorylations as cells exit mitosis.

Two general classes of methylating enzymes have been described: the KMTs (histone lysine methyltransferases), which act on lysine residues, and the protein arginine methyltransferases, whose substrate is arginine.

Arginine methylation is indirectly reversed by the action of protein arginine deiminases, which convert methylarginine (or arginine) to a citrulline residue.

The enzymes that remove methyl groups from lysine residues, the so-called histone lysine demethylases (KDMs), come in two flavors. One family comprises the amine oxidases represented by lysine-specific histone demethylase 1 (LSD1, KDM1A) and lysine-specific histone demethylase 2 (LSD2, KDM1B) that use FAD and oxygen as cofactors for demethylation and exclusively target mono- and dimethylation of H3K4 and H3K9. As part of the CoREST corepressor complex, LSD1 targets H3K4me1-2. LSD1 also associates with the androgen receptor, and in this case, targets H3K9me1-2 during activation of transcription. The other family of KDMs includes hydroxylases whose members all share the JMJC domain, which is the catalytic active site that mediates demethylation using 2-oxoglutarate and iron as cofactors. This family of demethylases targets different residues on the H3 tail.

The dyad axis of nucleosomes (where the DNA enters and exits the nucleosome)

The addition of bulky adducts, such as ubiquitin, ADP-ribose, and O-GlcNAc, may also induce different arrangements of the histone tails and decondense nucleosome arrays.

PTMs in the globular histone fold domains, such as on H3K56, H3K64, or H3K122, are known to impact on chromatin structure and assembly.

Certain binding partners (bromodomain, chromodomain, Tudor domain) have a particular affinity for a certain histone modification and, hence, are known to "dock" onto specific modified histone tails.

The bromodomain, a motif that recognizes acetylated histone residues, is often, but not always, part of a HAT enzyme that acetylates target histones as a portion of a larger chromatin-remodeling complex.

Methylated lysine residues embedded in histone tails can be read by "aromatic cages" present in chromodomains, or similar domains (e.g., MBT, Tudor) contained within complexes that facilitate downstream chromatin modulating events.

A major mechanism by which transitions in the chromatin template are induced is by signaling the recruitment of chromatin-"remodeling" complexes that use energy (ATP-hydrolysis) to change chromatin and nucleosome composition in a noncovalent manner.

Nucleosomes, particularly when bound by repressive chromatin-associated factors, often impose an intrinsic inhibition to transcription machinery.

Chromatin-remodeling activities often work in concert with activating chromatin-modifying enzymes, but are also known to stabilize repressed rather than active chromatin states.

A series of histone modifications and docking effectors act in concert with chromatin-remodeling complexes, such as SAGA and FACT (facilitates chromatin transcription), to allow RNA Pol II passage through nucleosomal arrays.

Variant isoforms exist for all of the core histones except H4.

Two major H3 isoforms exist, H3.1 and H3.2, often referred to as 'canonical' H3. Less widespread 'variant' H3 isoforms also exist, including H3.3 and a centromere-specific isoform, CENP-A.

Histone variants also have their own patterns of modifications.

Transcriptionally active genes generally have canonical histone H3 (H3.1 and H3.2) exchanged for the H3.3 variant by a transcription-coupled (replication-independent) mechanism. H3.3 is also incorporated into repressed chromatin regions, such as pericentric and telomeric regions, by distinct histone chaperone complexes. Replacement of the canonical H3 with the H3.3 variant in active chromatin occurs via the action of the HIRA (histone regulator A) exchange complex, whereas incorporation of H3.3 into heterochromatic regions is mediated by the DAXX-ATRX complex.

The incorporation of canonical histone H3 (H3.1 and H3.2), like that of canonical H4, is restricted to S phase, and the incorporation is facilitated by a dedicated complex known as chromatin assembly factor 1 (CAF1).

H2A is replaced by H2A.Z through the 'on-demand' activity of the Swi2/Snf2-related ATPase 1 exchanger complex. Replacement of canonical H2A with the H2A.Z variant correlates with transcriptional activity, and can mark the 5' end of nucleosome-free promoters.

Dual incorporation of H2A.Z and H3.3 into nucleosomes results in unstable chromatin, which closely correlates with enhancer elements via a more open chromatin fiber. However, H2A.Z alone has also been associated with repressed chromatin.

CENP-A is essential for centromeric function and, hence, chromosome segregation. H2A.X, together with other histone marks, is associated with sensing DNA damage and appears to signify DNA lesions for recruitment of DNA repair complexes. Macro H2A is a histone variant that specifically associates with the inactive X (Xi) chromosome in mammals.

Combinatorial readout of chromatin modifications:

Intrahistone interplay: Histone modifications within one histone tail generate a specific downstream readout. For example, the position of the catalytic JMJC domain of the PHF8 KDM is structurally constrained to remove only H3K9me2, but not H3K27me2 because of its anchoring to chromatin via H3K4me3 binding to its PHD domain. The KIAA1718 KDM operates via a similar PHD H3K9 methyl marks. Another illustration is the ejection of HP1 bound to H3K9me3 when the neighboring H3S10 becomes phosphorylated, a mechanism known as 'phospho-methyl' switch.

Interhistone cross talk: Modifications on two distinct histones influence each other. For example, H2BK120ub is a prerequisite for H3K4me3 and H3K79me3 to occur.

Intranucleosomal association: A chromatin reader is recruited via two distinct modifications within one nucleosome. For example, the PHD domain and bromodomain of BPTF bind to H3K4me3 and H4K16ac within one nucleosome.

Internucleosomal association: A chromatin reader interacts with histone modifications present on different nucleosomes. For example, the two chromodomains of HP1 dimers link H3K9me3-containing nucleosomes.

Histone-DNA modification cross talk: Modifications on histones and DNA influence each other. For example, recruitment of the Set1 KMT via Cfp1 bound to unmodified CpG-rich DNA results in H3K4me3 chromatin, which, in turn, inhibits association of Dnmt3a/L, thereby protecting these regions from DNA methylation. Conversely, recruitment of Dnmts to H3K9me3 chromatin via HP1 leads, subsequently, to DNA methylation.

In most cases involving the deposition of repressive methylation marks, the enzymes responsible for installing the histone modification also interact with factors that bind to it. Examples of such histone modification-binder pairs are Suv39h1 and HP1 for H3K9me3, or Ezh2 and Eed for H3K27me3.

Transcriptionally active chromatin correlates with the presence of H3K4me3 around the TSS and H3K36me3 within the coding sequences.

The carboxyl terminus of the largest RNA Pol II subunit (known as the CTD) may become phosphorylated in the heptapeptide repeat (Y-S2-P-T-S5-P-S), at either S2 or S5.

RNA Pol II is recruited to promoters in the first place by a family of factors known as the 'general transcription factors' (GTFs), but in a phosphorylation-dependent manner; RNA Pol II is recruited to promoters via GTFs in the non-phosphorylated form and then recruits TFIIH, which contains the CDK7 kinase that phosphorylates serine 5 of the heptapeptide repeat. This phosphorylation disrupts interactions of RNA Pol II with most of the GTFs and allows the initiation of transcription. RNA Pol II then moves along the template, but soon after initiation, when the nascent RNA exits the catalytic channel, the polymerase ceases transcription to allow a series of regulatory steps, including the recruitment of capping enzymes through recognition of the CTD phosphorylated at serine 5 within the heptapeptide repeat. Once this is accomplished, a kinase that phosphorylates serine 2 of the CTD ensures that RNA Pol II is ready to escape the promoter and engage in elongation, a step known as promoter clearance.

The differential phosphorylation of the CTD (serine 2 and 5) recruits different histone modifying factors, including KMTs that act on H3K4 (serine-5 phosphorylation), such as Set1 in yeast and the SET1 and MLL1 (KMT2) complexes in mammals, and KMTs that act on H3K36 (serine-2 phosphorylation), such as Set2 (KMT3) in yeast and related mammalian orthologs.

H3K4me3 is recognized by the chromodomains of mammalian CHD1, which, in turn, recruits factors that affect transcription elongation, such as FACT, the PAF complex, and factors modulating splicing, among others.

H3K4me3 is also 'read' by the PHD finger present in BPTF, a subunit of NURF (nucleosome-remodeling factor).

The H3K36me3 modification recruits distinct factors, an example of which is the Sin3A complex, containing HDACs. It is proposed that the deacetylases associated within the Sin3A complex function to overcome the acetylation required for transcription, promoting the reestablishment of nucleosomes and suppressing cryptic initiation in the open chromatin as a function of transcription.

Euchromatin -- decondensed chromatin, exemplified by histone acetylation (e.g., H4K16ac), that is, for the most part, transcriptionally active.

Heterochromatin -- either permanently silent chromatin (constitutive heterochromatin), in which genes are rarely expressed in any cell type of the organism, or facultative heterochromatin, in which genes can be derepressed during a specific cell cycle or developmental stage.

Constitutive heterochromatin is present at pericentric or subtelomeric regions of the genome, and is more uniform in structure. Facultative heterochromatin is more heterogeneous.

In mammals, the histone H1 family occurs with as many as eight different isoforms; some H1 isoforms are redundant, whereas others hold tissue-specific functions. H1 itself can be covalently modified (phosphorylated, methylated, poly(ADP)-ribosylated, etc.).

Lamin-associated domains (LAD) are predominantly heterochromatin-rich regions located at the nuclear periphery.

Compact metaphase chromosomes during mitosis and meiosis are achieved by the phosphorylation of linker H1 and select phosphorylation sites in the amino terminus of histone H3 (e.g., serine 10 and 28). In addition, the ATP-dependent actions of topoisomerase II and of the condensin and cohesin complexes are absolutely required for higher-order structuring in mitosis and meiosis, as very little chromatin condensation occurs in their absence.

Telomeres act as chromosomal caps, preventing erosion of the chromosomal ends during subsequent cell divisions.

Centromeres provide an attachment anchor for spindle microtubules during nuclear division.

5mC mainly occurs at CpG dinucleotides in mammals, and its distribution along the genome shows enrichment at noncoding regions (e.g., centromeric heterochromatin) and interspersed repetitive elements (e.g., retrotransposons). Conversely, its abundance is low in 'CpG islands' present in the 5' regulatory regions of many genes. However, most exons and introns are highly DNA methylated (70%-80% of CpG sites).

Maintenance methylation during DNA replication is regulated by an intrinsic autoregulatory loop of DNMT1; DNMT1, when bound to unmethylated CpGs at the replication fork via its CXXC motif, becomes inhibited. However, when it encounters a hemimethylated site, it becomes allosterically activated and then adds a methyl group on the unmethylated sister strand. Importantly, the interaction of DNMT1 with UHRF1 enhances the stability of DNMT1. UHRF1 is a protein that specifically interacts with H3K9me3-marked chromatin, providing a functional connection between DNA methylation and repressive histone methylation.

De novo methylation is established by DNMT3A and DNMT3B enzymes, which associate with the catalytically inactive DNMT3L. There is an antagonism between H3K4me3 and DNA methylation in which the presence of this histone mark inhibits binding of DNMT3L and the de novo DNMT3A and DNMT3B enzymes, thereby protecting CpG islands from DNA methylation.

In N. crassa and plants, highly repetitive tandem repeat sequences of the genome (e.g., pericentric chromatin) rely on repressive H3K9 methylation marks to direct DNA methylation de novo. Interspersed repeats can also signal de novo DNA methylation and retrotransposon silencing in the male germline of mammals.

Mouse Dnmt3L may function by scanning the genome to identify high levels of homology-heterology junctions, which signal the requirement for DNA methylation. In plants, ncRNAs provide the signal for de novo DNA methylation through a unique mechanism termed RNA-directed DNA methylation.

DNA methylation is known to disturb the recognition sites of transcriptional regulators, such as CTCF.

In a B- and T-cell context, 5mC-to-T transitions through a deamination reaction, actively catalyzed by the activation-induced deaminase (AID), cause somatic hypermutation at the B- and T-cell antigen receptor loci.

RNAi is a host defense mechanism that breaks down dsRNA species into small RNA molecules (known as short interfering RNA or siRNA). This process ultimately leads to RNA degradation or the use of the small RNAs to inhibit translation, which is known as posttranscriptional gene silencing (PTGS).

The most recently discovered transcriptional gene silencing mechanism leads to heterochromatin formation by using the RNAi machinery to act in cis at sites of nascent transcription to recruit epigenetic machinery.


Wednesday 15 May 2019

Differentially Methylated Loci vs Differentially Methylated Regions

Cited from the paper "BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions"

"""
However, various authors have noted that methylation levels are strongly correlated across the genome [24, 25]. Furthermore, functionally relevant findings are generally associated with genomic regions rather than single CpGs, either CpG islands [26], CpG island shores [27], genomic blocks [1], or generic 2 kb regions [3].
"""
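
The step from single loci to regions can be illustrated by merging nearby CpG positions into candidate regions. This is a minimal sketch and not the BSmooth algorithm; the function name group_into_regions, the 300 bp max_gap, and the example positions are all hypothetical.

```python
def group_into_regions(positions, max_gap=300):
    """Merge CpG positions that lie within max_gap bp of each other.

    Returns a list of (start, end) tuples, one per candidate region.
    """
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][1] <= max_gap:
            regions[-1][1] = pos          # extend the current region
        else:
            regions.append([pos, pos])    # start a new region
    return [tuple(region) for region in regions]


# Hypothetical positions of CpGs flagged as differentially methylated.
flagged = [100, 250, 400, 5000, 5100]
print(group_into_regions(flagged))
```

Three neighbouring loci collapse into one region and the two distant ones into another, which is the kind of unit the quoted passage argues is more relevant than a single CpG.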

RRBS vs WGBS

Why use RRBS (reduced representation bisulfite sequencing) instead of general BiSeq?

Tuesday 14 May 2019

MethylC-seq or BS-seq

Cited from the paper "MethylC-seq library preparation for base-resolution whole-genome bisulfite sequencing"

"""
MethylC-seq library preparation protocol overview. gDNA (i) is fragmented to ~200 bp by sonication (ii). DNA fragments containing damaged or incompatible 5′- and/or 3′-protruding ends are converted to 5′-phosphorylated, blunt-ended DNA (iii). Blunt-ended DNA fragments are converted to DNA with 3′-dAMP overhangs (iv). Methylated Y-shaped adapters are ligated to the dA-tailed DNA fragments (v). All cytosines in the adapters must be methylated to allow for primer binding and amplification after bisulfite conversion. Adapter-ligated DNA fragments are denatured, and unmethylated cytosine is converted to uracil during sodium bisulfite treatment (vi). Bisulfite-treated DNA fragments remain single-stranded as they are no longer complementary. Low-cycle PCR amplification is performed with a polymerase that can tolerate uracil residues (vii). The final library fragments contain thymines and cytosines in place of the original unmethylated cytosines and methylated cytosines, respectively (viii).
"""
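
Steps (vi) to (viii) can be mimicked on a toy sequence: unmethylated cytosines end up as thymines, methylated ones stay cytosines. This is a minimal sketch that assumes a single strand and complete conversion; the function name bisulfite_convert and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after bisulfite treatment and PCR.

    methylated_positions holds the 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated:
            converted.append("T")   # unmethylated C -> U, amplified as T
        else:
            converted.append(base)  # methylated C and other bases unchanged
    return "".join(converted)


# Toy example: only the cytosine at index 1 is methylated.
print(bisulfite_convert("ACGTCCGA", methylated_positions=[1]))
```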

Illumina Methylation Assay

High density DNA methylation array with single CpG site resolution

Comprehensive high‑throughput arrays for relative methylation (CHARM)

Cited from the paper "Comprehensive High-Throughput Arrays for Relative Methylation (CHARM)"

"""
Overview of McrBC-based fractionation (Lippman et al., 2005; Ordway et al., 2006) coupled with CHARM analysis (Irizarry et al., 2008). Genomic DNA is sheared to 1.5 to 3.0 kb and divided into two equal parts. The first is digested with McrBC, a methyl-cytosine insensitive enzyme that recognizes PumC(N40–3000)mCPu, and the second is untreated. Both fractions are then resolved side by side on a 1% agarose gel, and fragments between 1.65 kb and 3.0 kb are excised and purified. Next, the untreated fraction, representing total input DNA, is labeled with cyanine-3 (Cy3) and the McrBC-treated fraction, representing unmethylated DNA, is labeled with cyanine-5 (Cy5) followed by cohybridization to a CHARM microarray. Sequences that are methylated will be present in the input fraction (Cy3) and depleted in the methyl-depleted fraction (Cy5). For each probe on the array, a log ratio of the Cy3 to Cy5 intensity is calculated and represents the methylation level (M-value) at each locus, with larger M-values representing more methylation and smaller M-values representing less methylation.
"""
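
The M-value described above is a log ratio of the two channel intensities. The sketch below assumes a base-2 logarithm, which the quoted text does not specify; the function name m_value and the intensities are hypothetical.

```python
import math


def m_value(cy3, cy5):
    """Log ratio of Cy3 (total input) to Cy5 (McrBC-treated) intensity.

    Base 2 is an assumption; larger values indicate more methylation.
    """
    if cy3 <= 0 or cy5 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy3 / cy5)


# Hypothetical probe intensities.
print(m_value(800.0, 200.0))  # Cy5 depleted relative to Cy3
print(m_value(100.0, 100.0))  # no depletion
```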

Cited from the paper "Profiling genome-wide DNA methylation"
"""
McrBC, an enzyme that digests methylated DNA, to fractionate DNA and subsequently utilizes array hybridization.
"""

Isoschizomers and Neoschizomers

Cited from Wiki

"""
Isoschizomers are pairs of restriction enzymes specific to the same recognition sequence. For example, SphI (CGTAC/G) and BbuI (CGTAC/G) are isoschizomers of each other. The first enzyme discovered which recognizes a given sequence is known as the prototype; all subsequently identified enzymes that recognize that sequence are isoschizomers. Isoschizomers are isolated from different strains of bacteria and therefore may require different reaction conditions.
"""

"""
An enzyme that recognizes the same sequence but cuts it differently is a neoschizomer. Neoschizomers are a specific type (subset) of isoschizomer. For example, SmaI (CCC/GGG) and XmaI (C/CCGGG) are neoschizomers of each other.
"""
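
The distinction can be checked mechanically from the "/" notation used in the quotes: same sequence means isoschizomers, and a different cut position within that sequence means neoschizomers. This is a minimal sketch; the function names parse_site and classify and the returned labels are hypothetical.

```python
def parse_site(site):
    """Split a site such as 'C/CCGGG' into (sequence, cut offset)."""
    cut = site.index("/")
    return site.replace("/", ""), cut


def classify(site_a, site_b):
    """Compare two recognition sites written with '/' at the cut position."""
    seq_a, cut_a = parse_site(site_a)
    seq_b, cut_b = parse_site(site_b)
    if seq_a != seq_b:
        return "different recognition sequences"
    if cut_a == cut_b:
        return "isoschizomers with the same cut"
    return "neoschizomers"


print(classify("CCC/GGG", "C/CCGGG"))  # SmaI vs XmaI, as quoted above
```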

Wednesday 20 February 2019

Bisulfite Sequencing

Cited from the paper "Reduction with age in methylcytosine in the promoter region −224∼−101 of the amyloid precursor protein gene in autopsy human cortex"

"""
Bisulfite genomic sequencing is a sensitive method for the detection of methylated cytosines. 

In this method, sodium bisulfite (NaHSO3) reacts readily with the 5,6-double bond of cytosine but poorly with methylated cytosine. 

The sulfonated cytosine reaction intermediate is readily deaminated to sulfonated uracil from which the sulfonate group can be removed under alkaline conditions to form uracil that is recognized as a thymine by Taq polymerase.

Therefore, after PCR, cytosine remains only at positions where it is methylated in the starting template DNA. 

"""
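
Reading the result back is the reverse comparison: at each cytosine of the starting template, a C in the sequenced product means methylated and a T means unmethylated. This is a minimal sketch that assumes the read is already aligned to the reference, has the same length, and comes from the same strand; the function name call_methylation and the sequences are hypothetical.

```python
def call_methylation(reference, read):
    """Map each reference cytosine index to True (methylated) or False.

    Positions where the read shows neither C nor T are left out.
    """
    calls = {}
    pairs = zip(reference.upper(), read.upper())
    for index, (ref_base, read_base) in enumerate(pairs):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[index] = True    # protected from conversion
        elif read_base == "T":
            calls[index] = False   # converted, so unmethylated
    return calls


print(call_methylation("ACGTCCGA", "ACGTTTGA"))
```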

Tuesday 12 February 2019

Transcripts Tag 'Basic'

What is the "basic" annotation in the GTF/GFF3?

"""
The transcripts tagged as "basic" form part of a subset of representative transcripts for each gene. This subset prioritises full-length protein coding transcripts over partial or non-protein coding transcripts within the same gene, and intends to highlight those transcripts that will be useful to the majority of users.
"""