A meta-analysis of gene expression quantitative trait loci in brain
eQTL mapping methodology
Monday, 30 March 2015
Friday, 27 March 2015
Thursday, 26 March 2015
Circos Plots: Tick Marks - Basics
Excerpts from Tick Marks, Grids and Labels
"Ticks, tick labels and grids are defined in the <ticks> block, which can contain any number of <tick> blocks, each defining ticks with a different spacing."
"Ticks refers to the radial lines that show progression of distance along the ideogram. Tick labels are the accompanying text elements that mark the position of the tick."
"The radius specifies the radial position of the tick marks, which you generally want to set to the outer ideogram radius."
"The label multiplier is the constant used to multiply the tick value to obtain the tick label. For example, if the multiplier is
"The orientation controls whether the ticks and labels face out (
"By referencing the position relative to the image, and not the ideogram, you decouple the position of the tick from the position of the ideogram. This absolute placement is useful if you know you want the ticks at a specific image position, regardless of the position of the ideograms. radius=dims(image,radius)-25p."
"Typically, one defines several sets of ticks by using <tick> blocks. Each set defines the display of ticks at a given spacing. For example, one could have three sets of ticks spaced at 1Mb, 5Mb and 10Mb, respectively, and formatted so that the 1Mb ticks are small and without labels whereas the 5Mb and 10Mb be larger and with labels. The 10Mb ticks might use a bolder font, for example, to give them greater visual weight."
"Unless force_display is set for a tick set, ticks at smaller spacing are not drawn at a position that already has another tick. In other words, the formatting of a tick mark is defined by the block associated with the spacing value that defines the largest divisor of the tick value."
"When tick size is expressed in relative terms, the comparator is the tickness of the ideogram. Therefore ticks with
"
When
"Ticks, tick labels and grids are defined in the <ticks> block, which can contain any number of <tick> blocks, each defining ticks with a different spacing."
"Ticks refers to the radial lines that show progression of distance along the ideogram. Tick labels are the accompanying text elements that mark the position of the tick."
"The radius specifies the radial position of the tick marks, which you generally want to set to the outer ideogram radius."
"The label multiplier is the constant used to multiply the tick value to obtain the tick label. For example, if the multiplier is
1e-6
, then the tick mark at position 10,000,000
will have a label of
10
. The multiplier is applied to the raw tick value, regardless of the
value of chromosomes_unit
.""The orientation controls whether the ticks and labels face out (
orientation=out
) or in (orientation=in
).""By referencing the position relative to the image, and not the ideogram, you decouple the position of the tick from the position of the ideogram. This absolute placement is useful if you know you want the ticks at a specific image position, regardless of the position of the ideograms. radius=dims(image,radius)-25p."
"Typically, one defines several sets of ticks by using <tick> blocks. Each set defines the display of ticks at a given spacing. For example, one could have three sets of ticks spaced at 1Mb, 5Mb and 10Mb, respectively, and formatted so that the 1Mb ticks are small and without labels whereas the 5Mb and 10Mb be larger and with labels. The 10Mb ticks might use a bolder font, for example, to give them greater visual weight."
"Unless force_display is set for a tick set, ticks at smaller spacing are not drawn at a position that already has another tick. In other words, the formatting of a tick mark is defined by the block associated with the spacing value that defines the largest divisor of the tick value."
"When tick size is expressed in relative terms, the comparator is the tickness of the ideogram. Therefore ticks with
size=0.1r
will have
a length that is 1/10th of the ideogram thickness. Tick thickness, on
the other hand, uses the tick size as the comparator. Thus, ticks with
thickness=0.1r
will have a width that is 1/10th the size of their
length. Similarly, if tick label size is defined relatively, it will
be scaled by tick size.""
When
chromosomes_display_default=yes
, you do not need to define
which ideograms ticks appear on because tick mark visibility is on by
default and you only need to define where tick marks are not shown. If chromosomes_display_default=no
, then things get a little bit
more complicated, because you now need to define where tick marks will
be shown and these definitions can contain regions of exclusion."Reporting Unwanted Sexual Behaviour in a Black Cab, Minicab, or on Public Transport in the UK
Quoted from an online source.
"If you would like to report any unwanted sexual behaviour in a black cab, minicab, or on public transport, please report it by calling 101 or texting 61016.
For further information or support please follow the links or call the numbers for the charities below.
Rape Crisis (England & Wales)
Website: www.rapecrisis.org.uk
Telephone Number: 08088029999
Victim Support
Website: https://www.victimsupport.org.uk/
Telephone Number: 08081689111
hollaback
Website: http://www.ihollaback.org/about/
In an emergency always call 999."
"If you would like to report any unwanted sexual behaviour in a black cab, minicab, or on public transport, please report it by calling 101 or texting 61016.
For further information or support please follow the links or call the numbers for the charities below.
Rape Crisis (England & Wales)
Website: www.rapecrisis.org.uk
Telephone Number: 08088029999
Victim Support
Website: https://www.victimsupport.org.uk/
Telephone Number: 08081689111
hollaback
Website: http://www.ihollaback.org/about/
In an emergency always call 999."
Sunday, 22 March 2015
Monday, 16 March 2015
Circos Plots
################################################
# chromosomes_units
Excerpts from "Drawing Ideograms"
"For example, chromosomes_units = 1000000 chromosomes = hs1:0-100;hs2:50-150;hs3:50-100;hs4;hs5;hs6;hs7;hs8
Will draw all 8 chromosomes, but only 0-100 Mb of hs1, 50-150Mb of hs2 and 50-100 Mb of hs3. The start and end ranges are given in units of chromosomes_units."
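To make the unit conversion concrete, here is a small Python sketch that expands the chromosomes string from the excerpt into base-pair ranges. The function name is hypothetical, and it only handles the simple NAME or NAME:START-END entries shown above, not the rest of the Circos syntax.

```python
def parse_chromosomes(spec, chromosomes_units=1_000_000):
    """Map each ideogram name to (start_bp, end_bp), or None for the whole ideogram."""
    regions = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, span = entry.partition(":")
        if not sep:
            regions[name] = None  # no range given: the full ideogram is drawn
            continue
        start, end = (float(x) for x in span.split("-"))
        # Ranges are written in units of chromosomes_units, so scale to bases.
        regions[name] = (int(start * chromosomes_units), int(end * chromosomes_units))
    return regions


regions = parse_chromosomes("hs1:0-100;hs2:50-150;hs3:50-100;hs4;hs5;hs6;hs7;hs8")
for name, span in regions.items():
    print(name, span)
```

Here hs1 maps to bases 0 to 100,000,000 and hs2 to 50,000,000 to 150,000,000, while hs4 to hs8 carry no range and are drawn in full.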
################################################
# karyotype file
Excerpts from "Karyotypes"
"The karyotype file defines the axes. In biological context, these are typically chromosomes, sequence contigs or clones.
Each axis (e.g. chromosome) is defined by unique identifier (referenced in data files), label (text tag for the ideogram seen in the image), size and color."
"Chromosome definitions are formatted as follows
chr - ID LABEL START END COLOR"
'The first two fields are always "chr", indicating that the line defines a chromosome, and "-". The second field defines the parent structure and is used only for band definitions.'
" Consider using the conventional chromosome color scheme as defined in the etc/color.conf configuration file. Colors are defined for each human chromosome and are named similiarly: chr1, chr2, ... chrx, chry, chrun. Colors must be in lowercase."
################################################
# external imports
Excerpts from "Configuration Files - Syntax, Colors, Fonts and Units"
"Two files should always be imported from etc/ in the Circos distribution. These are
# colors, fonts and fill patterns
<<include etc/colors_fonts_patterns.conf>>
# system and debug parameters
<<include etc/housekeeping.conf>>"
#################################################
# <image> block
Excerpts from "PNG Output"
"I suggest that you always import the default image settings.
<image>
# import defaults from Circos distribution
<<include etc/image.conf>>
</image>
The settings define the output file to be 3,000 x 3,000 pixels, with white background, named circos.png, which will be placed in the current directory."
"If you would like to overwrite any of these parameters, use the * suffix syntax.
# circos.conf
<image>
<<include etc/image.conf>>
file* = myfile.png
radius* = 1000p
</image>"
"Output image directory and filename are defined in the dir and file parameters of the <image> block. The produced image is always square, and its size set by the radius parameter (this is the size of the inscribed circle). If radius=1500p, then the image will be 3,000 x 3,000 pixels in size."
#################################################
# Ticks & Labels
Excerpts from "Ticks & Labels"
'The radial position of the labels can be adjusted using label_radius. The quantity used as the reference for relative units depends on which parameter is defined. It is usually defined as the "parent container" of the element. For example, when defining ideogram position, the reference is image radius. When using track position, the reference is ideogram radius. As a result, when the parent element is moved (e.g. ideogram), all other elements move with it (e.g. data tracks).'
"Ticks are defined by group. You can have absolute or relatively spaced ticks, as well as ticks at specific positions. The primary parameter in each <tick> block is spacing. This defines the distance between adjacent ticks in this group. Typically, this value is defined in terms of chromosomes_units parameter (the suffix u is used for this) to keep the number legible. If a tick belongs to multiple groups, the group with largest spacing is preferred. Thus, the tick at 50 Mb will take its formatting from the spacing=25u group, not the spacing=5u group."
.bashrc and .bash_profile
Excerpts from "What is the purpose of .bashrc and how does it work?"
".bashrc is a shell script that Bash runs whenever it is started interactively. You can put any command in that file that you could type at the command prompt. You put commands here to set up the shell for use in your particular environment, or to customize things to your preferences."
"Contrast .bash_profile and .profile which are only run at the start of a new login shell. (bash -l) You choose whether a command goes in .bashrc vs .bash_profile depending on on whether you want it to run once or for every interactive shell start."
".bashrc is a shell script that Bash runs whenever it is started interactively. You can put any command in that file that you could type at the command prompt. You put commands here to set up the shell for use in your particular environment, or to customize things to your preferences."
"Contrast .bash_profile and .profile which are only run at the start of a new login shell. (bash -l) You choose whether a command goes in .bashrc vs .bash_profile depending on on whether you want it to run once or for every interactive shell start."
Sunday, 15 March 2015
Saturday, 14 March 2015
Friday, 13 March 2015
ENCODE Tier 1, Tier 2 and Tier 3 Cells
Excerpts from "ENCODE Cell Types 2007 - 2012"
"Tier1 cells are of higher priority, and should be used within experiments before Tier2 cells. Additional cell types beyond the designated Tier1 and Tier2 could be used for ENCODE production; these are selected at the discretion of individual data production groups, and are designated Tier3."
===============================================
Excerpts from "ENCODE Project Common Cell Types"
"These common cell types include both cell lines and primary cell types, and plans are being made to explore the use of primary tissues and embryonic stem (ES) cells.
Cell types were selected largely for practical reasons, including their wide availability, the ability to grow them easily, and their capacity to produce sufficient numbers of cells for use in all technologies being used by ENCODE investigators. Secondary considerations were the diversity in tissue source of the cells, germ layer lineage representation, the availability of existing data generated using the cell type, and coordination with other ongoing projects. Effort was also made to select at least some cell types that have a relatively normal karyotype."
Detailed descriptions of the Tier 1 and Tier 2 cells are given in the source above.
"Tier1 cells are of higher priority, and should be used within experiments before Tier2 cells. Additional cell types beyond the designated Tier1 and Tier2 could be used for ENCODE production; these are selected at the discretion of individual data production groups, and are designated Tier3."
===============================================
Excerpts from "ENCODE Project Common Cell Types"
"These common cell types include both cell lines and primary cell types, and plans are being made to explore the use of primary tissues and embryonic stem (ES) cells.
Cell types were selected largely for practical reasons, including their wide availability, the ability to grow them easily, and their capacity to produce sufficient numbers of cells for use in all technologies being used by ENCODE investigators. Secondary considerations were the diversity in tissue source of the cells, germ layer lineage representation, the availability of existing data generated using the cell type, and coordination with other ongoing projects. Effort was also made to select at least some cell types that have a relatively normal karyotype."
Detailed descriptions of tier 1 and 2 cells were included in the link above.
PRO-seq
Excerpts from "Precise Maps of RNA Polymerase Reveal How Promoters Direct Initiation and Pausing"
"PRO-seq uses biotin-labeled ribonucleotide triphosphate analogs (biotin-NTP) for nuclear run-on reactions, allowing the efficient affinity purification of nascent RNAs for high throughput sequencing from their 3’ ends (Figs. 1A, S1A). Supplying only one of the four biotin-A/C/G/UTP restricts Pol II to incorporate a single or at most a few identical bases, resulting in sequence reads that have the same 3’ end base within each library (table S1). Moreover, the incorporation of the first biotin-base inhibits further transcript elongation, ensuring base-pair resolution (fig. S2)."
===============================================
Excerpts from "Genome-Wide Control of RNA Polymerase II Activity by Cohesin"
"PRO-seq varies from GRO-seq in that biotin-labeled ribonucleotides are used to allow run-on for a nucleotide or two, instead of the longer run-on with BrUTP used in GRO-seq. PRO-seq, like GRO-seq [17], is highly sensitive, and unlike ChIP, does not depend on crosslinking efficiency or antibody specificity, and detects elongation-competent Pol II regardless of the phosphorylation status. Nuclei were isolated under conditions of ribonucleotide depletion to halt transcription, but leave Pol II transcriptionally engaged. The nascent RNA transcripts produced upon restart of transcription were used to generate a cDNA library for high-throughput sequencing. Inclusion of sarkosyl in the run-on transcription reaction prevents new transcription initiation, so that only Pol II that is already transcriptionally engaged is detected, and gene body and promoter paused Pol II are detected with equal efficiency [17]"
"PRO-seq uses biotin-labeled ribonucleotide triphosphate analogs (biotin-NTP) for nuclear run-on reactions, allowing the efficient affinity purification of nascent RNAs for high throughput sequencing from their 3’ ends (Figs. 1A, S1A). Supplying only one of the four biotin-A/C/G/UTP restricts Pol II to incorporate a single or at most a few identical bases, resulting in sequence reads that have the same 3’ end base within each library (table S1). Moreover, the incorporation of the first biotin-base inhibits further transcript elongation, ensuring base-pair resolution (fig. S2)."
===============================================
Excerpts from "Genome-Wide Control of RNA Polymerase II Activity by Cohesin"
"PRO-seq varies from GRO-seq in that biotin-labeled ribonucleotides are used to allow run-on for a nucleotide or two, instead of the longer run-on with BrUTP used in GRO-seq. PRO-seq, like GRO-seq [17], is highly sensitive, and unlike ChIP, does not depend on crosslinking efficiency or antibody specificity, and detects elongation-competent Pol II regardless of the phosphorylation status. Nuclei were isolated under conditions of ribonucleotide depletion to halt transcription, but leave Pol II transcriptionally engaged. The nascent RNA transcripts produced upon restart of transcription were used to generate a cDNA library for high-throughput sequencing. Inclusion of sarkosyl in the run-on transcription reaction prevents new transcription initiation, so that only Pol II that is already transcriptionally engaged is detected, and gene body and promoter paused Pol II are detected with equal efficiency [17]"
Pol II Accumulation
Excerpts from "Precise Maps of RNA Polymerase Reveal How Promoters Direct Initiation and Pausing"
"Significant accumulation of Pol II over the 3’ cleavage/polyA region of genes is proposed to facilitate 3’ processing and transcription termination(9,10). Finally, the interplay of transcription rate and splicing efficiency(11) might be reflected in the selective accumulation of Pol II at splice junctions."
"Protein factors such as Negative Elongation Factor (NELF) and DRB Sensitivity Inducing Factor (DSIF)(3,13), DNA elements(14,15), DNA sequence composition(16), nascent RNA processing(16), and nucleosomes(17) can influence pausing."
"ChIP-based methods that collect Pol II or associated RNAs do not distinguish paused Pol II from other Pol II-RNA complexes(16,18,19). The genome-wide nuclear run-on approach (GRO-seq method)(6–8) circumvents these issues by enriching nascent transcripts only associated with actively engaged polymerase with high sensitivity, but it has a resolution of only 30–50 bases(18)."
=================================================
Excerpts from "Pol II waiting in the starting gates: regulating the transition from transcription initiation into productive elongation"
"Recognition of promoters begins with the assembly of a large protein complex containing Pol II and multiple General Transcription Factors (GTFs) on the promoter. The minimal set of factors required for the formation of this pre-initiation complex (PIC) includes Pol II, the GTFs TFIIB, TFIID (which includes the TATA-binding protein, TBP), TFIIE, TFIIF and TFIIH. Extensive interactions between the polymerase and GTFs increase the affinity of Pol II for the promoter region. In addition to the GTFs, recruitment of Pol II to promoters is greatly influenced by the Mediator complex, DNA-binding transcription activators, and a vast repertoire of nucleosome remodeling and modifying complexes (reviewed in [16, 17])."
"While the exact mechanisms of TSS selection by Pol II are not completely clear, its positioning on the promoter may largely depend on the sequence specificity of GTF interactions with promoter DNA. Indeed, while transcription initiation from promoters that contain distinct sequence elements such as the TATA box, Initiator, or Downstream Promoter Element (DPE) is often very focused and likely to arise from a single nucleotide position, initiation from promoters that lack these motifs is much more dispersed (reviewed in [18])."
The review also distinguishes between poised, paused, backtracked, arrested and stalled Pol II.
Experimental techniques in detecting paused polymerase:
ChIP: "Low spatial resolution, complicating the distinction between engaged Pol II and polymerase in pre-initiation complexes. ChIP may be difficult to perform in samples that are not in a homogeneous suspension, such as tissues."
Permanganate reactivity: "Detects locally melted regions of DNA, including those arising from paused polymerase, by selectively modifying unpaired Thymines within a stable, open transcription bubble. Advantages: Can be performed directly on whole cells or tissues. Achieves essentially nucleotide-level resolution for mapping paused polymerase. Does not require antibodies. Disadvantages: Low throughput (requires Ligation-Mediated (LM) PCR on individual genes - no genome-wide application as of yet). Application is limited to genes where good primers for primer extension and LM PCR can be designed; because of that, permanganate probing in mammalian systems has been markedly less successful than in Drosophila. Since actively elongating polymerase generates very transient melting of DNA at a given location, permanganate is inefficient at detecting productive elongation complexes."
Nuclear run-on: "Detects elongation-competent RNA polymerases through their ability to incorporate a label into nascent RNA. Advantages: Specifically reveals transcriptionally engaged polymerase, with extremely high sensitivity and low background. Adaptable for high-throughput genome-wide applications. Can be used in various organisms. Disadvantages. Requires preparation of nuclei to detect paused polymerase. Resolution for mapping of paused polymerase is reduced by the necessity to allow polymerase to run-on and incorporate labeled nucleotides into RNA."
RNA analysis: "Directly detects short RNA species derived from paused Pol II. Advantages. Sequencing of RNAs from the 3′-end reveals the positions of promoter-proximally paused polymerase at nucleotide-level resolution. Designed for high-throughput genome-wide applications. Does not require antibodies, cell treatment or labeling. Can be used in various organisms. Disadvantages. Requires enzymes available from only one commercial source. Cannot distinguish between RNA species that remain associated with paused Pol II and those that have been released."
"Significant accumulation of Pol II over the 3’ cleavage/polyA region of genes is proposed to facilitate 3’ processing and transcription termination(9,10). Finally, the interplay of transcription rate and splicing efficiency(11) might be reflected in the selective accumulation of Pol II at splice junctions."
"Protein factors such as Negative Elongation Factor (NELF) and DRB Sensitivity Inducing Factor (DSIF)(3,13), DNA elements(14,15), DNA sequence composition(16), nascent RNA processing(16), and nucleosomes(17) can influence pausing."
"ChIP-based methods that collect Pol II or associated RNAs do not distinguish paused Pol II from other Pol II-RNA complexes(16,18,19). The genome-wide nuclear run-on approach (GRO-seq method)(6–8) circumvents these issues by enriching nascent transcripts only associated with actively engaged polymerase with high sensitivity, but it has a resolution of only 30–50 bases(18)."
=================================================
Excerpts from "Pol II waiting in the starting gates: regulating the transition from transcription initiation into productive elongation"
"Recognition of promoters begins with the assembly of a large protein complex containing Pol II and multiple General Transcription Factors (GTFs) on the promoter. The minimal set of factors required for the formation of this pre-initiation complex (PIC) includes Pol II, the GTFs TFIIB, TFIID (which includes the TATA-binding protein, TBP), TFIIE, TFIIF and TFIIH. Extensive interactions between the polymerase and GTFs increase the affinity of Pol II for the promoter region. In addition to the GTFs, recruitment of Pol II to promoters is greatly influenced by the Mediator complex, DNA-binding transcription activators, and a vast repertoire of nucleosome remodeling and modifying complexes (reviewed in [16, 17])."
"While the exact mechanisms of TSS selection by Pol II are not completely clear, its positioning on the promoter may largely depend on the sequence specificity of GTF interactions with promoter DNA. Indeed, while transcription initiation from promoters that contain distinct sequence elements such as the TATA box, Initiator, or Downstream Promoter Element (DPE) is often very focused and likely to arise from a single nucleotide position, initiation from promoters that lack these motifs is much more dispersed (reviewed in [18])."
Distinctions between poised/paused/backtracked/arrested/stalled polII was also made in this review.
Experimental techniques in detecting paused polymerase:
ChIP: "Low spatial resolution, complicating the distinction between engaged Pol II and polymerase in pre-initiation complexes. ChIP may be difficult to perform in samples that are not in a homogeneous suspension, such as tissues."
Permanganate reactivity: "Detects locally melted regions of DNA, including those arising from paused polymerase, by selectively modifying unpaired Thymines within a stable, open transcription bubble. Advantages: Can be performed directly on whole cells or tissues. Achieves essentially nucleotide-level resolution for mapping paused polymerase. Does not require antibodies. Disadvantages: Low throughput (requires Ligation-Mediated (LM) PCR on individual genes - no genome-wide application as of yet). Application is limited to genes where good primers for primer extension and LM PCR can be designed; because of that, permanganate probing in mammalian systems has been markedly less successful than in Drosophila. Since actively elongating polymerase generates very transient melting of DNA at a given location, permanganate is inefficient at detecting productive elongation complexes."
Nuclear run-on: "Detects elongation-competent RNA polymerases through their ability to incorporate a label into nascent RNA. Advantages: Specifically reveals transcriptionally engaged polymerase, with extremely high sensitivity and low background. Adaptable for high-throughput genome-wide applications. Can be used in various organisms. Disadvantages. Requires preparation of nuclei to detect paused polymerase. Resolution for mapping of paused polymerase is reduced by the necessity to allow polymerase to run-on and incorporate labeled nucleotides into RNA."
RNA analysis: "Directly detects short RNA species derived from paused Pol II. Advantages. Sequencing of RNAs from the 3′-end reveals the positions of promoter-proximally paused polymerase at nucleotide-level resolution. Designed for high-throughput genome-wide applications. Does not require antibodies, cell treatment or labeling. Can be used in various organisms. Disadvantages. Requires enzymes available from only one commercial source. Cannot distinguish between RNA species that remain associated with paused Pol II and those that have been released."
p300 and CREB-binding protein (CBP)
Excerpts from "Enhancer function: new insights into the regulation of tissue-specific gene expression"
"CBP (CREB-binding protein) and p300 are highly similar proteins that have histone acetyltransferase activity and contain a variety of functional domains involved in interactions with other transcription factors or histone modifications29. These two proteins interact with the sequence-specific binding transcription factor CREB (Cyclic-AMP Response Element Binding) and have been previously shown to be involved in several cell signaling pathways by activating the transcription of a variety of genes."
"Cell-type specific occupancy of enhancers by CBP/p300 regulates distinct transcriptional programs in many cell types12 and, therefore, these proteins may be a general component of a large class of enhancer elements."
"CBP (CREB-binding protein) and p300 are highly similar proteins that have histone acetyltransferase activity and contain a variety of functional domains involved in interactions with other transcription factors or histone modifications29. These two proteins interact with the sequence-specific binding transcription factor CREB (Cyclic-AMP Response Element Binding) and have been previously shown to be involved in several cell signaling pathways by activating the transcription of a variety of genes."
"Cell-type specific occupancy of enhancers by CBP/p300 regulates distinct transcriptional programs in many cell types12 and, therefore, these proteins may be a general component of a large class of enhancer elements."
Paper: Polyadenylation site–induced decay of upstream transcripts enforces promoter directionality
Excerpts from "Polyadenylation site–induced decay of upstream transcripts enforces promoter directionality"
"Promoter-upstream transcripts are 5′ capped, >100 nucleotides (nt) long and 3′-end adenylated in the absence of exosome activity11."
"Promoter-upstream transcripts are 5′ capped, >100 nucleotides (nt) long and 3′-end adenylated in the absence of exosome activity11."
Thursday, 12 March 2015
Estimation of Gene and Variant Age
Excerpts from "Promoter directionality is controlled by U1 snRNP and polyadenylation signals"
"Previously, mouse protein-coding genes have been assigned to 12 evolutionary branches and dated by analysing the presence or absence of orthologues in the vertebrate phylogeny."
And the following paper was cited for the data source and methodology of gene age estimation in mice.
"Zhang, Y. E., Vibranovski, M. D., Landback, P., Marais, G. A. & Long, M. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol. 8, e1000494 (2010)"
=================================================
Estimating gene age.
Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes
=================================================
Estimating age of rare variants
Demography and the Age of Rare Variants
"Previously, mouse protein-coding genes have been assigned to 12 evolutionary branches and dated by analysing the presence or absence of orthologues in the vertebrate phylogeny."
And the following paper was cited for the data source and methodology of gene age estimation in mice.
"Zhang, Y. E., Vibranovski, M. D., Landback, P., Marais, G. A. & Long, M. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol. 8, e1000494 (2010)"
=================================================
Estimating gene age.
Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes
=================================================
Estimating age of rare variants
Demography and the Age of Rare Variants
Active and Divergent Promoters
Excerpts from "Promoter directionality is controlled by U1 snRNP and polyadenylation signals"
"Active promoters were defined as promoters with GRO-seq signal detected within the first 1 kb downstream of the sense strand. A promoter was considered divergent if it contained GRO-seq signal in the first 1 kb downstream of the sense strand and within the first 2 kb of the upstream antisense strand. A minimum number of two reads within the defined window (downstream 1 kb or upstream 2 kb) was used as a cutoff for background signals."
"Active promoters were defined as promoters with GRO-seq signal detected within the first 1 kb downstream of the sense strand. A promoter was considered divergent if it contained GRO-seq signal in the first 1 kb downstream of the sense strand and within the first 2 kb of the upstream antisense strand. A minimum number of two reads within the defined window (downstream 1 kb or upstream 2 kb) was used as a cutoff for background signals."
Cross-Linking Immunoprecipitation (CLIP)-seq
Excerpts from "RNA Immunoprecipitation (RIP) & Cross-Linking Immunoprecipitation (CLIP)"
"Unlike DNA-protein crosslinking which is done with formaldehyde, CLIP uses ultraviolet (UV) light. Unlike formaldehyde, UV crosslinking is irreversible. UV crosslinks are also more specific and only link proteins to RNAs that are in very close proximity. Further, UV crosslinks do not form between two proteins (Brimacombe et al., 1988). For these reasons, UV has become the standard in RNA research. The RNA-RBP complexes are then immunoprecipitated. Due to the irreversibility of the crosslinks, the next step is digestion with a proteinase."
"Unlike DNA-protein crosslinking which is done with formaldehyde, CLIP uses ultraviolet (UV) light. Unlike formaldehyde, UV crosslinking is irreversible. UV crosslinks are also more specific and only link proteins to RNAs that are in very close proximity. Further, UV crosslinks do not form between two proteins (Brimacombe et al., 1988). For these reasons, UV has become the standard in RNA research. The RNA-RBP complexes are then immunoprecipitated. Due to the irreversibility of the crosslinks, the next step is digestion with a proteinase."
Wednesday, 11 March 2015
Tuesday, 10 March 2015
Test Java Version and 64-bit JVM or 32-bit
java -version
java -d64 -version
Saturday, 7 March 2015
Simulating RNA-seq Data
Excerpts from "Systematic evaluation of spliced alignment programs for RNA-seq data"
"Simulated RNA-seq data were generated using the BEERS toolkit (http://cbil.upenn.edu/BEERS/), and additional modeling of base-call errors and quality scores was done with simNGS (http://www.ebi.ac.uk/goldman-srv/simNGS/)."
"Simulated RNA-seq data were generated using the BEERS toolkit (http://cbil.upenn.edu/BEERS/), and additional modeling of base-call errors and quality scores was done with simNGS (http://www.ebi.ac.uk/goldman-srv/simNGS/)."
Thursday, 5 March 2015
Polyadenylation
Useful videos: "polyadenylation of mRNA (poly a tail)" and "polyadenylation".
CPSF (cleavage and polyadenylation specificity factor) and CstF (cleavage stimulating factor) bind to the polyadenylation signal sequence (AAUAAA) and a GU-rich region of the mRNA, respectively. A cleavage factor (an endonuclease) is recruited to cleave the mRNA right before the GU-rich region, resulting in the dissociation of CstF. Poly(A)-binding protein is recruited. Poly(A) polymerase (PAP), not Pol II, starts to synthesize the poly(A) tail. Once about 10 adenosines have been added, CPSF disengages. PAP detaches. Poly(A)-binding protein coats the poly(A) tail and mediates its circularisation with the 5'-cap of the mRNA.
Promoter Directionality
Excerpts from Promoter directionality is controlled by U1 snRNP and polyadenylation signals
"Two potential mechanisms for suppressing transcription elongation in the upstream antisense region of gene transcription start sites (TSSs) include inefficient release of paused RNAPII and/or early termination of transcription. RNAPII pauses shortly after initiation downstream of the gene TSS and the paused state is released by the recruitment and activity of the positive transcription elongation factor b (P-TEFb)5. A detailed characterization of several upstream antisense RNAs (uaRNAs) in mouse embryonic stem cells (ESCs) suggested that P-TEFb is recruited similarly in both sense and antisense directions6, and in human cells, elongating RNAPII (phosphorylated at Ser 2 in the carboxy-terminal domain) occupies the proximal upstream transcribed region7. These data suggest that the upstream antisense RNAPII complex undergoes the initial phase of elongation but probably terminates early owing to an unknown mechanism."
Polyadenylation [poly(A)] signals: PAS.
PAS is expected to be located about 22 nucleotides upstream of the cleavage site.
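As a rough illustration of the spacing noted above, the sketch below scans for the canonical hexamer upstream of a known cleavage site and reports how far upstream each hit starts. The `find_pas` helper, the 10 to 35 nt search window, and the example sequence are assumptions made for illustration; only the AAUAAA signal and the approximate 22 nt distance come from the notes.

```python
def find_pas(seq, cleavage_site, min_dist=10, max_dist=35, motif="AATAAA"):
    """Return distances (nt upstream of cleavage_site) of motif start positions.

    seq is treated as DNA; any U is converted to T so RNA input also works.
    cleavage_site is a 0-based index into seq.
    """
    seq = seq.upper().replace("U", "T")
    lo = max(0, cleavage_site - max_dist)
    hi = cleavage_site - min_dist
    hits = []
    for start in range(lo, hi + 1):
        if seq[start:start + len(motif)] == motif:
            hits.append(cleavage_site - start)
    return hits


if __name__ == "__main__":
    # Made-up sequence: hexamer starts at index 20, cleavage site at index 42.
    example = "G" * 20 + "AATAAA" + "C" * 16 + "GTGTGT"
    print(find_pas(example, 42))
```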
"Two potential mechanisms for suppressing transcription elongation in the upstream antisense region of gene transcription start sites (TSSs) include inefficient release of paused RNAPII and/or early termination of transcription. RNAPII pauses shortly after initiation downstream of the gene TSS and the paused state is released by the recruitment and activity of the positive transcription elongation factor b (P-TEFb)5. A detailed characterization of several upstream antisense RNAs (uaRNAs) in mouse embryonic stem cells (ESCs) suggested that P-TEFb is recruited similarly in both sense and antisense directions6, and in human cells, elongating RNAPII (phosphorylated at Ser 2 in the carboxy-terminal domain) occupies the proximal upstream transcribed region7. These data suggest that the upstream antisense RNAPII complex undergoes the initial phase of elongation but probably terminates early owing to an unknown mechanism."
Polyadenylation [poly(A)] signals: PAS.
PAS is expected to be located about 22 nucleotides upstream of the cleavage site.
RNA-directed Transcriptional Gene Silencing
Excerpts from Wiki
"RNA-directed DNA methylation (RdDM) is an epigenetic process first discovered in plants (Wassenegger et al, 1994, Cell, Vol 76, 567-576). Double-stranded RNAs (dsRNAs) are processed to 21-24 nucleotide small interfering RNAs (siRNAs) and guide methylation of homologous DNA loci."
"Besides RNA molecules, a plethora of proteins are involved in the establishment of RNA-directed DNA methylation, like Argonautes, DNA methyltransferases, chromatin remodelling complexes and the plant-specific PolIV and PolV. All these act in concert to add a methyl-group at the 5' position of cytosines. In contrast to animals, cytosines at all sequence context (CG, CHG, CHH) may get de novo methylated in plants."
"RNA-directed DNA methylation (RdDM) is an epigenetic process first discovered in plants (Wassenegger et al, 1994, Cell, Vol 76, 567-576). Double-stranded RNAs (dsRNAs) are processed to 21-24 nucleotide small interfering RNAs (siRNAs) and guide methylation of homologous DNA loci."
"Besides RNA molecules, a plethora of proteins are involved in the establishment of RNA-directed DNA methylation, like Argonautes, DNA methyltransferases, chromatin remodelling complexes and the plant-specific PolIV and PolV. All these act in concert to add a methyl-group at the 5' position of cytosines. In contrast to animals, cytosines at all sequence context (CG, CHG, CHH) may get de novo methylated in plants."
Tuesday, 3 March 2015
Antisense Transcription
Excerpts from Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters
"Transcription of coding and noncoding RNA molecules by eukaryotic RNA polymerases requires their collaboration with hundreds of transcription factors to direct and control polymerase recruitment, initiation, elongation, and termination."
"transcriptionally engaged Pol II that has accumulated between ∼20 and 50 bases downstream of transcription start sites (TSSs) (5, 6), indicating that transcription can be regulated at the stage of elongation as well as the recruitment and initiation stages."
"This promoter-proximal pausing or stalling (8) is proposed to be an important post-initiation, rate-limiting target for gene regulation."
"A global run-on-sequencing (GRO-seq) assay: to map and quantify transcriptionally engaged polymerase density genome-wide. These measurements provide a snapshot of genome-wide transcription and directly evaluate promoter-proximal pausing on all genes. We used nuclear run-on assays (NRO) to extend nascent RNAs that are associated with transcriptionally engaged polymerases under conditions where new initiation is prohibited."
===============================================
Excerpts from "Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters" on GRO-seq (detailed in Figure S1: http://www.sciencemag.org/content/suppl/2008/12/04/1162228.DC1/Core.SOM.pdf)
5’-7meG is explained in Five-prime cap as "in eukaryotes, the 5′ cap (cap-0), found on the 5′ end of an mRNA molecule, consists of a guanine nucleotide connected to mRNA via an unusual 5′ to 5′ triphosphate linkage. This guanosine is methylated on the 7 position directly after capping in vivo by a methyltransferase. It is referred to as a 7-methylguanylate cap, abbreviated m7G or 5'-7meG".
===============================================
Nuclear run-on experiments are explained in Nuclear Run‐on Assays
"The nuclear run‐on assay is used to measure the transcriptional activity of selected endogenous genes. Nuclei are isolated from appropriate cells using techniques that keep engaged RNA polymerase complexes bound to genomic DNA. Subsequent incubation with the four ribonucleotide triphosphates, one of which is radiolabelled, allows polymerase complexes to progress several hundred base pairs along the genome, producing short, radioactive RNA molecules. After purification, the RNA species of interest are detected by hybridization to appropriate DNA sequences immobilized on a membrane. The amount of specifically hybridized RNA is proportional to the number of engaged polymerase complexes and therefore reflects the transcriptional activity of the gene in the intact cell."
" Divergent transcription is a mark for most active promoters."
"Active promoters are typically marked by histone modifications such as di- and trimethylation of H3-Lys4 (H3K4me2 and H3K4me3) as well as acetylation of histone H3 and H4 (H3ac and H4ac). These modifications show a bimodal distribution around TSSs, with the trough representing a nucleosome-free region encompassing the TSS."
"The majority of promoters experience initiation in the upstream direction but that these divergent polymerases do not productively elongate transcripts."
================================================
Global nuclear run-on experiments are explained in Wiki.
"GRO-seq involves the labeling of newly synthesized transcripts with bromouridine (BrU). Cells or nuclei are incubated with BrUTP in the presence of Sarkoysl which prevents the attachment of RNA polymerase to the DNA. Therefore only RNA polymerase that are already on the DNA before the addition of Sarkosyl will produce new transcripts that will be labeled with BrU. The labeled transcripts are captured with anti-BrdU antibody labeled magnetic beads, converted to cDNAs and then sequenced by Next Generation DNA sequencing. The sequencing reads are then aligned to the genome and number of reads per transcript provide an accurate estimate of the number of transcripts synthesized."
================================================
Cryptic promoters are promoters situated within genes, according to Gene regulation by antisense transcription.
================================================
Excerpts from Wiki
"Bidirectional promoters are short (<1 kbp), intergenic regions of DNA between the 5' ends of the genes in a bidirectional gene pair."
================================================
In "Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters", promoters that produce mRNAs in only one direction are referred to as a class of divergent promoters. So divergent promoters are a subclass of bidirectional promoters.
"Transcription of coding and noncoding RNA molecules by eukaryotic RNA polymerases requires their collaboration with hundreds of transcription factors to direct and control polymerase recruitment, initiation, elongation, and termination."
"transcriptionally engaged Pol II that has accumulated between ∼20 and 50 bases downstream of transcription start sites (TSSs) (5, 6), indicating that transcription can be regulated at the stage of elongation as well as the recruitment and initiation stages."
"This promoter-proximal pausing or stalling (8) is proposed to be an important post-initiation, rate-limiting target for gene regulation."
"A global run-on-sequencing (GRO-seq) assay: to map and quantify transcriptionally engaged polymerase density genome-wide. These measurements provide a snapshot of genome-wide transcription and directly evaluate promoter-proximal pausing on all genes. We used nuclear run-on assays (NRO) to extend nascent RNAs that are associated with transcriptionally engaged polymerases under conditions where new initiation is prohibited."
===============================================
Excerpts from "Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters" on GRO-seq (detailed in Figure S1: http://www.sciencemag.org/content/suppl/2008/12/04/1162228.DC1/Core.SOM.pdf)
5’-7meG is explained in Five-prime cap as "in eukaryotes, the 5′ cap (cap-0), found on the 5′ end of an mRNA molecule, consists of a guanine nucleotide connected to mRNA via an unusual 5′ to 5′ triphosphate linkage. This guanosine is methylated on the 7 position directly after capping in vivo by a methyltransferase. It is referred to as a 7-methylguanylate cap, abbreviated m7G or 5'-7meG".
===============================================
Nuclear run-on experiments explained in Nuclear Run‐on Assays
"The nuclear run‐on assay is used to measure the transcriptional activity of selected endogenous genes. Nuclei are isolated from appropriate cells using techniques that keep engaged RNA polymerase complexes bound to genomic DNA. Subsequent incubation with the four ribonucleotide triphosphates, one of which is radiolabelled, allows polymerase complexes to progress several hundred base pairs along the genome, producing short, radioactive RNA molecules. After purification, the RNA species of interest are detected by hybridization to appropriate DNA sequences immobilized on a membrane. The amount of specifically hybridized RNA is proportional to the number of engaged polymerase complexes and therefore reflects the transcriptional activity of the gene in the intact cell."
"Active promoters are typically marked by histone modifications such as di- and trimethylation of H3-Lys4 (H3K4me2 and H3K4me3) as well as acetylation of histone H3 and H4 (H3ac and H4ac). These modifications show a bimodal distribution around TSSs, with the trough representing a nucleosome-free region encompassing the TSS."
"The majority of promoters experience initiation in the upstream direction but that these divergent polymerases do not productively elongate transcripts."
================================================
Global nuclear run-on experiments are explained in Wiki.
"GRO-seq involves the labeling of newly synthesized transcripts with bromouridine (BrU). Cells or nuclei are incubated with BrUTP in the presence of Sarkoysl which prevents the attachment of RNA polymerase to the DNA. Therefore only RNA polymerase that are already on the DNA before the addition of Sarkosyl will produce new transcripts that will be labeled with BrU. The labeled transcripts are captured with anti-BrdU antibody labeled magnetic beads, converted to cDNAs and then sequenced by Next Generation DNA sequencing. The sequencing reads are then aligned to the genome and number of reads per transcript provide an accurate estimate of the number of transcripts synthesized."
================================================
Cryptic promoters are promoters situated within genes, according to Gene regulation by antisense transcription.
================================================
Excerpts from Wiki
"Bidirectional promoters are short (<1 kbp), intergenic regions of DNA between the 5' ends of the genes in a bidirectional gene pair."
================================================
As in Nascent RNA Sequencing Reveals Widespread Pausing and Divergent Initiation at Human Promoters, promoters that produce mRNAs in only one direction are referred as a class of divergent promoters. So a subclass of bidirectional promoters are divergent promoters.
Monday, 2 March 2015
Sunday, 1 March 2015
MRE-Seq, MeDIP, MBD-seq, RRBS and MethylC-seq/BS-seq/WGBS
MeDIP (Methylated DNA immunoprecipitation), methyl-CpG binding domain (MBD) protein-enriched genome sequencing (MBD-seq) and MethylC-seq/BS-seq/WGBS (bisulfite sequencing) are reviewed in Mapping Human Epigenomes
"The DNA methylation toolkit includes three main molecular-biology-based techniques: digestion of genomic DNA with methyl-sensitive restriction enzymes, affinity-based enrichment of methylated DNA fragments, and chemical conversion methods."
"Affinity-based enrichment assays capture methylated fragments from sonicated DNA with an antibody (MeDIP-seq) or a methyl-binding domain (MBD-seq) (Down et al., 2008 and Serre et al., 2010). When sequencing enriched DNA fragments, at least one cytosine is certainly methylated but the exact site or combination of sites can not be directly determined. Therefore, the resolution of affinity-based assays is highly dependent on the DNA fragment size, CpG density, and immunoprecipitation quality of the reagent. The results of both restriction-enzyme- and affinity-based sequencing methods are qualitative levels of enrichment rather than absolute."
"Bisulfite sequencing is a chemical conversion method that directly determines the methylation state of each cytosine in a binary fashion and is widely accepted as a gold standard for mapping DNA methylation. Treatment of genomic DNA with sodium bisulfite chemically converts unmethylated cytosines to uracil. After PCR, and assuming nearly complete bisulfite conversion, all unmethylated cytosines become thymidines and remaining cytosines correspond to 5mC."
"The DNA methylation toolkit includes three main molecular-biology-based techniques: digestion of genomic DNA with methyl-sensitive restriction enzymes, affinity-based enrichment of methylated DNA fragments, and chemical conversion methods."
DNase I Hypersensitive Sites (DHS)
Excerpts from The accessible chromatin landscape of the human genome
"DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions."
"DHSs are flanked by nucleosomes, which may acquire histone modification patterns that reflect the functional role of the adjoining regulatory DNA, such as the association of histone H3 lysine 4 trimethylation (H3K4me3) with promoter elements."
"DNase I sensitivity genome-wide using massively parallel sequencing7, 8, 9 in a total of 125 human cell and tissue types including normal differentiated primary cells (n = 71), immortalized primary cells (n = 16), malignancy-derived cell lines (n = 30) and multipotent and pluripotent progenitor cells (n = 8) (Supplementary Table 1)."
Cell types of interest include H1 human embryonic stem cells (hESC), fibroblasts of foetal and adult tissues, H7 undifferentiated human embryonic stem cells, human renal epithelial cells, H9 human embryonic stem cells, induced pluripotent stem (iPS) cells derived from skin fibroblasts, dedifferentiated human pancreatic islets from one of the sources for PanIslets, and human pancreatic islets.
DNase I hypersensitivity site sequencing protocol is demonstrated in DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells
"DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions."
"DHSs are flanked by nucleosomes, which may acquire histone modification patterns that reflect the functional role of the adjoining regulatory DNA, such as the association of histone H3 lysine 4 trimethylation (H3K4me3) with promoter elements."
"DNase I sensitivity genome-wide using massively parallel sequencing7, 8, 9 in a total of 125 human cell and tissue types including normal differentiated primary cells (n = 71), immortalized primary cells (n = 16), malignancy-derived cell lines (n = 30) and multipotent and pluripotent progenitor cells (n = 8) (Supplementary Table 1)."
Cell types of interests include HESC H1 Human Embryonic Stem Cells, fibroblasts of foetal and adult tissues, H7-hESC Undifferentiated human embryonic stem cells, human renal epithelial cells, H9ES human embryonic stem cell (hESC) H9, iPS induced pluripotent stem cell derived from skin fibroblast, Dedifferentiated human pancreatic islets from one of the sources for PanIslets, human pancreatic islets.
DNase I hypersensitivity site sequencing protocol is demonstrated in DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells