Sunday, 25 December 2016
Friday, 23 December 2016
Thursday, 22 December 2016
Modify Python matplotlib Backend Settings
To find the location of the configuration file:
>>> import matplotlib
>>> matplotlib.matplotlib_fname()
Edit the matplotlib configuration file:
Modify "backend : tkagg" to "backend : Agg", for example.
More in Customizing matplotlib.
Wednesday, 21 December 2016
Friday, 16 December 2016
Wednesday, 14 December 2016
Tuesday, 6 December 2016
Bash Sponge Command
Combining many files columnwise, use first column only once
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cited from Can bash wildcards specify negative matches?
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the following
sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
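The operators above can be tried in a scratch directory. The following is a minimal sketch, assuming bash and a handful of made-up file names (a.txt, b.txt, c.log, d.csv); it is only an illustration of the patterns, not part of the cited answer.

```shell
#!/usr/bin/env bash
# Enable the extended pattern matching operators described above.
shopt -s extglob

# Work in a scratch directory (hypothetical file names) so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt c.log d.csv

# !(pattern-list): every name except those ending in .txt
not_txt=( !(*.txt) )

# @(pattern-list): names whose extension is exactly one of the alternatives
log_or_csv=( *.@(log|csv) )

# +(pattern-list): one or more repetitions, here used as a string test
if [[ "aaa" == +(a) ]]; then plus_match=yes; else plus_match=no; fi

echo "not .txt:    ${not_txt[*]}"
echo "log or csv:  ${log_or_csv[*]}"
echo "aaa is +(a): $plus_match"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Note that shopt -s extglob has to run before the shell parses any line that uses these patterns, which is why it sits at the top of the script.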
Monday, 14 November 2016
An Internal Ribosome Entry Site (IRES)
Thursday, 10 November 2016
S-adenosyl-L-methionine (SAMe)
Cited from SAMe Safety
"SAMe is likely safe when taken by mouth in doses of 400-600 milligrams daily for up to two years; when taken by mouth at doses of 800-1,600 milligrams daily for up to 42 days; and when given through IV in doses up to 800 milligrams daily for up to 21 days."
Read Command
Cited from Getting User Input Via Keyboard
While Read Loop
Cited from For and Read-While Loops in Bash
while read line
do
    echo "$line"
done < list-of-dirs.txt
or
while read line
do
    command1 on $line
    command2 on $line
    ..
    ....
    commandN
done < "/path/to/filename"
or
while IFS= read -r field1 field2 field3 ... fieldN
do
    command1 on $field1
    command2 on $field1 and $field3
    ..
    ....
    commandN on $field1 ... $fieldN
done < "/path/to dir/file name with space"
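A runnable version of the field-splitting form might look like the sketch below. The tab-separated sample data, the temporary file and the variable names (chrom, pos, name, summary) are assumptions made up for illustration.

```shell
#!/usr/bin/env bash
# Create a small tab-separated sample file (hypothetical data).
input=$(mktemp)
printf 'chr1\t100\tgene A\nchr2\t200\tgene\\B\n' > "$input"

# IFS=$'\t' splits on tabs only, so the space inside "gene A" survives;
# -r keeps the backslash in the second record literal.
summary=""
while IFS=$'\t' read -r chrom pos name
do
    summary+="${chrom}:${pos}=${name};"
done < "$input"

echo "$summary"
rm -f "$input"
```

Setting IFS on the same line as read limits the change to that one command, so the rest of the script keeps the default separator.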
"IFS is used to set the field separator (default is white space). The -r option to the read command disables backslash escaping (e.g., \n, \t). This is a failsafe while read loop for reading text files."
Differences Between $@ and $* as Positional Parameters
Cited from $IFS
- "$@" expands as "$1" "$2" "$3" ... "$n", i.e. each positional parameter stays a separate word.
- "$*" expands as "$1y$2y$3y...$n", where y is the first character of the IFS variable, i.e. "$*" is one long string and that character acts as the separator (token delimiter).
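The difference is easiest to see by counting the words each form produces. This is a small sketch; the helper functions count_args and demo are hypothetical names introduced only for the illustration.

```shell
#!/usr/bin/env bash
# count_args prints how many arguments it received.
count_args() { echo "$#"; }

demo() {
    local IFS=','                   # first character of IFS becomes the "$*" separator
    at_count=$(count_args "$@")     # each parameter stays a separate word
    star_count=$(count_args "$*")   # all parameters are joined into one word
    star_joined="$*"
}

demo "a b" c d
echo "\"\$@\" words: $at_count"
echo "\"\$*\" words: $star_count"
echo "\"\$*\" value: $star_joined"
```

Both forms need the double quotes; unquoted $@ and $* are split again on IFS and behave alike.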
IFS
Cited from Bash: Show IFS value
To show IFS value,
printf %q "$IFS"
What is the meaning of IFS=$'\n' in bash scripting?
Cited from Getting User Input Via Keyboard
cat -etv <<<"$IFS"
Sample outputs:
^I$
$
Where,
- $ - end of line, i.e. newline
- ^I$ - tab (shown as ^I) followed by newline
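The two ways of displaying IFS, and the effect of IFS=$'\n', can be put together in one short script. This is a sketch assuming bash; the variable names (default_quoted, newline_quoted, lines) are illustrative only.

```shell
#!/usr/bin/env bash
# Start from the default IFS: space, tab, newline.
IFS=$' \t\n'
default_quoted=$(printf '%q' "$IFS")
echo "default IFS: $default_quoted"

# cat -etv makes the same characters visible: ^I is a tab, $ marks end of line.
cat -etv <<<"$IFS"

# IFS=$'\n' uses ANSI-C quoting to set IFS to a single newline,
# so word splitting happens on line boundaries only.
old_ifs=$IFS
IFS=$'\n'
newline_quoted=$(printf '%q' "$IFS")
lines=( $(printf 'first line\nsecond line\n') )
IFS=$old_ifs

echo "newline IFS: $newline_quoted"
echo "words after splitting: ${#lines[@]}"
```

With the default IFS the same command substitution would split into four words, because the spaces inside each line would also act as separators.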
Arguments for "Format"
Cited from The printf command
%f Interpret and print the associated argument as floating point number
%e Interpret the associated argument as double, and print it in <N>±e<N> format
%g Interprets the associated argument as double, but prints it like %f or %e
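A quick way to compare the three formats is to print the same number with each. The sketch below assumes bash's builtin printf and pins the locale to C so that the decimal point is a dot; the sample values are arbitrary.

```shell
#!/usr/bin/env bash
# Use the C locale so "." is the decimal separator regardless of the user's settings.
export LC_ALL=C

value=1234.5678
as_f=$(printf '%f' "$value")      # fixed-point, six decimals by default
as_e=$(printf '%e' "$value")      # scientific notation
as_g=$(printf '%g' "$value")      # shorter of the two styles, six significant digits
small_g=$(printf '%g' 0.00001234) # %g switches to exponent form for small values

echo "%f -> $as_f"
echo "%e -> $as_e"
echo "%g -> $as_g"
echo "%g -> $small_g (small value)"
```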
Tuesday, 25 October 2016
Friday, 21 October 2016
RNA-seq 5-prime to 3-prime Bias
Source of 3-prime Bias in PolyA-enriched RNA-seq
Cited from Figure 8: The use of 3′ bias as a quality control assay for cDNA.
"Total (bulk) RNA derived from tissue is confirmed to have a high RIN score before isolation of nuclei. Partial degradation of the RNA might occur during the preparation of nuclei by Dounce homogenization (nuclei prep) or FACS of the individual nuclei. If the mRNA is degraded by hydrolysis, shearing or RNases, truncated mRNA species could be created, and those containing the polyA sequence at the 3′ end of the transcripts might produce cDNA. This would generate greater RNA-seq coverage of the 3′ end of transcripts (3′-bias) compared with the high-quality bulk RNA."
======================================
Cited from NGS Quality Control in RNA Sequencing- Some Free Tools
To visualise 5-prime or 3-prime bias, use the tools of Picard or RSeQC.
======================================
Cited from How to understand median 5 prime to 3 prime bias ratio from Picard?
It is up to the analysis software to deal with 5-prime or 3-prime bias.
======================================
Cited from Salmon Doc
"--seqBias" to learn sequence bias
Salmon uses a variable-length Markov Model (VLMM) to model the sequence specific biases at both the 5’ and 3’ end of sequenced fragments. This methodology generally follows that of Roberts et al. [2], though some details of the VLMM differ.
Bash Parameter Substitution
Cited from How to tell if a string is not defined in a bash shell script?
- ${var+blahblah}: if var is defined, 'blahblah' is substituted for the expression, else null is substituted
- ${var-blahblah}: if var is defined, it is itself substituted, else 'blahblah' is substituted
- ${var?blahblah}: if var is defined, it is substituted, else the script exits with 'blahblah' as an error message.
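The three forms can be checked side by side. This is a minimal sketch assuming bash; the result variable names are made up, and the ${var?...} case runs in a subshell because it would otherwise abort the script.

```shell
#!/usr/bin/env bash
# Case 1: var is not defined.
unset var
plus_unset="${var+blahblah}"     # nothing is substituted
minus_unset="${var-blahblah}"    # the fallback text is substituted

# Case 2: var is defined but empty. Without a colon, "defined" is all that matters.
var=""
plus_empty="${var+blahblah}"     # blahblah, because var is defined
minus_empty="${var-blahblah}"    # the (empty) value of var itself

# Case 3: ${var?message} on an undefined variable ends the shell with an error,
# so it is run in a subshell and only its exit status is recorded.
unset var
if ( : "${var?var is not defined}" ) 2>/dev/null; then
    question_status=ok
else
    question_status=failed
fi

echo "unset: +='$plus_unset' -='$minus_unset'"
echo "empty: +='$plus_empty' -='$minus_empty'"
echo "\${var?...} on unset var: $question_status"
```

The colon variants (${var:+...}, ${var:-...}, ${var:?...}) additionally treat an empty value like an undefined one.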
Tuesday, 18 October 2016
Download SRA Files and Conversion to FASTQ Files
Thursday, 13 October 2016
Btrim: Fast Adapter Trimming Tool
Cited from Btrim: A fast, lightweight adapter and quality trimming program for next-generation sequencing technologies
"A typical trimming of 30M reads with two sets of adapter pairs can be done in about a minute with a small memory footprint."
Btrim Readme
Tuesday, 11 October 2016
PRO-seq and GRO-seq Data Analyses
PRO-seq Protocol
GRO-seq Data Analysis: Global analysis of transcription in castration-resistant prostate cancer cells uncovers active enhancers and direct androgen receptor targets
PRO-seq Data Analysis: Divergence of a conserved elongation factor and transcription regulation in budding and fission yeast
GRO-seq Data Analysis Chapter
Omics Tools: GRO-seq Analysis Tools
Tuesday, 27 September 2016
Wednesday, 7 September 2016
Coefficient of Variation as a Measure for Transcriptional Stability
"To measure transcriptional stability, we computed the coefficient of variation for gene expression over 12 developmental time points."
Tuesday, 6 September 2016
MS2 Tagging
Cited from In Vivo RNA Visualization in Plants Using MS2 Tagging
"This technique involves the tagging of the RNA of interest with repeats of an RNA stem-loop (SL) that is derived from the origin of assembly of the bacteriophage MS2 and recruits the MS2 coat protein (MCP). Thus, expression of MCP fused to a fluorescent marker allows the specific visualization of the SL-carrying RNA."
Saturday, 3 September 2016
Condensins, Topoisomerases and Cohesion
Cited from the review "Chromosome Condensation and Cohesion"
"Early research on chromosome structure demonstrated the existence of a nonhistone protein scaffold that runs along the chromatids. This scaffold is composed mainly of two proteins topoisomerase IIa and the condensin subunit SMC2."
"Topoisomerases modify the topology of DNA by transiently introducing nicks in a single strand to allow relaxation of supercoils (topoisomerases I and III) or breaking a double strand to allow passage of another DNA duplex through the opening (topoisomerase II). The latter reaction allows the catenation or decatenation of two DNA molecules and is essential for chromosome individualisation and condensation, as well as for sister-chromatid resolution and segregation."
"Condensins contain two Structural Maintenance of Chromosomes (SMC) proteins, SMC2 and SMC4. They form long coiled-coil rods joined by a hinge region and containing an adenosine triphosphatase (ATPase) head at the free end."
"Eukaryotic SMCs are found in three types of complexes. Condensins consisting of the SMC2/4 heterodimer are involved in chromosome condensation. The cohesin complex containing SMC1/3 mediates sister-chromatid cohesion. The SMC5/6 complex is involved in DNA repair and telomere maintenance."
"Two forms of condensin complexes exist: condensins I and II. Both complexes are pentamers that contain SMC2 and SMC4, but differ in their non-SMC subunits. Condensin I contains the non-SMC subunits CAP-D2, CAP-G and CAP-H, whereas condensin II contains CAP-D3, CAP-G2 and CAP-H2."
"These two complexes might play different roles in the condensation process, since depletion of non-SMC subunits of condensin I results in ‘puffed’ chromosomes while depletion of those in condensin II leads to ‘curly’ chromosomes."
"It is generally accepted that both condensin and topoisomerase IIa (Topo IIa) are important for chromosome condensation."
"Both Topo IIa and condensin associate with chromosomes in late G2 primarily at centromeres. Topo IIa decorates the chromosome scaffold during prophase, but condensin enrichment occurs later, in prometaphase."
"Condensin II is present in the nucleus during interphase while condensin I is cytoplasmic and comes into contact with chromosomes only after nuclear envelope breakdown. Selective depletion of condensin II, but not condensin I, by depleting their non-SMC subunits showed delayed chromosome condensation in prophase."
"Interestingly, cells depleted of both condensins I and II were still able to condense their chromosomes."
"Maintenance of chromosome condensation, therefore, seems to rely more on condensin."
"Cohesin binds to chromatin during early G1 and before DNA replication, however, suggesting that cohesin binding to chromatin does not equal sister-chromatid cohesion."
"Approximately 90% of cohesin can be depleted from human cells without substantial defects in sister chromatid cohesion."
"In higher eukaryotes, cohesin removal occurs through two pathways. In the prophase pathway, Plk1-mediated phosphorylation of SA1/2 triggers its removal by the Wapl–Pds5 complex. This pathway removes most cohesin from the chromosome arms. Centromeric cohesin is protected from the prophase pathway by the Sgo1–PP2A complex. In the metaphase pathway, the Scc1 subunit of the centromeric pool of cohesin is cleaved by separase to allow anaphase onset."
"Centromeric cohesion is actively protected during mitosis by two mechanisms: protection against cohesin removal by the Sgo1–PP2A complex and inhibition of separase activation by the spindle checkpoint."
"Thus, Topo IIa is required not only for chromosome individualisation and condensation during early mitosis, but also for sister chromatid separation during anaphase."
"Cleavage of centromeric cohesin by separase promotes DNA decatenation by Topo IIa, presumably because cohesin removal increases the access of Topo IIa to catenated DNA."
Mitosis-specific Histone Modifications
Cited from the review "Chromosome Condensation and Cohesion"
"H3-S10 phosphorylation is mediated by the Aurora B kinase. It initiates at centromeres in late G2 and extends to the whole chromosome by early mitosis."
"Phosphorylation of H3-S10 dissociates chromatin-bound proteins such as heterochromatin protein 1 (HP1) and splicing factors SRp20 and ASF/SF2 during mitosis, suggesting that this phosphorylation event might contribute to chromosome condensation by removing chromatin-bound proteins."
"In addition to histone H3, the linker histone H1 is also heavily phosphorylated during mitosis, which has been implicated in chromosome condensation."
Wednesday, 31 August 2016
Tuesday, 23 August 2016
Co-localization of Interval Sets in ChIP-seq
ColoWeb: a resource for analysis of colocalization of genomic features
LOLA: enrichment analysis for genomic region sets and regulatory elements in R and Bioconductor
Cited from COPS: Detecting Co-Occurrence and Spatial Arrangement of Transcription Factor Binding Motifs in Genome-Wide Datasets
"In order to compare the in vivo overlap with the expected (background) overlap, an overlap analysis was performed for the frequent motif patterns for which genome-wide data was available. The expected overlap was measured by randomly permuting (1000 times) the same number of regions bound by one TF through the genome and the mean overlap was subsequently calculated. The significance of the observed compared to the expected overlap was calculated by assuming that the overlap follows a Poisson distribution."
CGATOxford Tools
Thursday, 18 August 2016
Tuesday, 2 August 2016
Monday, 1 August 2016
HaloTagging Protein for Purification, Interactions and Imaging
Cited from HaloTag Technology for Protein Purification, Protein Interactions and Imaging
- The HaloTag protein can be fused to a protein of interest.
- A family of HaloTag ligands comes with different functionalities.
- Ligands consist of two parts: a reactive linker and a functional group, such as a fluorescent dye or biotin.
- Binding of the ligand to the HaloTag protein is rapid and irreversible.
- The HaloTag protein is a genetically modified hydrolase that covalently binds hydrolase substrates such as the HaloTag ligands.
Chromatin Digestion by Micrococcal Nuclease
Cited from the paper "Assays of nucleosome assembly and the inhibition of histone acetyltransferase activity. (11) Digestion of chromatin; and (12) Purification and characterization of DNA after digestion of chromatin"
"Digestion of chromatin by micrococcal nuclease (MNase) provides a relatively simple method for obtaining information about the locations of nucleosomes along DNA strands. When nuclei in permeabilized cells are exposed to MNase in the presence of a divalent cation, the enzyme makes double-stranded cuts between nucleosomes. Treatment of chromatin substrates with very high concentrations of MNase yields mononucleosome-length DNA predominantly, while lower concentrations of the enzyme generate one double-stranded cut at intervals of 10 to 50 nucleosomes, depending on the concentration of the enzyme and the substrate. MNase can also make single-stranded DNA cuts at the sites of histone octamers, and, thus, attempts to map the positions of nucleosomes are usually performed with native double-stranded DNA."
Cell Line: G1E ER4
Cited from the Paper "Tissue-Specific Mitotic Bookmarking by Hematopoietic Transcription Factor GATA1"
"To monitor GATA1 localization on a global scale in living, unsynchronized erythroid cells, GATA1-YFP fusion constructs were stably introduced into G1E cells."
"G1E cells are erythroid precursors that lack GATA1 and consequently fail to mature (Weiss et al., 1997). Introduction of a conditional form of GATA1 (GATA1 fused to the ligand binding domain of the estrogen receptor [ER]) conveys estradiol (E2)-dependent erythroid maturation in a manner faithfully reproducing that of normal erythroid cells."
"GATA1-ER target gene occupancy and expression closely match that of endogenous GATA1 in primary erythroblasts, providing a physiological assay for GATA1 function. Both N-terminal and C-terminal YFP fusions of GATA1-ER were generated to account for potential effects of YFP on GATA1-ER function. YFP-GATA1-ER and GATA1-ER-YFP were expressed at levels similar to endogenous GATA1 and were equally capable of inducing erythroid differentiation when compared to wild-type GATA1."
Tuesday, 26 July 2016
Generate Background Sequences Using Markov Model
Cited from GimmeMotifs documentation
"
generate_background_sequences.py
Generate random sequences according to one of two methods: random or matched_genomic. With the argument type set to random, and an input file in FASTA format, this script will generate sequences with the same dinucleotide distribution as the input sequences according to a 1st order Markov model trained on the input sequences. The -n options is set to 10 by default. The length distribution of the sequences in the output file will be similar as the inputfile. The Markov model can be changed with option -m. If the type is specified as matched_genomic the inputfile needs to be in BED format, and the script will select genomic regions with a similar distribution relative to the transcription start of genes as the input file. Make sure to select the correct genome. The length of the sequences in the output file will be set to the median of the features in the input file.
"
What Is The Appropriate Order For A Background Model In Motif Searches?
Implementation of fasta-get-markov on GUI
Motif Analysis
Using Weeder, Pscan, and PscanChIP for the Discovery of Enriched Transcription Factor Binding Site Motifs in Nucleotide Sequences
A review of ensemble methods for de novo motif discovery in ChIP-Seq data
THiCweed: fast sensitive motif finding via clustering of big data sets
Monday, 25 July 2016
Friday, 22 July 2016
ChIA-PET Experiments and Analysis Information
ChIA-PET Tools
MICC: an R package for identifying chromatin interactions from ChIA-PET data
A statistical model of ChIA-PET data for accurate detection of chromatin 3D interactions
Review of ChIA-PET Experiments and Analyses
PolII ChIA-PET by Yijun Ruan
ChIA-PET by Richard Young
ChIA-PET Explained in Wiki
Wednesday, 20 July 2016
Tuesday, 19 July 2016
Saturday, 16 July 2016
Friday, 8 July 2016
Wednesday, 6 July 2016
Tuesday, 5 July 2016
R Data Table .I and J()
Cited from Understanding .I in data.table in r
.I is a vector representing the row numbers
Using .I to return row numbers with data.table package
######################################
Cited from How is J() function implemented in data.table?
J(.) is deprecated and simply replaced with list(.).
Friday, 1 July 2016
R data.table Tips
Cited from Introduction to data.table
Within the frame of a data.table, columns can be referred to as if they are variables.
We can use “-” on character columns within the frame of a data.table to sort in decreasing order.
We wrap the variables (column names) within list(), which ensures that a data.table is returned. In case of a single column name, not wrapping with list() returns a vector instead.
data.table also allows using .() to wrap columns with. It is an alias to list(); they both mean the same. Feel free to use whichever you prefer.
Since .() is just an alias for list(), we can name columns as we would while creating a list.
For example,
ans <- flights[, .(delay_arr = arr_delay, delay_dep = dep_delay)]
Special symbol .N is an in-built variable that holds the number of observations in the current group.
Setting with=FALSE disables the ability to refer to columns as if they are variables.
We can also deselect columns using - or !.
Changing 'by' to 'keyby' automatically orders the result by the grouping variables in increasing order.
Special symbol .SD. It stands for Subset of Data. It by itself is a data.table that holds the data for the current group defined using by.
.SD would contain all the columns other than the grouping variables by default.
Using the argument .SDcols. It accepts either column names or column indices. For example, .SDcols = c("arr_delay", "dep_delay") ensures that .SD contains only these two columns for each group.
######################################
Cited from Keys and fast binary search based subset
We can set keys on multiple columns and the column can be of different types. Uniqueness is not enforced.
Setting a key does two things:
setkey() and setkeyv() modify the input data.table by reference. They return the result invisibly.
In data.table, the := operator and all the set* (e.g., setkey, setorder, setnames etc..) functions are the only ones which modify the input object by reference.
In addition to ordering, keyby also sets the key column.
######################################
Cited from Reference semantics
:= returns the result invisibly. Sometimes it might be necessary to see the result after the assignment. We can accomplish that by adding an empty [] at the end of the query, like flights[hour == 24L, hour := 0L][].
The copy() function deep copies the input object and therefore any subsequent update by reference operations performed on the copied object will not affect the original object.
######################################
Cited from Efficient reshaping using data.tables
By default, variable column is of type factor. Set variable.factor argument to FALSE if you’d like to return a character vector instead.
Within the frame of a data.table, columns can be referred to as if they are variables.
We can use “-” on a character columns within the frame of a data.table to sort in decreasing order.
We wrap the variables (column names) within list(), which ensures that a data.table is returned. In case of a single column name, not wrapping with list() returns a vector instead.
data.table also allows using .() to wrap columns with. It is an alias to list(); they both mean the same. Feel free to use whichever you prefer.
Since .() is just an alias for list(), we can name columns as we would while creating a list.
For example,
ans <- flights[, .(delay_arr = arr_delay, delay_dep = dep_delay)]
The special symbol .N is a built-in variable that holds the number of observations in the current group.
We can also deselect columns using - or !.
.SD would contain all the columns other than the grouping variables by default.
Use the .SDcols argument to specify which columns .SD contains. It accepts either column names or column indices. For example, .SDcols = c("arr_delay", "dep_delay") ensures that .SD contains only these two columns for each group.
######################################
Cited from Keys and fast binary search based subset
We can set keys on multiple columns and the column can be of different types. Uniqueness is not enforced.
Setting a key does two things:
- reorders the rows of the data.table by the column(s) provided by reference, always in increasing order.
- marks those columns as key columns by setting an attribute called sorted to the data.table.
setkey() and setkeyv() modify the input data.table by reference. They return the result invisibly.
In data.table, the := operator and all the set* functions (e.g., setkey, setorder, setnames, etc.) are the only ones which modify the input object by reference.
In addition to ordering, keyby also sets the key column.
######################################
Cited from Reference semantics
:= returns the result invisibly. Sometimes it might be necessary to see the result after the assignment. We can accomplish that by adding an empty [] at the end of the query, like flights[hour == 24L, hour := 0L][].
The copy() function deep copies the input object and therefore any subsequent update by reference operations performed on the copied object will not affect the original object.
######################################
Cited from Efficient reshaping using data.tables
By default, variable column is of type factor. Set variable.factor argument to FALSE if you’d like to return a character vector instead.
Thursday, 30 June 2016
Hi-C Terminology
Cited from Paper "A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping"
'We define the ‘‘matrix resolution’’ of a Hi-C map as the locus size used to construct a particular contact matrix and the ‘‘map resolution’’ as the smallest locus size such that 80% of loci have at least 1,000 contacts. The map resolution is meant to reflect the finest scale at which one can reliably discern local features.'
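The "map resolution" criterion quoted above can be sketched in a few lines of Python. This is only an illustration of the definition, not the authors' pipeline: the function name, the input layout (a dict from locus size to per-locus contact counts) and the toy numbers are all assumptions made up for the example.

```python
def map_resolution(contacts_by_bin_size, min_contacts=1000, min_fraction=0.8):
    """Return the smallest locus size at which enough loci are well covered.

    contacts_by_bin_size: dict mapping a locus (bin) size in bp to a list of
    per-locus contact counts at that size (hypothetical input layout).
    Returns None if no locus size meets the criterion.
    """
    for bin_size in sorted(contacts_by_bin_size):
        counts = contacts_by_bin_size[bin_size]
        if not counts:
            continue
        covered = sum(1 for c in counts if c >= min_contacts)
        if covered / len(counts) >= min_fraction:
            return bin_size
    return None


# Made-up counts, purely for illustration.
toy = {
    1_000: [200, 1500, 900, 300, 1200],       # 2 of 5 loci reach 1,000 contacts
    5_000: [1100, 4000, 2500, 900, 3000],     # 4 of 5 loci reach 1,000 contacts
    25_000: [6000, 9000, 8000, 7000, 9500],   # 5 of 5 loci reach 1,000 contacts
}
resolution = map_resolution(toy)
print(resolution)  # 5000
```

The loop walks the locus sizes from smallest to largest, so the first size that passes the 80% test is the map resolution.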
'We began by probing the 3D partitioning of the genome. In our earlier experiments at 1 Mb map resolution (Lieberman-Aiden et al., 2009), we saw large squares of enhanced contact frequency tiling the diagonal of the contact matrices. These squares partitioned the genome into 5–20 Mb intervals, which we call ‘‘megadomains.’’ We also found that individual 1 Mb loci could be assigned to one of two long-range contact patterns, which we called compartments A and B, with loci in the same compartment showing more frequent interaction. Megadomains—and the associated squares along the diagonal—arise when all of the 1 Mb loci in an interval exhibit the same genome-wide contact pattern. Compartment A is highly enriched for open chromatin; compartment B is enriched for closed chromatin.'
'Two of the five interaction patterns are correlated with loci in compartment A (Figure S4E). We label the loci exhibiting these patterns as belonging to subcompartments A1 and A2. Both A1 and A2 are gene dense, have highly expressed genes, harbor activating chromatin marks such as H3K36me3, H3K79me2, H3K27ac, and H3K4me1 and are depleted at the nuclear lamina and at nucleolus-associated domains (NADs) (Figures 2D, 2E, and S4I; Table S3). While both A1 and A2 exhibit early replication times, A1 finishes replicating at the beginning of S phase, whereas A2 continues replicating into the middle of S phase. A2 is more strongly associated with the presence of H3K9me3 than A1, has lower GC content, and contains longer genes (2.4-fold).'
'The other three interaction patterns (labeled B1, B2, and B3) are correlated with loci in compartment B (Figure S4E) and show very different properties. Subcompartment B1 correlates positively with H3K27me3 and negatively with H3K36me3, suggestive of facultative heterochromatin (Figures 2D and 2E). Replication of this subcompartment peaks during the middle of S phase. Subcompartments B2 and B3 tend to lack all of the above-noted marks and do not replicate until the end of S phase (see Figure 2D). Subcompartment B2 includes 62% of pericentromeric heterochromatin (3.8-fold enrichment) and is enriched at the nuclear lamina (1.8-fold) and at NADs (4.6-fold). Subcompartment B3 is enriched at the nuclear lamina (1.6-fold), but strongly depleted at NADs (76-fold).'
'Upon closer visual examination, we noticed the presence of a sixth pattern on chromosome 19 (Figure 2F). Our genome-wide clustering algorithm missed this pattern because it spans only 11 Mb, or 0.3% of the genome. When we repeated the algorithm on chromosome 19 alone, the additional pattern was detected. Because this sixth pattern correlates with the Compartment B pattern, we labeled it B4. Subcompartment B4 comprises a handful of regions, each of which contains many KRAB-ZNF superfamily genes. (B4 contains 130 of the 278 KRAB-ZNF genes in the genome, a 65-fold enrichment). As noted in previous studies (Vogel et al., 2006; Hahn et al., 2011), these regions exhibit a highly distinctive chromatin pattern, with strong enrichment for both activating chromatin marks, such as H3K36me3, and heterochromatin-associated marks, such as H3K9me3 and H4K20me3.'
Definition of In situ Hi-C
Cited from Paper "A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping"
In situ Hi-C: DNA-DNA proximity ligation is performed in intact nuclei.
Sunday, 26 June 2016
Monday, 20 June 2016
Multiscale analysis of genome-wide replication timing profiles using a wavelet-based signal-processing algorithm
"Replication starts from a set of initiation loci, called replication
origins, where two replication forks are assembled and begin replicating
DNA while proceeding in opposite directions, away from the loci; fork
progression continues until two converging forks 'collide' at a terminus
of replication."
"The DNA replication program in a cell is defined as the temporal sequence of locus replication events during the S phase. The program depends on the locations of the replication origins, their activation times and the speed at which replication forks move along the DNA double helix."
DNA Replication Timing
- Replication of eukaryotic chromosomes takes place in segments.
- The rate of elongation of replication forks varies little throughout S phase.
- It is the temporal order of replication, not the sites of initiation, that is conserved among species.
- In multicellular but not unicellular organisms, early replication is correlated with transcriptional activity and is developmentally regulated.
- The importance of large-scale chromatin folding in the regulation of replication timing in both yeasts and mammals.
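The notes above say the replication program depends on origin locations, activation times and fork speed, and that fork speed varies little. Under the simplifying assumption of one constant fork speed, a locus is replicated by whichever fork reaches it first. The sketch below encodes only that idea; the function name, origin positions, activation times and speed are made-up values for illustration.

```python
def replication_time(position, origins, fork_speed):
    """Earliest time at which a replication fork reaches `position`.

    origins: list of (origin_position, activation_time) pairs.
    fork_speed: distance covered per unit time, assumed constant here.
    """
    return min(t + abs(position - x) / fork_speed for x, t in origins)


origins = [(100, 0.0), (500, 10.0)]  # made-up positions (kb) and times (min)
speed = 2.0                          # made-up fork speed (kb per min)

profile = [replication_time(p, origins, speed) for p in range(0, 601, 100)]
print(profile)  # [50.0, 0.0, 50.0, 100.0, 60.0, 10.0, 60.0]
```

In this toy profile the minima sit at the two origins, and the local maximum between them marks where the converging forks would meet.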
Sunday, 12 June 2016
Saturday, 11 June 2016
Friday, 3 June 2016
Differences Between Epiblast and Embryonic Stem Cells
Cited from Collection: Naive Pluripotency
"Ground-state naive pluripotency is established in the epiblast of the mature blastocyst and may be captured in vitro in the form of embryonic stem cells. Although rodent cells can exist in both primed and naive pluripotent states, establishing a naive state in human cells has been difficult to obtain."
Cell Identity Markers
Gata4, a primitive endoderm marker (Paper: Control of ground-state pluripotency by allelic regulation of Nanog)
Fgf4, a pluripotency-associated gene (Paper: Control of ground-state pluripotency by allelic regulation of Nanog)
Pecam1, a non-pluripotency transmembrane protein on the cell surface, expressed in mouse embryonic stem cells.
Bmp4, a non-pluripotency factor expressed in mouse embryonic stem cells, a member of the bone morphogenetic protein family, which is part of the transforming growth factor-beta superfamily.
Naive Epiblast Explanation
Cited from the paper "Nanog Is the Gateway to the Pluripotent Ground State"
"After fertilization, mammalian zygotes follow a program of cleavage divisions and elaborate two extraembryonic lineages, trophoblast and hypoblast (Selwood and Johnson, 2006). This preparatory phase of development culminates in creation of the embryo founder tissue, a population of unrestricted pluripotent cells known as the epiblast (Gardner and Beddington, 1988 and Nichols and Smith, 2009). The epiblast proliferates to provide the substrate for axis formation, germlayer specification, and gastrulation. Naive early epiblast cells can be immortalized in culture in the form of embryonic stem (ES) cells (Brook and Gardner, 1997, Evans and Kaufman, 1981 and Martin, 1981). Pluripotent cells can also be created outside the embryo by reprogramming somatic cells, either by fusion with pre-existing pluripotent cells (Miller and Ruddle, 1976, Tada et al., 1997, Tada et al., 2001 and Takagi et al., 1983) or, more compellingly, by transfection with regulatory transcription factors (Takahashi and Yamanaka, 2006)."
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cited from the paper "Control of ground-state pluripotency by allelic regulation of Nanog"
"The ICM (inner cell mass) of the late blastocyst contains two lineages: the extra-embryonic primitive endoderm, and the ‘ground-state’ pluripotent epiblast [6, 8], which gives rise to the embryo. Inner cells expressing Nanog biallelically also express Oct4 but not Gata4, a primitive endoderm marker [9], and therefore are epiblast cells."
Wednesday, 1 June 2016
GenomicRanges: 'queryHits' and 'subjectHits' in findOverlaps
Cited from IRanges - minoverlap
> hits = findOverlaps(ir, minoverlap=100L)
It returns an object that tells which queries overlap which subjects, where query and subject are in effect the same ranges.
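To make the query/subject pairing concrete, here is a small Python sketch of a self-overlap search with a minimum overlap width. It is not the IRanges implementation: the function name and the toy ranges are invented, ranges are treated as closed integer intervals, and self-hits are kept (whether the real function keeps them depends on its arguments).

```python
def self_overlaps(ranges, minoverlap=1):
    """Return 1-based (query, subject) index pairs for ranges that overlap.

    ranges: list of (start, end) closed integer intervals. The same list
    plays the role of both query and subject.
    """
    hits = []
    for q, (q_start, q_end) in enumerate(ranges, start=1):
        for s, (s_start, s_end) in enumerate(ranges, start=1):
            width = min(q_end, s_end) - max(q_start, s_start) + 1
            if width >= minoverlap:
                hits.append((q, s))
    return hits


ranges = [(1, 300), (150, 400), (350, 600)]  # made-up ranges
hits = self_overlaps(ranges, minoverlap=100)
query_hits = [q for q, _ in hits]
subject_hits = [s for _, s in hits]
print(query_hits)    # [1, 1, 2, 2, 3]
print(subject_hits)  # [1, 2, 1, 2, 3]
```

Ranges 2 and 3 share only 51 positions, so that pair is dropped at minoverlap=100, while ranges 1 and 2 share 151 positions and are reported in both directions.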
Tuesday, 31 May 2016
Saturday, 28 May 2016
Gene Targeting
Cited from Gene targeting
"Gene targeting requires the creation of a specific vector for each gene of interest. However, it can be used for any gene, regardless of transcriptional activity or gene size."
"To target genes in mice, this construct is then inserted into mouse embryonic stem cells in culture. After cells with the correct insertion have been selected, they can be used to contribute to a mouse's tissue via embryo injection. Finally, chimeric mice where the modified cells made up the reproductive organs are selected for via breeding. After this step the entire body of the mouse is based on the previously selected embryonic stem cell."
RNAi siRNA and shRNA
Cited from RNAi (RNA interference) defined
- Synthetic (siRNA) or single stranded RNA (ssRNA) containing two complementary sequences separated by a non-complementary sequence, which folds back on itself to form a synthetic short hairpin RNA (shRNA).
- Expressed from a DNA construct which encodes an shRNA molecule. This is the dd (DNA-directed) RNAi approach.
Cited from the paper "A Novel Multiplex Cell Viability Assay for High-Throughput RNAi Screening"
"Nucleus or DNA stain using fluorescent molecules, such as Hoechst 33342, Hoechst 33258, DAPI or other dyes have been long-serving and commonly applied indicators of cellular viability."
Friday, 27 May 2016
Friday, 20 May 2016
Pausing an R Script for User Input
Cited from Pausing an R script: a generic pause function
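# Wait for the user to press <Enter>, in interactive and Rscript sessions alike.
pause = function() {
  if (interactive()) {
    # Interactive session: readline() reads from the console.
    invisible(readline(prompt = "Press <Enter> to continue..."))
  } else {
    # Non-interactive session (e.g. Rscript): read one line from standard input.
    cat("Press <Enter> to continue...")
    invisible(readLines(file("stdin"), 1))
  }
}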
Thursday, 19 May 2016
Define Bivalent Promoters Computationally
Cited from Loss of the Polycomb Mark from Bivalent Promoters Leads to Activation of Cancer-Promoting Genes in Colorectal Tumors
A promoter was defined as bivalent if it contained overlapping H3K4me3 and H3K27me3 peaks at expanded promoter areas (2.4 kb < TSS < 0.6 kb).
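A rough Python sketch of this kind of rule follows. It rests on two assumptions that the note does not confirm: that the promoter window runs from 2.4 kb upstream to 0.6 kb downstream of the TSS (strand-aware), and that "overlapping" means an H3K4me3 peak and an H3K27me3 peak overlap each other as well as the window. All function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def promoter_window(tss, strand, upstream=2400, downstream=600):
    """Strand-aware window around the TSS (assumed reading of the definition)."""
    if strand == "+":
        return (tss - upstream, tss + downstream)
    return (tss - downstream, tss + upstream)


def is_bivalent(tss, strand, k4me3_peaks, k27me3_peaks):
    """True if an H3K4me3 and an H3K27me3 peak overlap inside the window."""
    window = promoter_window(tss, strand)
    k4 = [p for p in k4me3_peaks if overlaps(p, window)]
    k27 = [p for p in k27me3_peaks if overlaps(p, window)]
    return any(overlaps(a, b) for a in k4 for b in k27)


# Made-up coordinates for illustration only.
print(is_bivalent(10_000, "+", [(9_500, 10_400)], [(10_200, 12_000)]))  # True
print(is_bivalent(10_000, "+", [(9_500, 10_400)], [(20_000, 21_000)]))  # False
```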
Wednesday, 18 May 2016
Sunday, 15 May 2016
Thursday, 12 May 2016
High-Throughput (HT) SELEX combines SELEX (Systematic Evolution of Ligands by EXponential Enrichment)
Cited from SELEX experiments: new prospects, applications and data analysis in inferring regulatory pathways
"Systematic Evolution of Ligands by EXponential enrichment (SELEX) is an experimental procedure that allows extraction, from an initially random pool of oligonucleotides, of the oligomers with a desired binding affinity for a given molecular target."
####################################
Cited from Large scale analysis of the mutational landscape in HT-SELEX improves aptamer discovery
Aptamers: short (20–100 nucleotides), synthetic, single-stranded (ribo)-nucleic molecules.
####################################
Cited from Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities
We describe here a high-throughput method for analyzing transcription factor binding specificity that is based on systematic evolution of ligands by exponential enrichment (SELEX) and massively parallel sequencing.
Monday, 9 May 2016
TF Target Genes Overlap with Active/Repressive Histone Marks
Cited from Supplementary Figure 4 of "Concerted genomic targeting of H3K27 demethylase REF6 and chromatin-remodeling ATPase BRM in Arabidopsis".
The Axoneme of Cilia
Cited from The Axoneme of Cilia
A cilium, like a flagellum, is composed of a central core (the axoneme), which contains two central microtubules that are surrounded by an outer ring of nine pairs of microtubules.
Saturday, 7 May 2016
Monday, 2 May 2016
Sunday, 1 May 2016
FUCCI and Nocodazole in Studying Cell Cycle
FUCCI Cell Cycle Sensor
Cdt1 is a DNA replication licensing factor required for the formation of the pre-replication complex.
Geminin is a DNA replication inhibitor.
############################################################################
Cited from Aphidicolin
Aphidicolin is an antiviral and antimitotic antibiotic produced by a fungus as a secondary metabolite. It reversibly inhibits DNA polymerases α and δ in eukaryotic cells and therefore inhibits eukaryotic nuclear DNA replication. It arrests cells in early S phase.
############################################################################
Cited from G1 versus G2 cell cycle arrest after adriamycin-induced damage in mouse Swiss3T3 cells
Adriamycin is a DNA-damaging agent that intercalates into DNA. It is known to arrest cells in G1 or G2 phase.
############################################################################
Cited from Nocodazole
Nocodazole interferes with the polymerization of microtubules, and cells treated with nocodazole arrest in G2 or M phase. Prolonged nocodazole treatment leads to apoptosis.
Tuesday, 26 April 2016
Ubuntu 16.04 Set Up Menu Bar in Terminal and Open New Terminal as Tab
Cited from Missing menus on fresh boots or restarts on ubuntu sessions
mkdir -p .config/autostart
gedit .config/autostart/menus.desktop
and copy the following into this file.
"""
[Desktop Entry]
Type=Application
Exec=initctl restart unity-panel-service
Hidden=false
NoDisplay=false
X-GNOME-Autostart-enabled=true
Name=menus
Comment=Show me the menus
X-GNOME-Autostart-Delay=
"""
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Open Gnome Terminal in New Tab in Ubuntu 15.04
Monday, 25 April 2016
The Differences Between "<-" and "="
Cited from Difference between assignment operators in R
To reduce ambiguity, we should use either <- or = as assignment operator, and only use = as named-parameter specifier for functions.
In conclusion, for better readability of R code, I suggest that we only use <- for assignment and = for specifying named parameters.
Thursday, 21 April 2016
Wednesday, 20 April 2016
R Finding Out Package Version
Cited from How to find out which package version is loaded in R?
sessionInfo()
packageVersion("GetoptLong")
Monday, 18 April 2016
Split a Bed File Based on Chromosome
Cited from Splitting a `.bed` file based on chromosomes into 'chromosomeName.bed'
awk 'BEGIN{FS="\t"; OFS="\t"} {f = $1 ".bed"; print > f}' input.bed
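For comparison, the same split can be sketched in Python. This is an illustrative equivalent rather than a drop-in replacement: the function name and the temporary demo file are made up, and unlike the awk one-liner it skips blank lines instead of writing them to a file named ".bed".

```python
import os
import tempfile


def split_bed_by_chromosome(bed_path, out_dir):
    """Write each line of a tab-separated BED file to <out_dir>/<chrom>.bed."""
    handles = {}
    try:
        with open(bed_path) as bed:
            for line in bed:
                if not line.strip():
                    continue  # skip blank lines
                chrom = line.split("\t", 1)[0].rstrip("\n")
                if chrom not in handles:
                    out_path = os.path.join(out_dir, chrom + ".bed")
                    handles[chrom] = open(out_path, "w")
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)


# Small demo on a made-up BED file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.bed")
    with open(src, "w") as fh:
        fh.write("chr1\t0\t100\nchr2\t5\t50\nchr1\t200\t300\n")
    chroms = split_bed_by_chromosome(src, tmp)
    with open(os.path.join(tmp, "chr1.bed")) as fh:
        chr1_lines = fh.read().splitlines()

print(chroms)      # ['chr1', 'chr2']
print(chr1_lines)  # ['chr1\t0\t100', 'chr1\t200\t300']
```

One output handle is kept open per chromosome, which is fine for typical genomes but would need rethinking for inputs with a very large number of sequence names.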
Friday, 15 April 2016
Wednesday, 13 April 2016
Bash: set -x
Cited from find: What's up with basename and dirname?
set -x shows how the expansion works and what the final command is.
Monday, 11 April 2016
Non-standard Evaluation in R (Meta-programming)
Cited from Non-standard evaluation
substitute() looks at a function argument and instead of seeing the value, it sees the code used to compute the value. substitute() returns an expression.
substitute() works because function arguments are represented by a special type of object called a promise. A promise captures the expression needed to compute the value and the environment in which to compute it.
substitute() is often paired with deparse(). That function takes the result of substitute(), an expression, and turns it into a character vector.
One important feature of deparse() to be aware of when programming is that it can return multiple strings if the input is too long.
eval() takes an expression and evaluates it in the specified environment.
quote() captures an unevaluated expression like substitute(), but doesn’t do any of the advanced transformations that can make substitute() confusing. quote() always returns its input as is.
If you provide only one argument to eval(), it evaluates the expression in the current environment. This makes eval(quote(x)) exactly equivalent to x, regardless of what x is.
eval()’s second argument need not be limited to an environment: it can also be a list or a data frame.
=====================================
Cited from Tips on non-standard evaluation in R
In fact, eval(expr, envir, enclos) basically follows the following logic to evaluate a quoted expression:
- If envir is an environment, then evaluate expr in envir by looking for symbols all the way along envir and its parent environments until found.
- If envir is a list, then evaluate expr given the symbols defined in the list; whenever a symbol is not found in the list, the function looks along the enclos environment chain until it is found.
- If a symbol is not found until the empty environment (the only environment having no parent) is reached, an error occurs.
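As a loose analogy only (this is Python, not R), Python's built-in eval(expression, globals, locals) resolves names in a similar order: the locals mapping first, then the globals mapping, and it raises an error if the name is found nowhere. In the sketch below the wrapper name `evaluate` and the toy dictionaries are made up; `envir` stands in for the list and `enclos` for the enclosing environment.

```python
def evaluate(expr, envir, enclos):
    """Evaluate `expr`, resolving names in `envir` first, then in `enclos`."""
    # Copies keep eval() from adding "__builtins__" to the caller's dicts.
    return eval(expr, dict(enclos), dict(envir))


enclos = {"x": 100, "y": 10}  # stands in for the enclosing environment
envir = {"x": 1}              # stands in for the list or data frame

result = evaluate("x + y", envir, enclos)  # x from envir, y from enclos
print(result)  # 11

try:
    evaluate("z", envir, enclos)  # z is defined nowhere
    missing_raises = False
except NameError:
    missing_raises = True
print(missing_raises)  # True
```

The analogy stops at two levels: Python has no chain of parent environments here, so it mirrors only the "list first, then enclos, else error" part of the logic.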
Non standard evaluation from another function in R
Friday, 8 April 2016
Thursday, 7 April 2016
Tab "\t" in Bash
Cited from Bash Join Command
Use $'\t' for the tab character, not just -t \t. Bash does not interpret \t unless it is inside $' ' quotes.
join -t $'\t' ...
Wednesday, 6 April 2016
Java Installation and Update
sudo apt-get install openjdk-8-jre
Tuesday, 5 April 2016
Saturday, 2 April 2016
What does "canonical" mean in biology?
Most likely, "canonical" in biology means "consensus".
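Reading "canonical" as "consensus", a consensus sequence can be sketched as the most common symbol at each position of a set of aligned sequences. The function name and the example sequences below are made up for illustration, and ties are broken by whichever symbol appears first in the input.

```python
from collections import Counter


def canonical_sequence(aligned):
    """Most common symbol at each position of equal-length sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


seqs = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # made-up aligned sequences
consensus = canonical_sequence(seqs)
print(consensus)  # TATAAT
```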
Cited from Canonical sequence
"A canonical sequence is a sequence of DNA, RNA, or amino acids that reflects the most common choice of base or amino acid at each position."
Monday, 28 March 2016
Gawk: Count The Number of Upper or Lower Cases in a String
To count the number of upper case letters in a string,
echo 'ERica' | gawk '{print gsub("[A-Z]", "",$0)}'
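The gawk one-liner works because gsub() returns the number of substitutions it made. Python's re.subn() returns the same kind of count alongside the new string, so an equivalent sketch looks like this (the helper name `count_matches` is made up):

```python
import re


def count_matches(pattern, text):
    """Number of substitutions re.subn() makes, like gawk's gsub() return value."""
    _, count = re.subn(pattern, "", text)
    return count


upper = count_matches(r"[A-Z]", "ERica")
lower = count_matches(r"[a-z]", "ERica")
print(upper)  # 2
print(lower)  # 3
```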
Replacement Text Case Conversion in Regular Expression
Replacement Text Case Conversion
For example,
to change the '\2' to the uppercase,
nd=`dirname $f | perl -pe "s|(.+/)([^/]+)/?$|\1\U\2|g"`
Wget (The Non-interactive Network Downloader) Options
--content-disposition
If this is set to on, experimental (not fully-functional) support for "Content-Disposition" headers is enabled. This can currently result in extra round-trips to the server for a "HEAD" request, and is known to suffer from a few bugs, which is why it is not currently enabled by default.
This option is useful for some file-downloading CGI programs that use "Content-Disposition" headers to describe what the name of a downloaded file should be.
--no-check-certificate
Don't check the server certificate against the available certificate authorities. Also don't require the URL host name to match the common name presented by the certificate.
As of Wget 1.10, the default is to verify the server's certificate against the recognized certificate authorities, breaking the SSL handshake and aborting the download if the verification fails. Although this provides more secure downloads, it does break interoperability with some sites that worked with previous Wget versions, particularly those using self-signed, expired, or otherwise invalid certificates. This option forces an "insecure" mode of operation that turns the certificate verification errors into warnings and allows you to proceed.
If you encounter "certificate verification" errors or ones saying that "common name doesn't match requested host name", you can use this option to bypass the verification and proceed with the download. Only use this option if you are otherwise convinced of the site's authenticity, or if you really don't care about the validity of its certificate. It is almost always a bad idea not to check the certificates when transmitting confidential or important data.
Friday, 25 March 2016
R ggplot2 vjust and hjust
What do hjust and vjust do when making a plot using ggplot?
Imagine that the text is bordered within a box.
hjust=0 places the reference position at the left side of the box. hjust=n (n>0) shifts the box to the left by n*(box width) relative to the reference position. hjust=n (n<0) shifts the box to the right by |n|*(box width) from the reference position.
vjust=0 places the reference position at the bottom side of the box. vjust=n (n>0) shifts the box down by n*(box height) relative to the reference position. vjust=n (n<0) shifts the box up by |n|*(box height) from the reference position.
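One way to read the two rules above as arithmetic, assuming the reference position is $(x_0, y_0)$ and the text box has width $W$ and height $H$ (symbols introduced here only for illustration):

```latex
\begin{aligned}
x_{\text{left}}   &= x_0 - \mathrm{hjust} \cdot W, \\
y_{\text{bottom}} &= y_0 - \mathrm{vjust} \cdot H.
\end{aligned}
```

With hjust = 0.5 and vjust = 0.5 the box is centred on the reference position, and with hjust = 1 its right edge sits on it.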
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cited from the book "ggplot2 Elegant Graphics for Data Analysis"
Justification of a string (or legend) defines the location within the string that is placed at the given position. There are two values for horizontal and vertical justification. The values can be:
- A string: "left", "right", "centre", "center", "bottom", and "top".
- A number between 0 and 1, giving the position within the string (from bottom-left corner).
Thursday, 24 March 2016
Deconvolute R Package UpSetR
Functions located in Helper.funcs.R:
## Finds the columns that represent the sets
FindStartEnd
## Finds the n largest sets if the user hasn't specified any sets
FindMostFreq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions located in MainBar.R:
Counter
Make_main_bar
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions located in Matrix.R
Create_matrix
Create_layout
MakeShading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions located in SizeBar.R
FindSetFreqs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions located in General.query.funcs.R
General.query.funcs.R
Wednesday, 23 March 2016
Methylation Sequencing Papers
Whole-Genome Bisulfite Sequencing of Two Distinct Interconvertible DNA Methylomes of Mouse Embryonic Stem Cells
Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity
An epigenomic roadmap to induced pluripotency reveals DNA methylation as a reprogramming modulator
Active DNA demethylation at enhancers during the vertebrate phylotypic period
Monday, 21 March 2016
Perl: the Input Record Separator
Cited from slurp mode - reading a file in one step
The $/ variable is the Input Record Separator in Perl. When we put the read-line operator in scalar context, for example by assigning to a scalar variable $x = <$fh>, perl will read from the file up-to and including the Input Record Separator which is, by default, the new-line \n.
What we did here is we assigned undef to $/. So the read-line operator will read the file up-till the first time it encounters undef in the file. That never happens so it reads till the end of the file. This is what is called slurp mode, because of the sound the file makes when we read it.
Perl: The Difference Between My and Local Variables
Cited from The difference between my and local
'local' temporarily changes the value of the variable, but only within the scope it exists in.
'my' creates a variable that does not appear in the symbol table, and does not exist outside of the scope that it appears in.
$::a refers to $a in the 'global' namespace.
use local when:
- you want to amend a special Perl variable, eg $/ when reading in a file. my $/; throws a compile-time error
Perl Repetition Operator "x"
Cited from How can I repeat a string N times in Perl?
Binary "x" is the repetition operator. In scalar context or if the left operand is not enclosed in parentheses, it returns a string consisting of the left operand repeated the number of times specified by the right operand. In list context, if the left operand is enclosed in parentheses or is a list formed by "qw/STRING/", it repeats the list. If the right operand is zero or negative, it returns an empty string or an empty list, depending on the context.
say '-' x 80; # print row of dashes
my @ones = (1) x 80; # a list of 80 1's
@ones = (5) x @ones; # set all elements to 5
Perl qw() Function
Cited from Using the Perl qw() function
Any non-alphanumeric, non-whitespace delimiter can be used to surround the qw() string argument.
The following are equivalent:
@names = qw(Kernighan Ritchie Pike);
@names = qw/Kernighan Ritchie Pike/;
@names = qw'Kernighan Ritchie Pike';
@names = qw{Kernighan Ritchie Pike};
No interpolation is possible in the string you pass to qw().
Wednesday, 16 March 2016
Signal Artifact Blacklist Regions
A comprehensive collection of signal artifact blacklist regions
Tools to remove reads in the blacklist from BAM files:
bamutils filter
Monday, 14 March 2016
Bash Run a Given Function in Parallel
How to run given function in Bash in parallel?
GNU Parallel and Bash functions: How to run the simple example from the manual
Bash script processing commands in parallel
parallel -j $NSLOTS -q --pipe <commands>
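For comparison, here is a minimal sketch that runs a shell function in parallel with nothing but built-in job control (background jobs plus wait). It is not GNU Parallel and has no equivalent of -j: every job starts at once. The function work and its squaring task are placeholders.

```shell
# work: a stand-in for a real task; it squares its argument.
work() {
    echo $(( $1 * $1 )) > "$outdir/$1.out"
}

outdir=$(mktemp -d)
for n in 1 2 3 4; do
    work "$n" &              # run each call as a background job
done
wait                         # block until every background job has finished

# Collect the per-job results in a fixed order.
results=$(cat "$outdir/1.out" "$outdir/2.out" "$outdir/3.out" "$outdir/4.out" | paste -sd ' ' -)
echo "$results"              # prints: 1 4 9 16
rm -r "$outdir"
```

Writing each result to its own file avoids interleaved output. With GNU Parallel, a Bash function has to be exported first (export -f work) before parallel can call it.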
Thursday, 10 March 2016
Git Commands
Cited from Ry’s Git Tutorial
git --version
To turn a directory into a Git repository,
cd [dirname]; git init
A .git directory stores all the tracking data for our repository.
An untracked file is one that is not under version control.
You should only track source files and omit anything that can be generated from those files.
The git add command tells Git to add the file to the repository.
A snapshot represents the state of your project at a given point in time.
Git’s term for preparing a snapshot is staging.
The git status command will only show us uncommitted changes. To view our project history, use git log.
To tell Git who we are,
git config --global user.name "Your Name"
git config --global user.email your.email@example.com
The --global flag tells Git to use this configuration as a default for all of your repositories. Omitting it lets you specify different user information for individual repositories.
Another useful option is to pass a filename to git log (git log filename) to display file-specific history.
git checkout <commit-id>
View a previous commit.
Tags are convenient references to official releases and other significant milestones in a software project. They let developers easily browse and check out important revisions. For example, we can now use the v1.0 tag to refer to the third commit instead of its random ID. To view a list of existing tags, execute git tag without any arguments.
git tag -a v1.0 -m "message"
Never make changes directly to a previous revision.
When using git revert, remember to specify the commit that you want to undo, not the stable commit that you want to return to. It helps to think of this command as saying “undo this commit” rather than “restore this version.”
In Git, a branch is an independent line of development.
The HEAD is Git’s internal way of indicating the snapshot that is currently checked out.
To create a new branch,
git branch branch-name
To check out a branch,
git checkout branch-name
When the history of two branches diverges, a dedicated commit is required to combine the branches. This situation may also give rise to a merge conflict, which must be manually resolved before anything can be committed to the repository.
Conflicts occur when we try to merge branches that have edited the same content.
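The commands above can be strung together into one short session. This is only a sketch in a throwaway directory: the file name, commit message, tag message and branch name are made up, and the identity is set without --global so it applies to this repository only.

```shell
# Work in a temporary directory so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1

git init -q                                   # creates the .git directory
git config user.name  "Your Name"             # no --global: this repository only
git config user.email your.email@example.com

echo 'hello' > notes.txt                      # untracked until added
git add notes.txt                             # stage the file
git commit -q -m "First snapshot"             # record the snapshot

git tag -a v1.0 -m "First release"            # annotated tag on that commit
git branch feature                            # create a branch
git checkout -q feature                       # switch to it

git log --oneline                             # project history
current_branch=$(git rev-parse --abbrev-ref HEAD)
tags=$(git tag)
```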
###################################################################
Monday, 7 March 2016
Sunday, 6 March 2016
Saturday, 5 March 2016
Awk/Gawk Version
Cited from How can I find my awk version
gawk -Wversion 2>/dev/null || gawk --version
Pass Arguments to a Bash Script
How to pass arguments to a Bash-script
How to handle multiple input file arguments using getopts
Command Line Options: How To Parse In Bash Using “getopt”
Bash getopt versus getopts
How to use getopt in bash command line with only long options?
Small getopts tutorial
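As a quick reminder of what the built-in getopts looks like, here is a minimal sketch. parse_args and its options (-o with an argument, -v as a flag) are hypothetical, and every remaining argument is treated as an input file.

```shell
# parse_args: accept -o <file> and -v, then collect the remaining arguments.
parse_args() {
    OPTIND=1                 # reset so the function can be called repeatedly
    output="" verbose=0
    while getopts "o:v" opt; do
        case "$opt" in
            o) output=$OPTARG ;;
            v) verbose=1 ;;
            *) echo "usage: parse_args [-v] [-o file] input..." >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))    # drop the options that getopts consumed
    inputs="$*"
}

parse_args -v -o out.txt a.txt b.txt
echo "output=$output verbose=$verbose inputs=$inputs"
```

getopts handles short options only. Long options need the external getopt program or manual parsing, which is what several of the links above discuss.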
Tuesday, 1 March 2016
Monday, 29 February 2016
Saturday, 27 February 2016
Thursday, 25 February 2016
Wednesday, 24 February 2016
Sunday, 21 February 2016
Friday, 19 February 2016
R gtable Package (Top,Bottom,Left and Right Extent)
Cited from Index Position in the gtable
"tlrb" refers to the index position in the gtable (think of it as a matrix): t=2, b=5 means that the grob will be placed from the second to the fifth row (inclusive).
R Grid Coordinates
Cited from grid Graphics
Each viewport has a number of coordinate systems available. There are four main types: absolute coordinates (e.g., "inches", "cm") allow locations and sizes in terms of physical coordinates -- there is no dependence on the size of the page; normalised coordinates (e.g., "npc") allow locations and sizes as a proportion of the page size (or the current viewport); relative coordinates (i.e., "native") allow locations and sizes relative to a user-defined set of x- and y-ranges; referential coordinates (e.g., "strwidth") where locations and sizes are based on the size of some other graphical object.
R Grid Package Introduction
Cited from the paper "Fun with the R Grid Package"
By default, the lower left corner of a viewport has coordinates (0, 0), and the upper right corner has coordinates (1, 1).
upViewport(2)
The argument in brackets determines the number of generations to move up the viewport tree.
The use of col=NA prevents the outlines from being drawn.
Setting clip="off" makes it possible to “spill” a graphic object outside the viewport region; clip="on" clips drawing to the viewport.
Two ways to interact with a grob (graphic object):
Directly,
grid."shape"()
Indirectly,
"shape"Grob()
If we want to draw a grob created by one of these "shape"Grob() functions, we can use the grid.draw() function. We can modify a grob by using the functions grid.edit() and editGrob().
The function gList() allows us to create a list of grobs. It facilitates the construction of several items in one plotting region together.
The function gTree() creates a tree-structure which can be used to organise the components of more complicated graphic objects. Such a tree-structure contains several grobs nested together. In a tree-structure, a grob can contain other grobs. The "children" argument specifies the components of the gTree. The children component is usually a list, constructed by gList.
Tuesday, 16 February 2016
The Shebang Line
The Shebang for Rscript is
#!/usr/bin/env Rscript
Full details are available from Howto Make Script More Portable With #!/usr/bin/env As a Shebang
Sunday, 14 February 2016
Thursday, 11 February 2016
R: Environment and Frame
Cited from "R in a Nutshell"
"An environment is an R object that contains the set of symbols available in a given context, the objects associated with those symbols, and a pointer to a parent environment. The symbols and associated objects are called a frame."
"The parent environment of a function is the environment in which the function was created."
======================================
Cited from How R Searches and Finds Stuff
R: Rle or RleList Objects
Cited from http://kasperdanielhansen.github.io/genbioconductor/html/GenomicRanges_Rle.html
The Rle (run-length encoding) class in R is intended for representing genome-wide sequence coverage.
The Wig and BigWig files are used to store coverage data.
The run-length-encoded representation of a vector represents the vector as a set of distinct runs, each with its own value. This class is integrated in the IRanges package. A base R class called "rle" implements much less functionality.
The runLength(), runValue() and as.numeric() functions take an Rle object.
RleList represents a list of Rles. It stores a genome wide coverage track where each element of the list is a different chromosome.
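To make the idea of run-length encoding concrete, here is a small sketch in shell and awk. It only illustrates the encoding itself and is not the Bioconductor implementation; rle is a hypothetical helper that prints each run as value:length.

```shell
# rle: run-length encode the whitespace-separated values on one line.
rle() {
    printf '%s\n' "$1" | awk '{
        out = ""
        run = 1
        # Walk one position past the last field so the final run is flushed.
        for (i = 2; i <= NF + 1; i++) {
            if (i <= NF && $i == $(i - 1)) {
                run++
            } else {
                out = out (out == "" ? "" : " ") $(i - 1) ":" run
                run = 1
            }
        }
        print out
    }'
}

rle '0 0 0 3 3 1'    # prints: 0:3 3:2 1:1
```

In Rle terms, the run values here are 0, 3, 1 and the run lengths are 3, 2, 1.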
======================================
Cited from IRanges and GenomicRanges An introduction
aggregate() allows you to apply functions to the Rle inside an IRanges
aggregate(Rle_object, IRange_object, FUN=func_name)
Wednesday, 10 February 2016
How to Check Whether a Folder Is Empty with a Shell Script
Cited from
if [ -z "$(ls -A "$DIR" 2> /dev/null)" ]; then
    # The directory is empty (or cannot be listed)
    echo "$DIR is empty"
fi
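The same test can be wrapped in a function that also tells an empty directory apart from a missing one. This is a sketch: dir_is_empty is a hypothetical helper name, and hidden files count as entries because of ls -A.

```shell
# dir_is_empty: succeed only if $1 is an existing directory with no entries.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1" 2> /dev/null)" ]
}

d=$(mktemp -d)
if dir_is_empty "$d"; then echo "empty"; fi
touch "$d/.hidden"                      # a hidden file makes it non-empty
if ! dir_is_empty "$d"; then echo "not empty"; fi
rm -r "$d"
```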
Friday, 5 February 2016
Thursday, 4 February 2016