Friday, 11 December 2015
Saturday, 5 December 2015
Tuesday, 1 December 2015
Pre-implantation development and Blastocyst
Pre-implantation development
Cited from the book "Human Embryology and Developmental Biology"
The subdivision of the inner cell mass ultimately results in an embryonic body that contains the three primary embryonic germ layers: the ectoderm (outer layer), mesoderm (middle layer), and endoderm (inner layer). The process by which the germ layers are formed through cell movements is called gastrulation.
Monday, 23 November 2015
DNMT3A and DNMT3B
- DNA methylation: addition of a methyl group to cytosine, in the context of CpG dinucleotides.
- DNMTs are associated with chromatin remodelling.
- DNMT3A and DNMT3B are de novo DNA methyltransferases.
- Double-knockout cells lose differentiation capacity with increasing passage.
Sunday, 22 November 2015
Thursday, 19 November 2015
R plot function
Cited from How to Change Plot Options in R
The plot parameter "bty" (box type) specifies the type of box drawn around the plot area.
- "o": The default value draws a complete rectangle around the plot.
- "n": Draws nothing around the plot.
Cited from 15 Questions All R Users Have About Plots
Setting "xaxt" and "yaxt" to "n" suppresses plotting of the x-axis or y-axis respectively; the default value "s" draws the axis.
Setting "ann" to FALSE suppresses the default annotations (the axis titles and the overall title).
=====================================
Cited from Graphics with R
By default, "xaxs" and "yaxs" are set to "r" ("regular"): the range given by "xlim" and "ylim" is extended by 4% at each end, so that values do not sit on the edges of the plot. In contrast, setting "xaxs" and "yaxs" to "i" ("internal") uses the specified limits exactly, so they fall at the edges of the plot.
Wednesday, 18 November 2015
Monday, 16 November 2015
SCDE: Q&A
Cited from Definition of columns of scde diff DE matrix
These should provide estimates of the limma values.
logFC == mle
P.value == pnorm(Z)
adj.P.val == pnorm(cZ)
=====================================
Cited from Normalized read counts
scde.expression.magnitude returns FPM (not normalized by transcript length).
=====================================
p.self.fail=scde.failure.probability(models=o.ifm,counts=cd)
Saturday, 14 November 2015
Managing Bash Processes
Cited from "Bioinformatics Data Skills Reproducible and Robust Research with Open Source Tools"
What is a shell process?
"When we run programs through the Unix shell, they become processes until they successfully finish or terminate with an error."
Background Processes
To run a program in the background, an ampersand (&) can be appended to the end of the command.
To list the jobs running in the background of the current shell, run the "jobs" command.
Tee Command
Cited from "Bioinformatics Data Skills Reproducible and Robust Research with Open Source Tools"
"The Unix program tee diverts a copy of your pipeline’s standard output stream to an intermediate file while still passing it through its standard output."
For example,
program1 input.txt | tee intermediate-file.txt | program2 > results.txt
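The behaviour quoted above can be sketched in Python as a generator that writes each line to an intermediate file while still yielding it to the next stage. This is only an illustration of the idea, not how the Unix tee program is implemented; program2 here is a hypothetical downstream step.

```python
import os
import tempfile
from typing import Iterable, Iterator


def tee(lines: Iterable[str], path: str) -> Iterator[str]:
    """Copy every line to `path` and pass it on unchanged, like `tee path` in a pipeline."""
    with open(path, "w") as copy:
        for line in lines:
            copy.write(line)  # divert a copy to the intermediate file
            yield line        # ...while still passing it downstream


def program2(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical downstream step: upper-case every line."""
    for line in lines:
        yield line.upper()


# Same shape as: program1 input.txt | tee intermediate-file.txt | program2 > results.txt
path = os.path.join(tempfile.mkdtemp(), "intermediate-file.txt")
results = list(program2(tee(["line 1\n", "line 2\n"], path)))
```

After the pipeline is consumed, results holds the transformed lines and the intermediate file holds the untouched copy.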
Markdown Syntax
Cited from Markdown Tutorial
Italic and Bold
To make a phrase italic in Markdown, surround the words with underscores ("_"), such as "_italic_".
To make a phrase bold in Markdown, surround the words with two asterisks ("**"), such as "**bold**".
Headers
There are six levels of headers, in decreasing size. The number of hash marks ("#") before the header text sets the level: one hash mark gives the largest header and six give the smallest.
A header cannot be made bold, but certain words in it can be italicized.
Links to Websites
To create an inline link, wrap the link text in brackets "[ ]", and then wrap the link in parentheses "( )". For example, "[Visit GitHub!](www.github.com)".
A reference link refers to a link defined elsewhere in the document. For example, write "[text][reference]" in the text, and at the bottom of the Markdown document "[reference]: url".
Images
To create an inline image, use "![alt text](url)".
Blockquote
A blockquote is a sentence or paragraph that has been specially formatted to draw the reader's attention.
To create a blockquote, preface one or more paragraphs with the "greater than" caret (">").
Lists
To create an unordered list, preface each item with an asterisk and a space ("* "). Each item must be on its own line.
An ordered list is prefaced with numbers instead of asterisks.
In a nested list, each sub-item must be indented one space more than the preceding item.
Paragraphs
If a new line is forcefully inserted (a hard break), the togetherness of the lines is broken. To create a soft break instead, insert two spaces at the end of each line.
Thursday, 12 November 2015
Data Type in Java
The set of values for each data type is known as the domain of that type.
Cited from "HKUSTx: COMP102.1x Introduction to Java Programming - Part 1"
Adding the keyword "final" to the declaration of a variable makes the variable a constant. For example, "final double bodyWeight;". A value can be assigned to a "final" variable only once.
Camel Case in Java
Cited from "HKUSTx COMP102.1x Introduction to Java Programming - Part 1"
Lower camelCase for names of variables and methods. For example, "double areaOfCircle".
Upper CamelCase for names of classes. For example, "public class HelloWorld".
JavaScript Library D3.js
Cited from Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS.
Wednesday, 11 November 2015
R bquote Function Explained
Cited from An R bquote example
When a plot is annotated with mathematical symbols in R, an expression may be required.
For example,
text(0, height[2], labels=expression(Y[med] ~ "=" ~ B*x^2), cex=3)
Contents enclosed in square brackets ("[" and "]") appear as subscripts.
The tilde "~" acts as a separator that is rendered as a space, not as a visible tilde.
If we wish to introduce variables in the annotation along with the mathematics symbols, the bquote function may be used.
For example,
text(0, height[i], labels=bquote(Y[.(z2[i])] ~ "=" ~ .(z1[i])*x^2), cex=3)
.(variable_name) retrieves the value stored in the variable and places that value inside the expression.
Tuesday, 10 November 2015
Friday, 6 November 2015
Bayesian Inference Basics
Cited from the book "Statistical Rethinking: A Bayesian Course with Examples in R and Stan"
Maximum a posteriori (MAP) is the mode of the posterior distribution.
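A grid approximation makes the definition concrete: compute the posterior on a grid of parameter values and take the value where it is largest. The data (6 successes in 9 trials), the grid size and the flat prior are assumptions chosen for illustration.

```python
from math import comb


def grid_posterior(successes: int, trials: int, points: int = 1001):
    """Grid-approximate the posterior of a success probability p under a flat prior (assumption)."""
    grid = [i / (points - 1) for i in range(points)]
    prior = [1.0] * points
    likelihood = [
        comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)
        for p in grid
    ]
    unnormalised = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


def map_estimate(grid, posterior):
    """MAP estimate: the grid value where the posterior is largest, i.e. its mode."""
    best = max(range(len(grid)), key=posterior.__getitem__)
    return grid[best]


grid, posterior = grid_posterior(successes=6, trials=9)
p_map = map_estimate(grid, posterior)
```

With a flat prior the MAP coincides with the maximum-likelihood estimate, 6/9, up to the grid resolution.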
Binomial,Geometric,Hypergeometric,Poisson,NegB Distributions
Cited from the Youtube video Overview of Some Discrete Probability Distributions (Binomial,Geometric,Hypergeometric,Poisson,NegB)
Binomial, negative binomial and geometric distributions depend on the assumption of independent Bernoulli trials.
Binomial distribution:
The number of trials is fixed, and the number of successes is the random variable.
Bernoulli distribution:
A special case of the binomial distribution: the number of trials is fixed at 1, and the number of successes is the random variable.
Negative binomial distribution:
The number of successes is fixed, and the number of trials is the random variable.
Geometric distribution:
A special case of negative binomial distribution, the number of successes is fixed as 1, and the number of trials is the random variable.
The hypergeometric distribution depends on the assumption of non-independent trials: the draws are made without replacement from a source that contains a certain number of successes and a certain number of failures.
Hypergeometric distribution:
Similar to the binomial distribution, the number of trials is fixed, and the number of successes is the random variable.
If objects are sampled without replacement from a large population, the dependence between draws has only a small effect, so the binomial distribution closely approximates the hypergeometric distribution.
Poisson distribution:
The Poisson distribution models the number of events (the random variable) in a given time, length, area, volume, etc., if these events occur randomly and independently.
The Poisson distribution (with mean np) approximates the binomial distribution when the number of trials n is large and the probability of success p is very small.
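The relationships above can be checked numerically with the probability mass functions written out directly. This is a sketch using only the standard library; the parameter values in the two approximation checks are arbitrary examples.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in a fixed number n of independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bernoulli_pmf(k: int, p: float) -> float:
    """Binomial with a single trial (n = 1)."""
    return binomial_pmf(k, 1, p)


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(the r-th success occurs on trial k): the number of trials is the random variable."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


def geometric_pmf(k: int, p: float) -> float:
    """Negative binomial with r = 1: P(the first success occurs on trial k)."""
    return negative_binomial_pmf(k, 1, p)


def hypergeometric_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes in n draws without replacement from N items, K of which are successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def poisson_pmf(k: int, lam: float) -> float:
    """P(k events) when events occur randomly and independently with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


# Poisson approximation to the binomial: large n, small p, lam = n * p
binom_vs_poisson = (binomial_pmf(3, 1000, 0.002), poisson_pmf(3, 1000 * 0.002))
# Binomial approximation to the hypergeometric: small sample from a large population
hyper_vs_binom = (hypergeometric_pmf(3, 100_000, 30_000, 10), binomial_pmf(3, 10, 0.3))
```

In both tuples the two probabilities agree to about three decimal places, which is the approximation described in the notes.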
Tuesday, 3 November 2015
Monday, 2 November 2015
Monday, 26 October 2015
Saturday, 17 October 2015
Monday, 12 October 2015
Sunday, 11 October 2015
Friday, 9 October 2015
Thursday, 8 October 2015
Set the labels size on a pie chart in python
Cited from How to set the labels size on a pie chart in python
import matplotlib as mpl
mpl.rcParams['font.size'] = 9.0
Monday, 5 October 2015
Parallel: Installation and Tutorial
Cited from http://git.savannah.gnu.org/cgit/parallel.git/tree/README
"Full installation of GNU Parallel is as simple as:
wget http://ftpmirror.gnu.org/parallel/parallel-20150922.tar.bz2
bzip2 -dc parallel-20150922.tar.bz2 | tar xvf -
cd parallel-20150922
./configure && make && sudo make install"
===================================
Parallel Tutorial
Tool: Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them
Saturday, 3 October 2015
Tuesday, 29 September 2015
Wednesday, 23 September 2015
Monday, 21 September 2015
Friday, 18 September 2015
Thursday, 17 September 2015
Wednesday, 16 September 2015
Bash: find -depth
-depth  Process each directory's contents before the directory itself.
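Python's os.walk with topdown=False yields directories in the same contents-before-directory order, which is why that ordering (like find -depth) suits deleting a tree. This is a Python analogue on a throwaway temporary directory, not find itself.

```python
import os
import tempfile

# Build a small throwaway tree: root/a/b/file.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
open(os.path.join(root, "a", "b", "file.txt"), "w").close()

visited = []
# topdown=False yields each directory only after its subdirectories (post-order),
# the same ordering that `find -depth` uses.
for dirpath, dirnames, filenames in os.walk(root, topdown=False):
    visited.append(os.path.relpath(dirpath, root))
    for name in filenames:
        os.remove(os.path.join(dirpath, name))
    os.rmdir(dirpath)  # safe: everything inside was already removed
```

The deepest directory is visited first and the root last, so every rmdir call finds an empty directory.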
Bash: shopt -s globstar
Cited from The Shopt Builtin
shopt: toggle the values of settings that control optional shell behaviour.
-s  enable (set) each specified option.
-u  disable (unset) each specified option.
option:
globstar
"If set, the pattern ‘**’ used in a filename expansion context will match all files and zero or more directories and subdirectories. If the pattern is followed by a ‘/’, only directories and subdirectories match."
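Python's glob module gives '**' the same meaning when recursive=True, which offers a quick way to try the behaviour described above outside bash. This is a Python analogue on a throwaway directory, not the shell's own implementation.

```python
import glob
import os
import tempfile

# Throwaway tree: top.txt, a/mid.txt, a/b/deep.txt
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "deep.txt")):
    open(os.path.join(root, rel), "w").close()

# '**' matches zero or more directories, so the top-level file matches too
txt_files = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)
)
# A trailing separator after '**' restricts the matches to directories
directories = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, "**", ""), recursive=True)
)
```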
Sunday, 13 September 2015
Thursday, 10 September 2015
Wednesday, 9 September 2015
Tuesday, 1 September 2015
Saturday, 29 August 2015
Histone Modification Code
Cited from the paper "Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease"
H3K4me3 (associated primarily with active promoters); H3K4me1 (enhancers); H3K27ac (enhancer/promoter activation); H3K27me3 (Polycomb repression); H3K36me3 and H4K20me1 (transcription); and H3K9me3 (heterochromatin).
Thursday, 27 August 2015
Convert Genomic Coordinates from One Genome Version to Another
Tool: Converting Genome Coordinates From One Genome Version To Another (Ucsc Liftover, Ncbi Remap, Ensembl Api)
UCSC LiftOver Web Tools
pyliftover “only does conversion of point coordinates, that is, unlike liftOver, it does not convert ranges, nor does it provide any special facilities to work with BED files” (sourced from CrossMap).
Monday, 24 August 2015
FeatureCounts: Strandedness
From the source code of featureCounts
"""
0: unstranded 1: stranded 2: reverse stranded
"""
strand_flag = {"unstranded": "0",
               "firststrand": "2",
               "secondstrand": "1"}
stranded = get_in(config, ("algorithm", "strandedness"),
                  "unstranded").lower()
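A self-contained version of the snippet above, as a sketch: get_in is a hypothetical stand-in for the nested-lookup helper the original code uses, and the config dictionaries are made up for illustration. The values follow featureCounts' -s option (0 unstranded, 1 stranded, 2 reversely stranded).

```python
def get_in(d, keys, default=None):
    """Hypothetical helper: follow `keys` through nested dicts, returning `default` if any is missing."""
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d


# Library-type label -> featureCounts -s value
STRAND_FLAG = {"unstranded": "0",
               "firststrand": "2",   # first-strand libraries count as reversely stranded
               "secondstrand": "1"}


def strand_flag(config: dict) -> str:
    stranded = get_in(config, ("algorithm", "strandedness"), "unstranded").lower()
    return STRAND_FLAG[stranded]


flag = strand_flag({"algorithm": {"strandedness": "FirstStrand"}})  # "2"
```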
Friday, 21 August 2015
Tuesday, 18 August 2015
XSLT: Output "&"
Writing "&amp;&amp; \" in the stylesheet produces " && \" in the output.
XSLT: Rule Execution Order
Cited from XSLT Tutorial - Basics
You have to understand that XSLT works down "depth-first" the XML tree, i.e.
- it first deals with the rule for the root element,
- then with the first instruction within this rule.
- If the first instruction says "find other rules" it will then apply the first rule found for the first child element and so forth...
- The rule of the root element is also the last one to be finished (since it must deal step-by-step with everything that is found inside) !!!
"By default the first one is applied. Since the XSLT processor only will apply one rule per element and also the most complex one."
Monday, 17 August 2015
RNA-seq, ChIP-seq, ATAC-seq Paper
Chromatin state dynamics during blood formation
Tissue-Resident Macrophage Enhancer Landscapes Are Shaped by the Local Microenvironment
Sunday, 16 August 2015
Make: Empty Command
Cited from Commands
"Empty commands are most often used to prevent a pattern rule from matching the target and executing commands you don’t want."
Make: Multiline Macro
Cited from Commands
"When a multiline macro is expanded, each line is inserted into the command script with a leading tab and make treats each line independently. The lines of the macro are not executed in a single subshell. So you will need to pay attention to command-line continuation in macros as well."
Saturday, 15 August 2015
Differences Between Fork and Exec
Cited from Differences between exec and fork
"A process is an execution environment that consists of instruction, user-data, and system-data segments, as well as lots of other resources acquired at runtime, whereas a program is a file containing instructions and data that are used to initialize the instruction and user-data segments of a process."
- fork() creates a duplicate of the current process
- exec() replaces the program in the current process with another program
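The two calls can be seen together in a short Python sketch built on os.fork and os.execvp (POSIX only). The helper name fork_and_exec is hypothetical; the pattern of forking, exec'ing in the child and waiting in the parent is what a shell does to run a command.

```python
import os


def fork_and_exec(argv: list) -> int:
    """Run argv in a child process and return its exit status (POSIX only)."""
    pid = os.fork()  # fork(): the child starts as a duplicate of this process
    if pid == 0:
        try:
            # exec(): replace the child's program with argv[0]; does not return on success
            os.execvp(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # reached only if exec failed, e.g. command not found
    # Parent: wait for the child and decode its exit status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


status = fork_and_exec(["sh", "-c", "exit 3"])  # 3
```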
Archive File
Cited from Managing Modularity: Makefiles and Libraries
"When we have a collection of functions which we often use, it is convenient to collect their compiled versions into a library archive file."
Friday, 14 August 2015
Make: Eval Function
Cited from Functions
"Using eval resolves the parsing issue because eval handles the multiline macro expansion and itself expands to zero lines."
"The argument to eval is expanded twice: once when make prepares the argument list for eval, and once again by eval."
Make: Export Multiple Target-specific Variables
all: export A=TEST
all: export B=OK
all:
	@echo A is $$A
	@echo B is $$B
Thursday, 13 August 2015
Make: Environment Variables
Cited from Variables from the Environment
"Variables in make can come from the environment in which make is run. Every environment variable that make sees when it starts up is transformed into a make variable with the same name and value. However, an explicit assignment in the makefile, or with a command argument, overrides the environment. (If the ‘-e’ flag is specified, then values from the environment override assignments in the makefile. See Summary of Options. But this is not recommended practice.)"
=======================================
Cited from The Basics: Getting environment variables into GNU Make
"The override directive beats the command line which beats environment overrides (-e option) which beats macros defined in a Makefile file which beats the original environment."
Bash: Multiple Commands in One Line
Cited from Which one is better: using ; or && to execute multiple commands in one line?
"
A; B = Run A and then B, regardless of success of A
A && B = Run B if A succeeded
A || B = Run B if A failed
A & = Run A in background.
"
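The four operators can be checked from Python by handing short command strings to sh. This assumes a POSIX shell is on the PATH; the helper name sh is just for illustration.

```python
import subprocess


def sh(command: str) -> str:
    """Run `command` with sh and return what it printed on stdout."""
    return subprocess.run(["sh", "-c", command], capture_output=True, text=True).stdout


always = sh("false; echo B")               # ';'  : B runs regardless of A
if_ok = sh("false && echo B")              # '&&' : B is skipped because A failed
if_failed = sh("false || echo B")          # '||' : B runs because A failed
background = sh("sleep 0 & echo B; wait")  # '&'  : A runs in the background while B continues
```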
Wednesday, 12 August 2015
make -f-
Cited from make
"-f makefile
Use the description file makefile. If the pathname is the dash character (-), the standard input is used. If there are multiple instances of this option, they are processed in the order specified."
For example,
make -f- FOO=bar <<< 'goal:;@echo $(MAKECMDGOALS)'
====================================
Cited from Variables and Macros
'The stdin is redirected from a command-line string using bash's here string, “<<<”, syntax.'
Make: MAKEFILE_LIST
Cited from Variables and Macros
"A makefile can always determine its own name by examining the lastword of the list stored in the variable of MAKEFILE_LIST."
Make: Set Default Goal Using .DEFAULT_GOAL
Cited from Other Special Variables
".DEFAULT_GOAL: Sets the default goal to be used if no targets were specified on the command line. Note that assigning more than one target name to .DEFAULT_GOAL is invalid and will result in an error."
Make: Goal
Cited from Arguments to Specify the Goals
"The goals are the targets that make should strive ultimately to update. Other targets are updated as well if they appear as prerequisites of goals, or prerequisites of prerequisites of goals, etc."
"By default, the goal is the first target in the makefile (not counting targets that start with a period). Therefore, makefiles are usually written so that the first target is for compiling the entire program or programs they describe. If the first rule in the makefile has several targets, only the first target in the rule becomes the default goal, not the whole list. You can manage the selection of the default goal from within your makefile using the .DEFAULT_GOAL variable."
"You can also specify a different goal or goals with command line arguments to make. Use the name of the goal as an argument. If you specify several goals, make processes each of them in turn, in the order you name them."
"Make will set the special variable MAKECMDGOALS to the list of goals you specified on the command line."
Make: Phony Targets
Cited from Phony Targets
"A phony target should not be a prerequisite of a real target file; if it is, its recipe will be run every time make goes to update that file. As long as a phony target is never a prerequisite of a real target, the phony target recipe will be executed only when the phony target is a specified goal."
For example, in the makefile below, running make with no arguments updates only the default goal, listfile; the phony target "clean" is not a specified goal, so its recipe is not executed.
din:=/home/cornell/
.PHONY : listfile clean
listfile: $(din)
	ls -lt $^
clean :
	-rm ./test/test.txt
Phony targets can have prerequisites. When the prerequisites are individual programs, invoking the overall phony target causes each of those programs to be run.
For example, running make (or make all) below executes the recipes of both action1 and action2.
d1:=/home/cornell/
d2:=/home/cornell/test
all: action1 action2
.PHONY : all
action1: $(d1)
	ls -lt $^
action2: $(d2)
	rm -rf $^
Tuesday, 11 August 2015
Make: Pattern Matching Stem
Cited from How Patterns Match
"When the target pattern does not contain a slash (and it usually does not), directory names in the file names are removed from the file name before it is compared with the target prefix and suffix. After the comparison of the file name to the target pattern, the directory names, along with the slash that ends them, are added on to the prerequisite file names generated from the pattern rule’s prerequisite patterns and the file name. The directories are ignored only for the purpose of finding an implicit rule to use, not in the application of that rule. Thus, ‘e%t’ matches the file name src/eat, with ‘src/a’ as the stem. When prerequisites are turned into file names, the directories from the stem are added at the front, while the rest of the stem is substituted for the ‘%’. The stem ‘src/a’ with a prerequisite pattern ‘c%r’ gives the file name src/car."
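The rule quoted above can be mimicked in a few lines of Python for a target pattern that contains no slash. This is a simplified sketch (exactly one '%', no slash in the target pattern), not GNU make's actual implementation; the function names are hypothetical.

```python
from typing import Optional, Tuple


def match_target(target_pattern: str, filename: str) -> Optional[Tuple[str, str]]:
    """Match a slash-free pattern such as 'e%t' against a file name.

    Returns (directory, stem) where directory keeps its trailing slash,
    or None if the name does not match.
    """
    if "/" in target_pattern or target_pattern.count("%") != 1:
        raise ValueError("sketch only handles one '%' and no slash in the target pattern")
    prefix, suffix = target_pattern.split("%")
    directory, slash, name = filename.rpartition("/")  # strip the directory before matching
    if len(name) <= len(prefix) + len(suffix):  # '%' must match a nonempty substring
        return None
    if not (name.startswith(prefix) and name.endswith(suffix)):
        return None
    return directory + slash, name[len(prefix):len(name) - len(suffix)]


def prerequisite(prereq_pattern: str, directory: str, stem: str) -> str:
    """Put the directory in front and substitute the rest of the stem for '%'."""
    prefix, suffix = prereq_pattern.split("%", 1)
    return directory + prefix + stem + suffix


directory, stem = match_target("e%t", "src/eat")  # ('src/', 'a'): the full stem is 'src/a'
prereq = prerequisite("c%r", directory, stem)      # 'src/car'
```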
Sunday, 9 August 2015
Friday, 31 July 2015
Notes from Managing Projects with GNU Make
Cited from the book "Managing Projects with GNU Make"
"The target is the file or thing that must be made. The prerequisites or dependents are those files that must exist before the target can be successfully created. And the commands are those shell commands that will create the target from the prerequisites."
"When make is asked to evaluate a rule, it begins by finding the files indicated by the prerequisites and target. If any of the prerequisites has an associated rule, make attempts to update those first. Next, the target file is considered. If any prerequisite is newer than the target, the target is remade by executing the commands."
"To update a different target (or to update more than one target), include the target name on the command line, such as: make target"
"--just-print (or -n) tells make to display the commands it would execute for a particular target without actually executing them."
"You can set almost any makefile variable on the command line to override the default value or the value set in the makefile. For example:
make mytarget FOO=BAR"
"If no prerequisites are listed to the right, then only the target(s) that do not exist are updated."
"Each command must begin with a tab character. This (obscure) syntax tells make that the characters that follow the tab are to be passed to a subshell for execution. If you accidentally insert a tab as the first character of a noncommand line, make will interpret the following text as a command under most circumstances."
=================================================
Explicit Rules:
"Pattern rules use wildcards instead of explicit filenames. This allows make to apply the rule any time a target file matching the pattern needs to be updated."
"Implicit rules are either pattern rules or suffix rules found in the rules database built into make. Having a built-in database of rules makes writing makefiles easier."
"A variable is either a dollar sign followed by a single character or a dollar sign followed by a word in parentheses."
Wildcards:
"Make's wildcards are identical to the Bourne shell's: ~, *, ?, [...], and [^...]."
=================================================
The automatic variable $?: the set of prerequisites that are newer than the target.
$@: the name of the current target.
"You can look at make's default set of rules (and variables) by running make --print-data-base."
"The percent character can be placed anywhere within the pattern but can occur only once."
XSLT Notes
Cited from the book "XSLT for Dummies"
"apply-templates doesn’t include the tags of the element—only what’s inside the tags"
"Namespaces were developed to avoid this name collision by linking a namespace identifier with a URI (Uniform Resource Identifier)."
"the primary purpose of xsl:copy is to carry over the element tags. However, if you combine it with xsl:apply-templates, you copy both the tags and its content"
"xsl:copy-of duplicates everything inside the current node. "
"The select attribute of the xsl:copy-of element determines what is copied to the result tree."
"<xsl:value-of select="expression"/>"
==================================================
matching element nodes
<xsl:template match="*|/">
<xsl:apply-templates/>
</xsl:template>
matching text and attribute nodes
<xsl:template match="text()|@*">
<xsl:value-of select="."/>
</xsl:template>
matching processing instructions and comments
<xsl:template match="processing-instruction()|comment()"/>
"An XPath expression for matching a namespace node doesn’t exist."
==================================================
child axis: child::, or omitted, since child is the default axis.
attribute axis: attribute::, or the shorthand @.
node() is a node test that matches any node, whatever its kind.
chapter[position()=1]
chapter[last()]
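Python's standard xml.etree.ElementTree supports a small XPath subset that is enough to try the selections above. It accepts the abbreviated chapter[1] but not chapter[position()=1], has no node() test (the sketch uses '*' for "any element"), and does no template processing, so this only illustrates node selection, not XSLT. The sample document is made up.

```python
import xml.etree.ElementTree as ET

BOOK = """<book>
  <chapter id="intro"/>
  <chapter id="body"/>
  <chapter id="end"/>
  <appendix id="a"/>
</book>"""

root = ET.fromstring(BOOK)

# Child axis (the default): every <chapter> child of <book>.
all_chapters = [c.get("id") for c in root.findall("chapter")]

# chapter[position()=1], written in the abbreviated form chapter[1].
first = root.find("chapter[1]").get("id")

# chapter[last()]: the last <chapter> child.
last = root.find("chapter[last()]").get("id")

# Attribute test using the @ shorthand.
body = root.find("chapter[@id='body']").get("id")

# '*' matches any element child, whatever its name.
any_child = [e.tag for e in root.findall("*")]

print(all_chapters, first, last, body, any_child)
```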
==================================================
Thursday, 30 July 2015
Wednesday, 29 July 2015
Monday, 27 July 2015
Sunday, 26 July 2015
Friday, 24 July 2015
Tuesday, 21 July 2015
Monday, 20 July 2015
Python: __slots__
Cited from Python __slots__
"The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation."
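A minimal sketch of the difference, with hypothetical class names: the slotted class has no per-instance __dict__, so assigning an undeclared attribute raises AttributeError.

```python
class PointWithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithSlots:
    # Only these attribute names can exist; no per-instance __dict__ is created.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = PointWithDict(1, 2)
p.z = 3  # allowed: stored in p.__dict__

s = PointWithSlots(1, 2)
try:
    s.z = 3  # 'z' is not declared in __slots__
except AttributeError as err:
    print("rejected:", err)

print(hasattr(p, "__dict__"), hasattr(s, "__dict__"))  # True False
```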
Python Underscore "_" in Method and Variable Names
Cited from Python: Why do some functions have underscores “__” before and after the function name?
One underline in the beginning:
"Python doesn't have real private methods, so one underline in the start of a method or attribute means you shouldn't access this method."
Two underlines in the beginning:
"So, when you create a method starting with __ it means that you don't want anyone to be able to override it; it will be accessible only from inside its own class."
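Strictly, Python enforces neither case: a single underscore is only a convention, and a leading double underscore triggers name mangling, so __pin inside class Account is stored as _Account__pin. That mainly prevents accidental clashes with subclasses, and the attribute is still reachable under its mangled name. A small sketch with hypothetical classes:

```python
class Account:
    def __init__(self):
        self._balance = 100   # one underscore: "internal" by convention only
        self.__pin = "1234"   # two underscores: stored as _Account__pin

    def check(self, pin):
        return pin == self.__pin  # inside the class the short name still works


class Savings(Account):
    def __init__(self):
        super().__init__()
        self.__pin = "9999"   # stored as _Savings__pin, so no clash with Account's


acct = Savings()
print(acct._balance)                            # 100: outside access still works
print(acct.check("1234"))                       # True: Account.check reads _Account__pin
print(acct._Account__pin, acct._Savings__pin)   # 1234 9999
print(hasattr(acct, "__pin"))                   # False: the unmangled name does not exist
```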
Two underlines in the beginning and in the end:
"When we see a method like __this__, don't call it, because it means it's a method which Python calls, not you."
Friday, 17 July 2015
Data Structure in Python
Cited from What Are Linear Structures?
"What distinguishes one linear structure from another is the way in which items are added and removed, in particular the location where these additions and removals occur."
=================================================
Cited from What is a Stack?
'A stack (sometimes called a “push-down stack”) is an ordered collection of items where the addition of new items and the removal of existing items always takes place at the same end. This end is commonly referred to as the “top.” The end opposite the top is known as the "base."'
'The base of the stack is significant since items stored in the stack that are closer to the base represent those that have been in the stack the longest. The most recently added item is the one that is in position to be removed first.'
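A minimal stack sketch in Python, assuming a list whose end serves as the top (the method names are a common convention, chosen here for illustration):

```python
class Stack:
    """A stack backed by a Python list; the end of the list is the top."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)   # add at the top

    def pop(self):
        return self._items.pop()   # remove from the top (IndexError if empty)

    def peek(self):
        return self._items[-1]     # look at the top without removing it

    def is_empty(self):
        return not self._items

    def size(self):
        return len(self._items)


s = Stack()
for item in ("base", "middle", "top"):
    s.push(item)
print(s.pop())   # top: the most recently added item comes off first
print(s.peek())  # middle
print(s.size())  # 2
```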
=================================================
Tuesday, 14 July 2015
Differential Gene Expression with Kallisto
Cited from Sleuth companion tool
To import abundance.txt files into an R matrix, a list or a Bioconductor SummarizedExperiment, follow "read kallisto RNA-seq quantification into R / Bioconductor data structures".
For analysis using limma, follow "How to run kallisto on NCBI SRA RNA-Seq data for differential expression using the mac terminal".
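The linked guides do the import in R. As a rough Python equivalent of that step, this sketch assumes each abundance file is tab-separated with a header containing target_id and tpm (the usual kallisto columns are target_id, length, eff_length, est_counts, tpm; check your own files). The function names are hypothetical.

```python
import csv


def read_abundance(path, column="tpm"):
    """Map target_id -> value of `column` from one kallisto abundance file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["target_id"]: float(row[column]) for row in reader}


def abundance_matrix(samples, column="tpm"):
    """Combine several files into (targets, sample_names, rows).

    `samples` maps a sample name to the path of its abundance file.
    Targets missing from a sample get 0.0.
    """
    per_sample = {name: read_abundance(path, column) for name, path in samples.items()}
    names = list(per_sample)
    targets = sorted(set().union(*per_sample.values()))
    rows = [[per_sample[n].get(t, 0.0) for n in names] for t in targets]
    return targets, names, rows
```

Passing column="est_counts" gives estimated counts instead, which count-based tools generally expect.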
Download mm9 Transcriptome Fasta
Mouse mm9 transcriptome fasta file, rna.fa.gz, is downloadable from ftp://ftp.ncbi.nih.gov/genomes/M_musculus/ARCHIVE/BUILD.37.1/RNA/, according to the post Question: Reference Transcriptome From A Mouse
Monday, 13 July 2015
Friday, 10 July 2015
Configure Custom Connection Options for SSH Client
Cited from How To Configure Custom Connection Options for your SSH Client
"The only difference is that depending on the option and value, using the equal sign with no spaces can allow you to specify an option on the command line without quoting."
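Python's shlex.split follows POSIX shell word splitting, so it shows how the shell would split each form. ServerAliveInterval is used as an example option and example.org as a placeholder host.

```python
import shlex

# Option=value with no spaces is a single shell word, so no quoting is needed.
with_equals = shlex.split("ssh -o ServerAliveInterval=60 example.org")

# The space-separated form must be quoted to stay one argument to -o ...
quoted = shlex.split('ssh -o "ServerAliveInterval 60" example.org')

# ... otherwise the value becomes a separate word and the option loses it.
unquoted = shlex.split("ssh -o ServerAliveInterval 60 example.org")

print(with_equals)  # ['ssh', '-o', 'ServerAliveInterval=60', 'example.org']
print(quoted)       # ['ssh', '-o', 'ServerAliveInterval 60', 'example.org']
print(unquoted)     # ['ssh', '-o', 'ServerAliveInterval', '60', 'example.org']
```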
Wednesday, 8 July 2015
Peak Score for Macs Peaks
Cited from Question: What Is The Meaning Of The Score In Diffbind'S Occupancy/Overlap Analysis?
"The default peak score for macs peaks is the "=-10*LOG10(pvalue)" value. It is normalised to a 0..1 scale by dividing the scores by the maximum score (so the max score gets a value of 1)."
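The two steps in the quote, scoring and then dividing by the maximum, fit in a few lines of Python. The p-values are made up for illustration.

```python
import math


def macs_score(pvalue):
    """-10*log10(p), the default MACS peak score described above."""
    return -10 * math.log10(pvalue)


def normalise(scores):
    """Scale scores to 0..1 by dividing by the maximum score."""
    top = max(scores)
    return [s / top for s in scores]


pvalues = [1e-5, 1e-10, 1e-20]
scores = [macs_score(p) for p in pvalues]  # approximately 50, 100, 200
print(normalise(scores))                   # approximately [0.25, 0.5, 1.0]
```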
Tuesday, 7 July 2015
Monday, 6 July 2015
Sunday, 5 July 2015
Friday, 3 July 2015
Aspera Connect Download and Install
The Aspera Connect high-performance transfer plug-in can be downloaded from
http://downloads.asperasoft.com/connect2/
The installation manual can be found at http://download.asperasoft.com/download/docs/connect/3.6.0/user_linux/webhelp/index.html#dita/installation.html
Thursday, 2 July 2015
Wednesday, 1 July 2015
Tuesday, 30 June 2015
Sunday, 28 June 2015
Conversion Between Decimals and Bits
Decimal to bits
bin(10)
Bits to decimal
int("1011111",2)
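The same two calls with their results, plus format() for output without the '0b' prefix:

```python
n = 10
bits = bin(n)             # '0b1010', with a '0b' prefix
plain = format(n, "b")    # '1010', without the prefix
padded = format(n, "08b") # '00001010', zero-padded to 8 digits

back = int("1011111", 2)  # 95: parse a string of bits in base 2
also = int(bits, 2)       # 10: with base 2, int() also accepts the '0b' prefix
print(bits, plain, padded, back, also)
```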
Friday, 26 June 2015
Thursday, 25 June 2015
Wednesday, 24 June 2015
Tuesday, 23 June 2015
SAM File 1-based Leftmost Mapping Position in Relation to Strand
Cited from Re: [Samtools-help] Questions about SAM format
"It's always the smaller of the two "end"-coordinates, on the positive strand (the strand that is given in your reference fasta). So, in a 100bp reference, if your 25bp read came from / is mapped to the negative strand right up against its 5'-end, the position in the SAM line would be 76. If you have another read that came from the positive strand right up against its 3'-end, the position in the SAM line would *also* be 76. Use the strand flag to distinguish between the two cases."
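The arithmetic behind the example: an ungapped read of length L that ends at the last base of an N bp reference starts at 1-based forward-strand position N - L + 1 = 100 - 25 + 1 = 76, whichever strand it came from, and only FLAG bit 0x10 tells the two apart. A small sketch assuming ungapped alignments; the function names are hypothetical.

```python
REVERSE_FLAG = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def leftmost_pos(ref_len, read_len, distance_from_right_end=0):
    """1-based leftmost forward-strand position of an ungapped alignment.

    distance_from_right_end counts the reference bases to the right of the
    read's right-most aligned base.
    """
    return ref_len - read_len - distance_from_right_end + 1


def strand(flag):
    return "-" if flag & REVERSE_FLAG else "+"


# Both reads from the quote sit against the right end of a 100 bp reference.
for flag in (16, 0):
    print(strand(flag), leftmost_pos(100, 25))  # "- 76", then "+ 76"
```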
Identify Ambiguously Mapped Reads in SAM/BAM
Cited from Wiki of PoPOOLationWalkthrough
"Filtering by a mapping quality of 20 removes the ambiguously mapped reads
samtools view -q 20 -b -S dmel.sam"
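samtools view -q INT skips alignments whose MAPQ is smaller than INT. A plain-text Python approximation, assuming SAM input where MAPQ is the fifth tab-separated field; header lines are kept because BAM output always carries the header. The function name is hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=20):
    """Yield header lines and alignments with MAPQ >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:  # field 5 (index 4) is MAPQ
            yield line
```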
Monday, 22 June 2015
Make Manual
Cited from 2.2 A Simple Makefile
When a target is a file, it needs to be recompiled or relinked if any of its prerequisites change.
Targets that do not refer to files but are just actions are called phony targets.
====================================================================
Cited from 2.3 How make Processes a Makefile
By default, make starts with the first target. This is called the default goal.
====================================================================
Cited from 2.6 Another Style of Makefile
When the objects of a makefile are created only by implicit rules, an alternative style of makefile is possible. In this style of makefile, you group entries by their prerequisites instead of by their targets.
====================================================================
Cited from 3.1 What Makefiles Contain
Makefiles contain five kinds of things: explicit rules, implicit rules, variable definitions, directives, and comments.
An implicit rule says when and how to remake a class of files based on their names. It describes how a target may depend on a file with a name similar to the target and gives a recipe to create or update such a target.
A directive is an instruction for make to do something special while reading the makefile. These include:
- Reading another makefile
- Deciding (based on the values of variables) whether to use or ignore a part of the makefile (see Conditional Parts of Makefiles).
- Defining a variable from a verbatim string containing multiple lines (see Defining Multi-Line Variables).
====================================================================
Cited from 3.1.1 Splitting Long Lines
The way in which backslash/newline combinations are handled depends on whether the statement is a recipe line or a non-recipe line. Handling of backslash/newline in a recipe line is discussed later (see Splitting Recipe Lines).
====================================================================
Cited from 3.2 What Name to Give Your Makefile
By default, when make looks for the makefile, it tries the following names, in order: GNUmakefile, makefile and Makefile.
====================================================================
Cited from 3.3 Including Other Makefiles
The include directive tells make to suspend reading the current makefile and read one or more other makefiles before continuing. The directive is a line in the makefile that looks like this:
include filenames…
filenames can contain shell file name patterns. If filenames is empty, nothing is included and no error is printed. If the file names contain any variable or function references, they are expanded.
If the specified name does not start with a slash, and the file is not found in the current directory, several other directories are searched. First, any directories you have specified with the ‘-I’ or ‘--include-dir’ option are searched (see Summary of Options). Then the following directories (if they exist) are searched, in this order: prefix/include (normally /usr/local/include), /usr/gnu/include, /usr/local/include, /usr/include.
If you want make to simply ignore a makefile which does not exist or cannot be remade, with no error message, use the -include directive instead of include, like this:
-include filenames…
For compatibility with some other make implementations, sinclude is another name for -include.
====================================================================
Cited from 3.7 How make Reads a Makefile
Conditional directives are parsed immediately. This means, for example, that automatic variables cannot be used in conditional directives, as automatic variables are not set until the recipe for that rule is invoked. If you need to use automatic variables in a conditional directive you must move the condition into the recipe and use shell conditional syntax instead.
====================================================================
Cited from 3.8 Secondary Expansion
If that special target is defined then in between the two phases mentioned above, right at the end of the read-in phase, all the prerequisites of the targets defined after the special target .SECONDEXPANSION are expanded a second time.
Sunday, 21 June 2015
Creating Makefile from Json
In Python, see aLib/webForm/json2make.py; the aLib package is described in "aLib a sets of software tools to do basic analysis of Illumina sequencers".
Creating a makefile from JSON with the jsvelocity utility and, separately, an XML- and XSLT-based makefile creation pipeline are described in "XML+XSLT = #Makefile -based #workflows for #bioinformatics".
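aLib's json2make.py and jsvelocity each define their own input format, which is not reproduced here. To show only the general idea, here is a minimal sketch under a made-up schema ({"rules": [{"target", "prerequisites", "commands"}]}); the function json_to_makefile is hypothetical.

```python
import json


def json_to_makefile(text):
    """Render makefile rules from a JSON description (hypothetical schema).

    The first rule becomes make's default goal.
    """
    spec = json.loads(text)
    out = []
    for rule in spec["rules"]:
        prereqs = " ".join(rule.get("prerequisites", []))
        out.append(f"{rule['target']}: {prereqs}".rstrip())
        for command in rule.get("commands", []):
            out.append("\t" + command)  # recipe lines must start with a tab
        out.append("")
    return "\n".join(out)


example = """{"rules": [
  {"target": "sample.bam", "prerequisites": ["sample.sam"],
   "commands": ["samtools view -q 20 -b -S sample.sam > sample.bam"]}
]}"""
print(json_to_makefile(example))
```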
JSON Format
Excerpts from JSON Tutorial
"JSON is a syntax for storing and exchanging data.
JSON is language independent."
==============================================================
Excerpts from JSON Syntax
"JSON syntax is part of JavaScript syntax:
- Data is in name/value pairs
- Data is separated by commas
- Curly braces hold objects
- Square brackets hold arrays
JSON values can be:
- A number (integer or floating point)
- A string (in double quotes)
- A Boolean (true or false)
- An array (in square brackets)
- An object (in curly braces)
- null"
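Python's json module maps each of these value kinds to a native type: number to int or float, string to str, Boolean to bool, array to list, object to dict and null to None. A quick check with a made-up record:

```python
import json

text = """{
  "name": "sample1",
  "reads": 1200000,
  "mean_quality": 35.5,
  "paired": true,
  "lanes": [1, 2],
  "instrument": {"model": "HiSeq"},
  "notes": null
}"""

record = json.loads(text)
for key, value in record.items():
    print(key, type(value).__name__)
# name str, reads int, mean_quality float, paired bool,
# lanes list, instrument dict, notes NoneType
```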
Saturday, 13 June 2015
Friday, 12 June 2015
Thursday, 11 June 2015
R biomaRt
Cited from The biomaRt user’s guide
- A first step is to check which BioMart web services are available. The function listMarts will display all available BioMart web services.
- The useMart function can now be used to connect to a specified BioMart database, this must be a valid name given by listMarts.
- BioMart databases can contain several datasets; for Ensembl every species is a different dataset. In a next step, the datasets available in the selected BioMart database can be visualised by using the function listDatasets.
- To select a dataset we can update the Mart object using the function useDataset. Alternatively, if the dataset one wants to use is known in advance, one can select a BioMart database and dataset in one step with useMart("database", dataset="dataset").
- The getBM function is the main query function in biomaRt. For some frequently used queries to Ensembl, wrapper functions are available: getGene and getSequence. getBM has four main arguments:
- attributes: a vector of attributes that one wants to retrieve (= the output of the query).
- filters: a vector of filters that one will use as input to the query.
- values: a vector of values for the filters. In case multiple filters are in use, the values argument requires a list of values where each position in the list corresponds to the position of the filters in the filters argument.
- mart: an object of class Mart, which is created by the useMart function.
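biomaRt turns these getBM arguments into a query for a BioMart web service. As a rough illustration only (not biomaRt's own code, and nothing is sent over the network), this Python sketch pairs filters with their values the way getBM does and builds an XML query. The element and attribute names (Query, Dataset, Filter, Attribute, virtualSchemaName, formatter) are assumed from the BioMart XML query format and should be checked against a query exported from the BioMart web interface.

```python
import xml.etree.ElementTree as ET


def biomart_query(dataset, attributes, filters=None):
    """Build a BioMart-style XML query from getBM-like arguments.

    `filters` maps a filter name to its list of values, mirroring how getBM
    pairs its `filters` and `values` arguments by position.
    """
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="0")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, values in (filters or {}).items():
        ET.SubElement(ds, "Filter", name=name, value=",".join(values))
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    return ET.tostring(query, encoding="unicode")


query_xml = biomart_query("mmusculus_gene_ensembl",
                          attributes=["ensembl_gene_id", "external_gene_name"],
                          filters={"chromosome_name": ["19"]})
print(query_xml)
```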
=============================================================
Monday, 8 June 2015
Sunday, 7 June 2015
Saturday, 6 June 2015
Thursday, 4 June 2015
R Color Cheatsheet
R color cheatsheet
Cheat Sheets for Plotting Symbols and Color Palettes (useful for color palette selection)
brewerpal
A guide to using colors in R
R Color Reference Sheet
Correlation Matrix Heatmap
ggplot2 : Quick correlation matrix heatmap
or
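library(corrplot)
# corr: a correlation matrix, e.g. cor(mtcars); show only the upper triangle,
# reorder variables by hierarchical clustering, and draw black labels rotated 45 degrees
corrplot(corr, type="upper", order="hclust", tl.col="black", tl.srt=45)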
Sunday, 31 May 2015
Differences Between Ensembl, Gencode, RefSeq and UCSC
Cited from "Question: Difference Between Ensembl Databases In Ucsc Table Browser"
"Vega is a browser of the manually curated Havana gene set. Ensembl also perform automatic annotation of genes using protein and nucleotide sequence databases, such as EMBL, Uniprot and RefSeq. Ensembl use the GENCODE gene set, which is made up of the Havana and Ensembl automatic gene set. Genes within the GENCODE set are labelled as being either Ensembl (from the automatic annotation), Havana (from the manual annotation) or merged (exact match between the automatic and manual annotation)."
=======================================================================
Friday, 22 May 2015
Wednesday, 20 May 2015
Monday, 18 May 2015
R EOF
Cited from R Command Line Processing
cat > printargs.R << EOF
args = commandArgs()
print(args)
q()
EOF
R --no-save < printargs.R
****************************************************************************
Sunday, 17 May 2015
CLIP-seq Analysis Example Papers
Transcriptome-wide identification of RNA binding sites by CLIP-seq
Mapping in vivo protein-RNA interactions at single-nucleotide resolution from HITS-CLIP data
PIPE-CLIP: a comprehensive online tool for CLIP-seq data analysis
HITS-CLIP yields genome-wide insights into brain alternative RNA processing
Antagonistic regulation of mRNA expression and splicing by CELF and MBNL proteins
Bash: Substring Removal
Cited from "How do I parse command line arguments in bash?"
To better understand ${i#*=} search for "Substring Removal" in this guide. It is functionally equivalent to `sed 's/[^=]*=//' <<< "$i"` which calls a needless subprocess or `echo "$i" | sed 's/[^=]*=//'` which calls two needless subprocesses.
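In bash, ${i#*=} removes the shortest prefix matching the glob *=, that is, everything up to and including the first '='; if there is no '=', the value is unchanged. A Python equivalent, with a hypothetical function name:

```python
def strip_through_first_equals(arg):
    """Python equivalent of bash ${i#*=} (and of sed 's/[^=]*=//')."""
    _, sep, rest = arg.partition("=")
    return rest if sep else arg  # no '=': return the string unchanged


for i in ("--prefix=/usr/local", "KEY=a=b", "--verbose"):
    print(strip_through_first_equals(i))
# /usr/local
# a=b
# --verbose
```

By contrast, ${i##*=} removes the longest match, leaving only the text after the last '='.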
*******************************************************************
Thursday, 14 May 2015
4C-seq Protocol and Primer Design
Robust 4C-seq data analysis to screen for regulatory DNA interactions
4C Primer Designer for 4C Viewpoints
4C-Seq primer database
Detecting long-range chromatin interactions using the chromosome conformation capture sequencing (4C-seq) method
The high-resolution 4C-seq method and the iterative correction procedure for Hi-C data.
Tuesday, 5 May 2015
Message Passing
Cited from OO field guide
"With message-passing, messages (methods) are sent to objects and the object determines which function to call."
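Python method calls work in this message-passing style. In this minimal sketch with hypothetical Circle and Square classes, the same message, area(), is sent to each object, and the object's class decides which function runs.

```python
import math


class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return math.pi * self.r ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


# Same message for every object; the receiver picks the implementation.
for shape in (Circle(1), Square(2)):
    print(type(shape).__name__, shape.area())
```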
Monday, 4 May 2015
Function Names With Leading Dots
Cited from What does the dot mean in R – personal preference, naming convention or more?
"Function names with leading dots are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.
In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls to show these variables, use ls(all.names=TRUE). By using a dot as first letter of a variable, you change the scope of the variable itself. "
Thursday, 30 April 2015
Complements and Intersections of VCF Files
Cited from bcftools
bcftools isec [OPTIONS] A.vcf.gz B.vcf.gz […]
Creates intersections, unions and complements of VCF files. Depending on the options, the program can output records from one (or more) files which have (or do not have) corresponding records with the same position in the other files.
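A position-only sketch of the same set operations in Python, with hypothetical function names. It is a simplification: bcftools may also compare alleles when deciding whether records correspond, depending on its options, whereas this sketch compares CHROM and POS only.

```python
def positions(vcf_lines):
    """Set of (CHROM, POS) keys for the data lines of a VCF file."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        chrom, pos = line.split("\t", 2)[:2]
        keys.add((chrom, int(pos)))
    return keys


def isec(a_lines, b_lines):
    """Records private to A, private to B, and shared, keyed by position."""
    a, b = positions(a_lines), positions(b_lines)
    return {"only_a": a - b, "only_b": b - a, "shared": a & b}
```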
Bash Dereference Concatenated Variable Name
Cited from Dereference concatenated variable name
FRUITS="BANANA APPLE ORANGE"
BANANA_COLOUR="Yellow"
APPLE_COLOUR="Green or Red"
ORANGE_COLOUR="Blue"
for fruit in $FRUITS ;do
eval echo $fruit is \$${fruit}_COLOUR
done
'The eval simply tells bash to make a second evaluation of the following statement (i.e. one more than its normal evaluation). The \$ survives the first evaluation as $, and the next evaluation then treats this $ as the start of a variable name, which resolves to "Yellow", etc.'
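Bash also offers indirect expansion, ${!name}, which performs this lookup without eval. In Python the usual approach is a dictionary: build the key, then look it up. In this sketch the dict stands in for the shell variables and describe is a hypothetical helper.

```python
# Stand-in for the shell variables in the bash example.
variables = {
    "BANANA_COLOUR": "Yellow",
    "APPLE_COLOUR": "Green or Red",
    "ORANGE_COLOUR": "Blue",
}
FRUITS = "BANANA APPLE ORANGE"


def describe(fruits, namespace):
    """Build each name from a fruit, then look that name up."""
    lines = []
    for fruit in fruits.split():
        name = f"{fruit}_COLOUR"  # like ${fruit}_COLOUR in bash
        lines.append(f"{fruit} is {namespace[name]}")
    return lines


for line in describe(FRUITS, variables):
    print(line)
```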
Wednesday, 29 April 2015
Retrieve Genome File from UCSC Database
According to Downloading Data using MySQL,
For example,
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from mm9.chromInfo" > mm9.genome
Tuesday, 28 April 2015
Friday, 24 April 2015
Thursday, 23 April 2015
Wednesday, 22 April 2015
Tuesday, 21 April 2015
Regular Expression: The Order of Lookaheads
Cited from The Order of Lookaheads Doesn't Matter… Almost
"While the order of lookaheads doesn't matter on a logical level, keep in mind that it may matter for matching speed. If one lookahead is more likely to fail than the other two, it makes little sense to place it in third position and expend a lot of energy checking the first two conditions. Make it first, so that if we're going to fail, we fail early—an application of the design to fail principle from the regex style guide."
"The negative lookbehind (?<!.) asserts that what precedes the current position is not any character—therefore the position must be the beginning of the string."
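Both points can be tried with Python's re module. The password rules are illustrative, and the digit check is placed first on the assumption that it is the condition most likely to fail. Note that in Python, without DOTALL, '.' does not match a line feed, so (?<!.) also succeeds right after one; inline (?s) restricts it to the very start of the string.

```python
import re

# Each lookahead checks one condition from the start of the string.
PASSWORD = re.compile(
    r"\A"
    r"(?=.*\d)"     # at least one digit (assumed the most likely to fail)
    r"(?=.*[a-z])"  # at least one lowercase letter
    r"(?=.*[A-Z])"  # at least one uppercase letter
    r".{6,10}\Z"    # then 6 to 10 characters in total
)

for candidate in ("abcDEF", "abc1EF", "ab1E"):
    print(candidate, bool(PASSWORD.match(candidate)))
# abcDEF False, abc1EF True, ab1E False

# (?<!.): no character precedes this position.
print(re.findall(r"(?s)(?<!.)\w+", "first line\nsecond"))  # ['first']
print(re.findall(r"(?<!.)\w+", "first line\nsecond"))      # ['first', 'second']
```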
Regular Expression: DOTALL mode
Cited from DOTALL (Dot Matches Line Breaks): s (with exceptions)
"By default, the dot . doesn't match line break characters such as line feeds and carriage returns. If you want patterns such as BEGIN .*? END to match across lines, we need to turn that feature on."
"This mode is sometimes called single-line (hence the s) because as far as the dot is concerned, it turns the whole string into one big line—.* will match from the first character to the last, no matter how many line breaks stand in between."
"In Perl, apart from the (?s) inline modifier, Perl lets you add the s flag after your pattern's closing delimiter. For instance, you can use:
if ($the_subject =~ m/BEGIN .*? END/s) { … }"
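The same behaviour can be observed in Python, where re.DOTALL (alias re.S) plays the role of Perl's /s flag and (?s) is accepted as an inline modifier. The sample text is made up for illustration.

```python
import re

text = "BEGIN first line\nsecond line END"

# By default '.' stops at the line feed, so the lazy .*? cannot reach " END".
assert re.search(r"BEGIN .*? END", text) is None

# With DOTALL the dot also matches "\n", so the match spans both lines.
m = re.search(r"BEGIN .*? END", text, re.DOTALL)
print(m.group(0))

# The inline (?s) modifier has the same effect.
assert re.search(r"(?s)BEGIN .*? END", text) is not None
```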
Regular Expression: Non-greedy Matching
Cited from Regular Expression Tutorial Part 5: Greedy and Non-Greedy Quantification
To make the quantifier non-greedy you simply follow it with a '?' symbol:
my $string = 'bcdabdcbabcd';
$string =~ m/^(.*?)ab/;    # $1 is 'bcd'; the greedy m/^(.*)ab/ would capture 'bcdabdcb'
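The following Python sketch runs the same comparison with the same string, so the lazy and greedy captures can be printed side by side.

```python
import re

string = "bcdabdcbabcd"

# Lazy: (.*?) stops at the FIRST "ab" that lets the rest of the pattern match.
lazy = re.match(r"^(.*?)ab", string).group(1)

# Greedy: (.*) runs to the end, then backtracks until it reaches the LAST "ab".
greedy = re.match(r"^(.*)ab", string).group(1)

print(lazy)    # bcd
print(greedy)  # bcdabdcb
```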
Regular Expression Possessive: Don't Give Up Characters
Cited from Possessive: Don't Give Up Characters
"As you'll see in the table below, a quantifier is made possessive by appending a + plus sign to it. Therefore, A++ is possessive—it matches as many characters as needed and never gives any of them back."
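Python 3.11 and later accept possessive quantifiers such as a++ and atomic groups (?>...) directly. The sketch below uses the lookahead-plus-backreference idiom instead, which emulates a possessive a++ on older versions as well. A lookahead is not re-entered on backtracking, so the captured run of a's is consumed whole and never shortened. The patterns are illustrative only.

```python
import re

GREEDY = r"a+a"                      # a+ may give back one 'a' to the final a
POSSESSIVE_EMULATED = r"(?=(a+))\1a"  # behaves like a++a: nothing is given back

print(re.fullmatch(GREEDY, "aaaa"))               # a match object
print(re.fullmatch(POSSESSIVE_EMULATED, "aaaa"))  # None
```

Because the possessive run has already eaten every 'a', the trailing a can never match, which is exactly the "never gives any of them back" behaviour described above.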
Monday, 20 April 2015
Regular Expression Anchors
"Regex anchors force the regex engine to start or end a match at an absolute position. The start of string anchor (\A) dictates that any match must start at the beginning of the string."
"The end of line string anchor (\Z) requires that a match end at the end of a line within the string."
"The word boundary anchor (\b) matches only at the boundary between a word character (\w) and a non-word character (\W)."
Cited from Regular Expressions and Matching
##################################################
✽ In .NET, Perl and Ruby, \Z is allowed to match before a final line feed. Therefore, e\Z will match the final e in the string "apple\norange\n".
Cited from Regex Anchors
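The anchors differ slightly between flavours. In Python, \Z is an absolute end-of-string anchor (like Perl's \z), while $ without re.MULTILINE behaves like Perl's \Z and may match just before a final line feed. The Python sketch below reuses the "apple\norange\n" example.

```python
import re

text = "apple\norange\n"

# Python's \Z does not match before the final "\n".
assert re.search(r"e\Z", text) is None

# Python's $ (no MULTILINE) may match before a final "\n", like Perl's \Z.
m = re.search(r"e$", text)
print(m.start())  # 11, the final 'e' of "orange"

# \A anchors at the start of the string; \b needs a word/non-word transition.
print(re.findall(r"\Acat", "cat concat cat."))    # ['cat']
print(re.findall(r"\bcat\b", "cat concat cat."))  # ['cat', 'cat']
```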
Regular Expression: The Use of (?
Named capture in Perl:
'Perl uses (?<NAME>pattern) to specify named captures. You have to use the %+ hash to retrieve them.
$variable =~ /(?<count>\d+)/;
print "Count is $+{count}";'
Cited from Can I use named groups in a Perl regex to get the results in a hash?
##################################################
"The normal capturing (pattern) has the property of capturing and group. Capturing means that the text matches the pattern inside will be captured so that you can use it with back-reference, in matching or replacement. The non-capturing group (?:pattern) doesn't have the capturing property."
"Atomic grouping (?>pattern) also has the non-capturing property, so the position of the text matched inside will not be captured."
Cited from Confusion with Atomic Grouping - how it differs from the Grouping in regular expression of Ruby?
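The following is the same idea as a Python sketch. Python spells a named group (?P<name>...), and match.group("name") or match.groupdict() takes the place of Perl's %+ hash. The second part shows that (?:...) groups without capturing, so only the explicit capturing group is reported. The sample strings are made up.

```python
import re

m = re.search(r"(?P<count>\d+)", "There are 42 apples")
print("Count is", m.group("count"))  # Count is 42
print(m.groupdict())                 # {'count': '42'}

# (?:ab)+ is repeated as a group but not captured; only (c) is a group.
print(re.match(r"(?:ab)+(c)", "ababc").groups())  # ('c',)
```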
Tuesday, 14 April 2015
Paper "Charting a dynamic DNA methylation landscape of the human genome"
Excerpts from Charting a dynamic DNA methylation landscape of the human genome
"Most cell types, except germ cells and pre-implantation embryos [3, 4, 5], display relatively stable DNA methylation patterns, with 70–80% of all CpGs being methylated."
CpG Island and Shores
Excerpts from "CpG site"
""CpG" is shorthand for "—C—phosphate—G—", that is, cytosine and guanine separated by only one phosphate; phosphate links any two nucleosides together in DNA. The "CpG" notation is used to distinguish this linear sequence from the CG base-pairing of cytosine and guanine."
#################################################
Excerpts from "Question: Find Cpg Islands"
"CpG islands were predicted by searching the sequence one base at a time, scoring each dinucleotide (+17 for CG and -1 for others) and identifying maximally scoring segments. Each segment was then evaluated for the following criteria: GC content of 50% or greater, length greater than 200 bp, ratio greater than 0.6 of observed number of CG dinucleotides to the expected number on the basis of the number of Gs and Cs in the segment."
##################################################
"The CpG count is the number of CG dinucleotides in the island. The Percentage CpG is the ratio of CpG nucleotide bases (twice the CpG count) to the length. The ratio of observed to expected CpG is calculated according to the formula (cited in Gardiner-Garden et al. (1987)): Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G) where N = length of sequence."
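The scoring scheme and the three criteria can be sketched in Python as follows. This is a simplification, not the UCSC implementation. max_scoring_segment uses Kadane's maximum-subarray algorithm and returns only the single best segment, whereas the excerpt speaks of identifying all maximally scoring segments. The function names are hypothetical, and cpg_fraction is the "Percentage CpG" expressed as a fraction (multiply by 100 for a percentage).

```python
def cpg_stats(seq):
    """GC content, Obs/Exp CpG and CpG fraction as defined in the excerpt."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return {
        "gc_content": (c + g) / n,
        "obs_exp": cpg * n / (c * g) if c and g else 0.0,
        "cpg_fraction": 2 * cpg / n,
    }


def passes_criteria(seq):
    """Length > 200 bp, GC content >= 50% and Obs/Exp CpG > 0.6."""
    s = cpg_stats(seq)
    return len(seq) > 200 and s["gc_content"] >= 0.5 and s["obs_exp"] > 0.6


def max_scoring_segment(seq):
    """(start, end) of the best-scoring segment (0-based, half-open) or None."""
    seq = seq.upper()
    # +17 for each CG dinucleotide, -1 for any other dinucleotide.
    scores = [17 if seq[i:i + 2] == "CG" else -1 for i in range(len(seq) - 1)]
    best, best_start, best_end = 0, None, None
    cur, cur_start = 0, 0
    for i, s in enumerate(scores):
        if cur <= 0:
            cur, cur_start = s, i  # start a new segment at dinucleotide i
        else:
            cur += s               # extend the current segment
        if cur > best:
            best, best_start, best_end = cur, cur_start, i
    if best_start is None:
        return None
    # Dinucleotide i covers bases i and i+1, so the segment ends at best_end + 2.
    return best_start, best_end + 2
```

On "AAAACGCGAAAA", for example, the best segment is bases 4 to 8, i.e. "CGCG".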
Excerpts from "What is a CpG shore and how to I get them all?"
"CpG shores are the regions immediately flanking and up to 2 kbp away from CpG islands. These regions are interesting because they are variably methylated in cancer and development."
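Following that definition, shores can be derived from island coordinates. The following is a minimal Python sketch. It assumes 0-based half-open coordinates and clips shores at the chromosome ends. It does not trim shores that run into a neighbouring island, which a real analysis would probably need. cpg_shores is a hypothetical name.

```python
def cpg_shores(island_start, island_end, chrom_size, width=2000):
    """Up to `width` bp flanking each side of an island, clipped to the chromosome."""
    left = (max(0, island_start - width), island_start)
    right = (island_end, min(chrom_size, island_end + width))
    # Drop empty intervals, e.g. when the island touches a chromosome end.
    return [iv for iv in (left, right) if iv[0] < iv[1]]


print(cpg_shores(1000, 2000, chrom_size=10000))  # [(0, 1000), (2000, 4000)]
```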
Monday, 13 April 2015
Paper "Targeted disruption of DNMT1, DNMT3A and DNMT3B in human embryonic stem cells"
Excerpts from "Targeted disruption of DNMT1, DNMT3A and DNMT3B in human embryonic stem cells"
"Human ESC methylation patterns are most unique at hypomethylated regulatory elements that are enriched for binding of pluripotency-associated master regulators, such as OCT4, SOX2 and NANOG."
Hemimethylated DNA
Excerpts from What is hemimethylated DNA?
"DNA-hemimethylation is when only one of two (complementary) strands is methylated. A hemi-methylated site is a single CpG that is methylated on one strand, but not on the other. This is not the same thing as allele-specific methylation, which is common in imprinting. In hemi-methylation, we’re talking about 2 strands from the same parent. Hemimethylation is important because it directly identifies de novo methylation events, allowing you to differentiate between de novo vs. maintenance factors. Because DNA methylation is faithfully propagated during DNA replication (by DNMT1), any hemimethylated sites must have arisen during the last replication round, either because: 1) failure to faithfully propagate a parental methylation signal; or, 2) a de novo methylation event. You can differentiate between the two if you know the methylation status of the parent: if the parent strand was entirely methylated, then hemimethylation indicates failure of maintenance. Vice versa, if the parent strand was unmethylated, hemimethylation indicates de novo methylation."
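The decision rule in this excerpt can be written as a small Python sketch. It assumes per-strand methylation calls for a single CpG are already available as booleans, along with the methylation status of the parent strand. Both function names are hypothetical.

```python
def cpg_state(top_methylated, bottom_methylated):
    """Classify one CpG from the methylation calls on its two strands."""
    if top_methylated and bottom_methylated:
        return "fully methylated"
    if top_methylated or bottom_methylated:
        return "hemimethylated"
    return "unmethylated"


def hemimethylation_origin(parent_strand_methylated):
    """Interpretation of a hemimethylated CpG, following the excerpt."""
    if parent_strand_methylated:
        return "failure of maintenance"
    return "de novo methylation"
```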
Friday, 3 April 2015
TMM Normalisation
Excerpts from "NormalizationAndDifferentialExpression"
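library(edgeR)  # provides calcNormFactors(); geneCounts.dgelist is assumed to be an existing DGEList
tmm <- calcNormFactors(geneCounts.dgelist)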
# equation from the edgeR documentation for estimating normalized absolute expression from their scaling factors
tmmScaleFactors <- geneCounts.dgelist$samples$lib.size * tmm$samples$norm.factors
tmmExp <- round(t(t(tmm$counts)/tmmScaleFactors) * mean(tmmScaleFactors))
#################################################
Excerpts from "Question: After Getting Normalization Factor Via Edger, What To Do For Normalization?"
The TMM counts are: count / (library size * normalization factor)
Then multiply that by a million to get CPM.
Not count / normalization factor
And DESeq doesn't just do a simple division by library size. It takes the median of the ratio of the count to the geometric mean of the expression values as the scaling factor for each library.
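Both normalisations described above can be sketched in Python for a small genes x samples count matrix. This is an illustration, not edgeR or DESeq code. The TMM normalisation factors are taken as given (in edgeR they come from calcNormFactors). When computing the DESeq-style median-of-ratios size factors, genes with a zero count in any sample are skipped, because their geometric mean is zero. That skipping is a choice made for this sketch.

```python
import math
from statistics import median


def tmm_cpm(counts, lib_sizes, norm_factors):
    """count / (library size * normalisation factor) * 1e6 for each cell."""
    effective = [lib * f for lib, f in zip(lib_sizes, norm_factors)]
    return [[c * 1e6 / e for c, e in zip(row, effective)] for row in counts]


def deseq_size_factors(counts):
    """Per sample: median over genes of count / geometric mean of that gene."""
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean would be 0
        geo_mean = math.exp(sum(math.log(c) for c in row) / len(row))
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]
```

For a toy matrix in which every gene has twice as many reads in sample 2 as in sample 1, the size factors come out as about 0.707 and 1.414, so their ratio recovers the twofold depth difference.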
Monday, 30 March 2015
Friday, 27 March 2015
Thursday, 26 March 2015
Circos Plots: Tick Marks - Basics
Excerpts from Tick Marks, Grids and Labels
"Ticks, tick labels and grids are defined in the <ticks> block, which can contain any number of <tick> blocks, each defining ticks with a different spacing."
"Ticks refers to the radial lines that show progression of distance along the ideogram. Tick labels are the accompanying text elements that mark the position of the tick."
"The radius specifies the radial position of the tick marks, which you generally want to set to the outer ideogram radius."
"The label multiplier is the constant used to multiply the tick value to obtain the tick label. For example, if the multiplier is 1e-6, then the tick mark at position 10,000,000 will have a label of 10. The multiplier is applied to the raw tick value, regardless of the value of chromosomes_units."
"The orientation controls whether the ticks and labels face out (orientation=out) or in (orientation=in)."
"By referencing the position relative to the image, and not the ideogram, you decouple the position of the tick from the position of the ideogram. This absolute placement is useful if you know you want the ticks at a specific image position, regardless of the position of the ideograms. radius=dims(image,radius)-25p."
"Typically, one defines several sets of ticks by using <tick> blocks. Each set defines the display of ticks at a given spacing. For example, one could have three sets of ticks spaced at 1Mb, 5Mb and 10Mb, respectively, and formatted so that the 1Mb ticks are small and without labels whereas the 5Mb and 10Mb be larger and with labels. The 10Mb ticks might use a bolder font, for example, to give them greater visual weight."
"Unless force_display is set for a tick set, ticks at smaller spacing are not drawn at a position that already has another tick. In other words, the formatting of a tick mark is defined by the block associated with the spacing value that defines the largest divisor of the tick value."
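The label arithmetic and the "largest divisor" rule can be illustrated with a short Python sketch. It mimics the described behaviour and is not Circos code. force_display is ignored, positions are raw base-pair values, and both function names are hypothetical.

```python
import math


def tick_label(position, multiplier=1e-6):
    """Label = raw tick position * label multiplier."""
    return position * multiplier


def formatting_spacing(position, spacings):
    """Spacing of the <tick> block that formats a tick at `position`:
    the largest spacing that divides the position exactly."""
    divisors = [s for s in spacings if position % s == 0]
    return max(divisors) if divisors else None


spacings = [1_000_000, 5_000_000, 10_000_000]  # 1Mb, 5Mb and 10Mb tick sets
print(math.isclose(tick_label(10_000_000), 10))    # True
print(formatting_spacing(15_000_000, spacings))    # 5000000
print(formatting_spacing(20_000_000, spacings))    # 10000000
```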
"When tick size is expressed in relative terms, the comparator is the thickness of the ideogram. Therefore ticks with size=0.1r will have a length that is 1/10th of the ideogram thickness. Tick thickness, on the other hand, uses the tick size as the comparator. Thus, ticks with thickness=0.1r will have a width that is 1/10th the size of their length. Similarly, if tick label size is defined relatively, it will be scaled by tick size."
"When chromosomes_display_default=yes, you do not need to define which ideograms ticks appear on because tick mark visibility is on by default and you only need to define where tick marks are not shown. If chromosomes_display_default=no, then things get a little bit more complicated, because you now need to define where tick marks will be shown and these definitions can contain regions of exclusion."
Reporting Unwanted Sexual Behaviour in a Black Cab, Minicab, or on Public Transport in the UK
Quoted from an online source.
"If you would like to report any unwanted sexual behaviour in a black cab, minicab, or on public transport, please report it by calling 101 or texting 61016.
For further information or support please follow the links or call the numbers for the charities below.
Rape Crisis (England & Wales)
Website: www.rapecrisis.org.uk
Telephone Number: 08088029999
Victim Support
Website: https://www.victimsupport.org.uk/
Telephone Number: 08081689111
hollaback
Website: http://www.ihollaback.org/about/
In an emergency always call 999."
Sunday, 22 March 2015
Monday, 16 March 2015
Circos Plots
################################################
# chromosomes_units
Excerpts from "Drawing Ideograms"
"For example,
chromosomes_units = 1000000
chromosomes = hs1:0-100;hs2:50-150;hs3:50-100;hs4;hs5;hs6;hs7;hs8
will draw all 8 chromosomes, but only 0-100 Mb of hs1, 50-150 Mb of hs2 and 50-100 Mb of hs3. The start and end ranges are given in units of chromosomes_units."
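The following Python sketch expands the region syntax above into base-pair coordinates. It only illustrates the arithmetic, since Circos performs this conversion itself. parse_chromosomes is a hypothetical helper. It handles only the plain "name" and "name:start-end" forms shown in the excerpt and reports a whole chromosome as (name, None, None).

```python
def parse_chromosomes(spec, chromosomes_units=1_000_000):
    """Turn 'hs1:0-100;hs4' into [(name, start_bp, end_bp), ...]."""
    regions = []
    for item in spec.split(";"):
        if ":" in item:
            name, rng = item.split(":", 1)
            start, end = (float(x) * chromosomes_units for x in rng.split("-"))
            regions.append((name, int(start), int(end)))
        else:
            regions.append((item, None, None))  # whole chromosome
    return regions


print(parse_chromosomes("hs1:0-100;hs2:50-150;hs4"))
```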
################################################
# karyotype file
Excerpts from "Karyotypes"
"The karyotype file defines the axes. In biological context, these are typically chromosomes, sequence contigs or clones.
Each axis (e.g. chromosome) is defined by unique identifier (referenced in data files), label (text tag for the ideogram seen in the image), size and color."
"Chromosome definitions are formatted as follows
chr - ID LABEL START END COLOR"
'The first two fields are always "chr", indicating that the line defines a chromosome, and "-". The second field defines the parent structure and is used only for band definitions.'
"Consider using the conventional chromosome color scheme as defined in the etc/color.conf configuration file. Colors are defined for each human chromosome and are named similarly: chr1, chr2, ... chrx, chry, chrun. Colors must be in lowercase."
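Karyotype lines can be generated from a {chrom: size} mapping, such as one parsed from a chromosome-size file. The following is a minimal Python sketch under these assumptions: chromosome names look like "chr1", the ID is the chromosome name itself, the label drops the "chr" prefix, and the color is the lowercased name so that it follows the chr1 ... chrx naming above. Colors exist only for names defined in the Circos color files. karyotype_line is a hypothetical name, and the sizes in the usage are made up.

```python
def karyotype_line(chrom, size):
    """One 'chr - ID LABEL START END COLOR' line for a chromosome such as 'chrX'."""
    label = chrom[3:] if chrom.startswith("chr") else chrom
    return f"chr - {chrom} {label} 0 {size} {chrom.lower()}"


for chrom, size in {"chr1": 1000, "chrX": 500}.items():
    print(karyotype_line(chrom, size))
```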
################################################
# external imports
Excerpts from "Configuration Files - Syntax, Colors, Fonts and Units"
"Two files should always be imported from etc/ in the Circos distribution. These are
# colors, fonts and fill patterns
<<include etc/colors_fonts_patterns.conf>>
# system and debug parameters
<<include etc/housekeeping.conf>>"
#################################################
# <image> block
Excerpts from "PNG Output"
"I suggest that you always import the default image settings.
<image>
# import defaults from Circos distribution
<<include etc/image.conf>>
</image>
The settings define the output file to be 3,000 x 3,000 pixels, with white background, named circos.png, which will be placed in the current directory."
"If you would like to overwrite any of these parameters, use the * suffix syntax.
# circos.conf
<image>
<<include etc/image.conf>>
file* = myfile.png
radius* = 1000p
</image>"
"Output image directory and filename are defined in the dir and file parameters of the <image> block. The produced image is always square, and its size set by the radius parameter (this is the size of the inscribed circle). If radius=1500p, then the image will be 3,000 x 3,000 pixels in size."
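Because the radius is that of the inscribed circle, the image side is twice the radius. The short Python sketch below shows the arithmetic for the default radius and for the radius* = 1000p override shown earlier, which gives a 2,000 x 2,000 pixel image. image_size_px is a hypothetical helper that accepts only absolute pixel values.

```python
def image_size_px(radius):
    """Side length of the square Circos image for a radius such as '1500p'."""
    if not radius.endswith("p"):
        raise ValueError("expected an absolute radius in pixels, e.g. '1500p'")
    return 2 * int(radius[:-1])


print(image_size_px("1500p"))  # 3000
print(image_size_px("1000p"))  # 2000
```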
#################################################
# Ticks & Labels
Excerpts from "Ticks & Labels"
'The radial position of the labels can be adjusted using label_radius. The quantity used as the reference for relative units depends on which parameter is defined. It is usually defined as the "parent container" of the element. For example, when defining ideogram position, the reference is image radius. When using track position, the reference is ideogram radius. As a result, when the parent element is moved (e.g. ideogram), all other elements move with it (e.g. data tracks).'
"Ticks are defined by group. You can have absolute or relatively spaced ticks, as well as ticks at specific positions. The primary parameter in each <tick> block is spacing. This defines the distance between adjacent ticks in this group. Typically, this value is defined in terms of chromosomes_units parameter — the suffix u is used for this — to keep the number legible. If a tick belongs to multiple groups, the group with largest spacing is preferred. Thus, the tick at 50 Mb will take its formatting from the spacing=25u group, not the spacing=5u group."
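The parent-container rule can be illustrated with a small Python sketch. Here "r" values are multiplied by the parent's radius and "p" values are absolute pixels. The specific radii (1500 px image radius, 0.9r ideogram, 0.8r track) are assumptions chosen for illustration, and resolve_length is a hypothetical helper, not a Circos function.

```python
def resolve_length(value, parent):
    """'0.9r' -> fraction of the parent radius; '25p' -> absolute pixels."""
    if value.endswith("r"):
        return float(value[:-1]) * parent
    if value.endswith("p"):
        return float(value[:-1])
    raise ValueError(f"unsupported unit in {value!r}")


image_radius = 1500
ideogram_radius = resolve_length("0.9r", image_radius)  # relative to the image radius
track_radius = resolve_length("0.8r", ideogram_radius)  # relative to the ideogram radius
print(ideogram_radius, track_radius)  # about 1350 and 1080 pixels
```

Because the track radius is computed from the ideogram radius, moving the ideogram (changing 0.9r) moves the track with it, which is the behaviour the excerpt describes.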
.bashrc and .bash_profile
Excerpts from "What is the purpose of .bashrc and how does it work?"
".bashrc is a shell script that Bash runs whenever it is started interactively. You can put any command in that file that you could type at the command prompt. You put commands here to set up the shell for use in your particular environment, or to customize things to your preferences."
"Contrast .bash_profile and .profile which are only run at the start of a new login shell (bash -l). You choose whether a command goes in .bashrc vs .bash_profile depending on whether you want it to run once or for every interactive shell start."
Sunday, 15 March 2015
Saturday, 14 March 2015
Friday, 13 March 2015
ENCODE Tier 1, Tier 2 and Tier 3 Cells
Excerpts from "ENCODE Cell Types 2007 - 2012"
"Tier1 cells are of higher priority, and should be used within experiments before Tier2 cells. Additional cell types beyond the designated Tier1 and Tier2 could be used for ENCODE production; these are selected at the discretion of individual data production groups, and are designated Tier3."
===============================================
Excerpts from "ENCODE Project Common Cell Types"
"These common cell types include both cell lines and primary cell types, and plans are being made to explore the use of primary tissues and embryonic stem (ES) cells.
Cell types were selected largely for practical reasons, including their wide availability, the ability to grow them easily, and their capacity to produce sufficient numbers of cells for use in all technologies being used by ENCODE investigators. Secondary considerations were the diversity in tissue source of the cells, germ layer lineage representation, the availability of existing data generated using the cell type, and coordination with other ongoing projects. Effort was also made to select at least some cell types that have a relatively normal karyotype."
Detailed descriptions of the Tier 1 and Tier 2 cells are given at the link above.
PRO-seq
Excerpts from "Precise Maps of RNA Polymerase Reveal How Promoters Direct Initiation and Pausing"
"PRO-seq uses biotin-labeled ribonucleotide triphosphate analogs (biotin-NTP) for nuclear run-on reactions, allowing the efficient affinity purification of nascent RNAs for high throughput sequencing from their 3’ ends (Figs. 1A, S1A). Supplying only one of the four biotin-A/C/G/UTP restricts Pol II to incorporate a single or at most a few identical bases, resulting in sequence reads that have the same 3’ end base within each library (table S1). Moreover, the incorporation of the first biotin-base inhibits further transcript elongation, ensuring base-pair resolution (fig. S2)."
===============================================
Excerpts from "Genome-Wide Control of RNA Polymerase II Activity by Cohesin"
"PRO-seq varies from GRO-seq in that biotin-labeled ribonucleotides are used to allow run-on for a nucleotide or two, instead of the longer run-on with BrUTP used in GRO-seq. PRO-seq, like GRO-seq [17], is highly sensitive, and unlike ChIP, does not depend on crosslinking efficiency or antibody specificity, and detects elongation-competent Pol II regardless of the phosphorylation status. Nuclei were isolated under conditions of ribonucleotide depletion to halt transcription, but leave Pol II transcriptionally engaged. The nascent RNA transcripts produced upon restart of transcription were used to generate a cDNA library for high-throughput sequencing. Inclusion of sarkosyl in the run-on transcription reaction prevents new transcription initiation, so that only Pol II that is already transcriptionally engaged is detected, and gene body and promoter paused Pol II are detected with equal efficiency [17]"
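Because the run-on adds only one or a few biotin-labelled bases, the RNA 3' end marks the engaged Pol II at base-pair resolution. The following is a minimal Python sketch of collapsing aligned reads to 3'-end counts. It assumes reads are given as (chrom, start, end, strand) in 0-based half-open coordinates and are already oriented like the nascent RNA. Raw PRO-seq reads may first need reverse-complementing, depending on the library protocol. three_prime_counts is a hypothetical name, and the reads in the usage are made up.

```python
from collections import Counter


def three_prime_counts(reads):
    """Count RNA 3' ends per (chrom, strand, position)."""
    counts = Counter()
    for chrom, start, end, strand in reads:
        # On '+' the 3' end is the last base; on '-' it is the first base.
        pos = end - 1 if strand == "+" else start
        counts[(chrom, strand, pos)] += 1
    return counts


reads = [("chr1", 100, 150, "+"), ("chr1", 120, 150, "+"), ("chr1", 200, 240, "-")]
print(three_prime_counts(reads))
```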