Monday 23 November 2015

DNMT3A and DNMT3B

  1. Addition of a methyl group to cytosine in the context of CpG dinucleotides.
  2. DNMTs are associated with chromatin remodelling.
  3. DNMT3A and DNMT3B are de novo DNA methyltransferases.
  4. Double-knockout (DNMT3A/DNMT3B) cells lose their differentiation capacity with passage.

Thursday 19 November 2015

R plot function

Cited from How to Change Plot Options in R

"bty" (box type) is the plot function parameter that specifies the type of box drawn around the plot area:
  • "o": The default value draws a complete rectangle around the plot.
  • "n": Draws nothing around the plot.
=====================================
Cited from 15 Questions All R Users Have About Plots

Setting the "xaxt" or "yaxt" parameter to "n" suppresses plotting of the x-axis or y-axis. Any other value causes the axis to be plotted.

Setting "ann" to FALSE stops high-level plotting functions from drawing the default axis titles and overall title.

=====================================
Cited from Graphics with R

By default, "xaxs" and "yaxs" are set to "r" ("regular"). In this style, the data range given by "xlim" and "ylim" is first extended by 4% at each end, so that values do not sit right at the edges of the plot. In contrast, setting "xaxs" and "yaxs" to "i" ("internal") keeps the original range, so the limits fall exactly at the edges of the plot.
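The default "r" style can be checked with a little arithmetic. The Python sketch below is only an illustration of the range calculation. The limits 0 and 10 are hypothetical. After extending the range, R also chooses pretty tick labels inside it, and this sketch does not model that step.

```python
def regular_axis_range(lo: float, hi: float, pad: float = 0.04) -> tuple[float, float]:
    """Extend [lo, hi] by `pad` (4%) of its width at each end, as xaxs/yaxs = "r" does."""
    width = hi - lo
    return lo - pad * width, hi + pad * width


def internal_axis_range(lo: float, hi: float) -> tuple[float, float]:
    """With xaxs/yaxs = "i" the original limits are used unchanged."""
    return lo, hi


# Hypothetical xlim = c(0, 10)
print(regular_axis_range(0, 10))   # about 0.4 units of padding on each side
print(internal_axis_range(0, 10))  # (0, 10)
```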

Monday 16 November 2015

SCDE: Q&A

Cited from Definition of columns of scde diff DE matrix

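The following scde quantities should provide estimates of the corresponding limma values (limma column == scde quantity):

logFC == mle

P.Value == pnorm(Z)

adj.P.Val == pnorm(cZ), where cZ is the Z score corrected for multiple testing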
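The snippet below converts Z scores to probabilities in Python, mirroring R's pnorm. Taken literally, pnorm(Z) is the lower-tail probability P(Z ≤ z). The two-sided conversion 2 * pnorm(-|Z|) is included as a common convention. That conversion is an assumption and is not stated in the cited thread. The Z values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1, like R's pnorm defaults


def pnorm(z: float) -> float:
    """Lower-tail probability P(Z <= z), the quantity R's pnorm(z) returns."""
    return STD_NORMAL.cdf(z)


def two_sided_p(z: float) -> float:
    """Two-sided p-value 2 * P(Z <= -|z|); an assumed convention, not from the thread."""
    return 2 * STD_NORMAL.cdf(-abs(z))


# Hypothetical signed Z scores for three genes
for z in (-2.5, 0.0, 1.96):
    print(z, round(pnorm(z), 4), round(two_sided_p(z), 4))
```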

=====================================
Cited from Normalized read counts

scde.expression.magnitude returns FPM (not normalized by transcript length).
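The sketch below shows, in generic terms, the difference between FPM and a length-normalised measure such as FPKM. It is not scde's implementation, because scde derives its magnitudes from fitted models. The counts and transcript lengths are hypothetical.

```python
def fpm(counts: list[int]) -> list[float]:
    """Fragments per million: each count scaled by library size, no length correction."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def fpkm(counts: list[int], lengths_bp: list[int]) -> list[float]:
    """FPKM additionally divides FPM by transcript length in kilobases."""
    return [v / (length / 1000) for v, length in zip(fpm(counts), lengths_bp)]


# Hypothetical read counts for three genes in one cell, and their transcript lengths
counts = [10, 30, 60]
lengths = [1000, 2000, 500]
print(fpm(counts))            # unaffected by transcript length
print(fpkm(counts, lengths))  # longer transcripts are scaled down
```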

=====================================
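Drop-out (failure) probabilities of each gene in each cell, from the fitted error models o.ifm and the count matrix cd:
p.self.fail <- scde.failure.probability(models = o.ifm, counts = cd)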




Saturday 14 November 2015

Managing Bash Processes

Cited from "Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools"

What is a shell process?

"When we run programs through the Unix shell, they become processes until they successfully finish or terminate with an error."

Background Processes

To run a program in the background, an ampersand (&) can be appended to the end of the command.

To check which processes are running in the background, run the command "jobs".
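The shell's "&" starts a program without waiting for it to finish. Python's subprocess.Popen behaves in a similar way, and the sketch below uses it as an analogy only, not as a replacement for shell job control. The child command is a hypothetical stand-in that just sleeps briefly.

```python
import subprocess
import sys

# Hypothetical stand-in for a long-running program: a Python one-liner that sleeps briefly
cmd = [sys.executable, "-c", "import time; time.sleep(0.5)"]

# Popen starts the child and returns immediately, much like `cmd &` in the shell
proc = subprocess.Popen(cmd)

# poll() returns None while the child is still running, and its exit status once finished
print("still running:", proc.poll() is None)

# The parent can do other work here; wait() then blocks until the child finishes
exit_code = proc.wait()
print("exit status:", exit_code)  # 0 means the program finished successfully
```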


Tee Command

Cited from "Bioinformatics Data Skills: Reproducible and Robust Research with Open Source Tools"

"The Unix program tee diverts a copy of your pipeline’s standard output stream to an intermediate file while still passing it through its standard output."

For example,
program1 input.txt | tee intermediate-file.txt | program2 > results.txt
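To make the data flow of that pipeline concrete, the sketch below re-creates it in Python with generators. tee passes every line downstream unchanged and writes a copy to a side file. program1, program2 and the in-memory "intermediate file" are hypothetical stand-ins for the commands and files in the example.

```python
import io
from typing import Iterable, Iterator, TextIO


def tee(stream: Iterable[str], copy_to: TextIO) -> Iterator[str]:
    """Yield every line unchanged while writing a copy to copy_to, like `tee file`."""
    for line in stream:
        copy_to.write(line)
        yield line


def program1(text: str) -> Iterator[str]:
    # Hypothetical upstream step: emit each input line in upper case
    for line in io.StringIO(text):
        yield line.upper()


def program2(stream: Iterable[str]) -> int:
    # Hypothetical downstream step: count the lines it receives
    return sum(1 for _ in stream)


intermediate = io.StringIO()  # stands in for intermediate-file.txt
result = program2(tee(program1("chr1\nchr2\nchr3\n"), intermediate))
print(result, repr(intermediate.getvalue()))
```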









Pandoc Commands

Pandoc: a universal document converter

Markdown Syntax

Cited from Markdown Tutorial

Italic and Bold

To make a phrase italic in Markdown, the words can be surrounded by underscores ("_"), such as "_italic_".

To make phrases bold in Markdown, the words can be surrounded with two asterisks ("**"), such as "**bold**".

Headers

There are six levels of headers. The number of hash marks (from "#" to "######") before a header sets its level, and the header size decreases as the number of hash marks increases.

A header cannot be made bold, but certain words in it can be italicized.

Links to Websites

To create an inline link, wrap the link text in square brackets "[ ]" and then wrap the URL in parentheses "( )". For example, "[Visit GitHub!](www.github.com)".

A reference link is a reference to another place in the document. For example, write "[text][reference]" in the text, and define "[reference]: url" at the bottom of the Markdown document.

Images

To create an inline image, use "![alt text](url)".

Blockquote

A blockquote is a sentence or paragraph that has been specially formatted to draw the reader's attention.

To create a blockquote, preface one or more paragraphs with the "greater than" caret (>).

Lists

To create an unordered list, each item in the list must be prefaced with an asterisk and space ("* "). Each item must be listed in its own line.

An ordered list is prefaced with numbers, instead of asterisks.

With a nested list, the sub-item must be indented one space more compared to the preceding item.

Paragraphs

Forcefully inserting a new line between lines breaks their togetherness; this is a hard break. To keep the lines together while still starting each one on a new line (a soft break), insert two spaces at the end of each line.




Thursday 12 November 2015

Data Type in Java

The set of values for each data type is known as the domain of that type.
Cited from "HKUSTx: COMP102.1x Introduction to Java Programming - Part 1"

Adding the keyword "final" to the declaration of a variable makes the variable constant. For example, "final double bodyWeight;". A value can be assigned to a "final" variable only once.

Camel Case in Java

Cited from "HKUSTx COMP102.1x Introduction to Java Programming - Part 1"

Lower camelCase for names of variables and methods. For example, "double areaOfCircle".

Upper CamelCase for names of classes. For example, "public class HelloWorld".

 

Expectation-Maximization Algorithm Lecture Slides

Harvard Stat 211: Statistical Computing and Visualization

MIT OCW Machine Learning

JavaScript Library D3.js

Cited from Data-Driven Documents

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS.

Wednesday 11 November 2015

R bquote Function Explained

Cited from An R bquote example

When a plot is annotated with mathematical symbols in R, expression() may be required.

For example,
text(0, height[2], labels=expression(Y[med] ~ "=" ~ B*x^2), cex=3)

Contents surrounded by square brackets ("[" and "]") appear as subscripts.

The tilde "~" acts as a separator. It is rendered as a space, and the tilde character itself does not appear in the plot.

If we wish to introduce variables in the annotation along with the mathematics symbols, the bquote function may be used.

For example,
text(0, height[i], labels=bquote(Y[.(z2[i])] ~ "=" ~ .(z1[i])*x^2), cex=3)

.(variable_name) retrieves the value stored in the variable and places that value inside the expression.


R ggplot2 Violin Plot

ggplot2 violin plot : Easy function for data visualization using ggplot2 and R software

Word Cloud Fundamentals in R

Text mining and word cloud fundamentals in R : 5 simple steps you should know

Friday 6 November 2015

Bayesian Inference Basics

Cited from the book "Statistical Rethinking: A Bayesian Course with Examples in R and Stan"

Maximum a posteriori (MAP) is the mode of the posterior distribution.
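The MAP can be found numerically as the highest point of the posterior. The sketch below uses a grid approximation for a binomial proportion with a flat prior, which is a simple choice made for this example. The data of 6 successes in 9 trials are hypothetical. With a flat prior the posterior is proportional to p^6 (1 - p)^3, so the MAP should land near the analytic mode 6/9.

```python
from math import comb


def grid_posterior(successes: int, trials: int, n_points: int = 1001) -> tuple[list[float], list[float]]:
    """Grid approximation of the posterior for a binomial proportion with a flat prior."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    prior = [1.0] * n_points  # flat prior over [0, 1]
    likelihood = [comb(trials, successes) * p**successes * (1 - p) ** (trials - successes) for p in grid]
    unnormalised = [lik * pr for lik, pr in zip(likelihood, prior)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


# Hypothetical data: 6 successes in 9 trials
grid, posterior = grid_posterior(6, 9)

# The MAP estimate is the grid value where the posterior is largest (its mode)
map_estimate = grid[max(range(len(grid)), key=posterior.__getitem__)]
print(map_estimate)
```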

Binomial, Geometric, Hypergeometric, Poisson and Negative Binomial Distributions

Cited from the YouTube video Overview of Some Discrete Probability Distributions (Binomial,Geometric,Hypergeometric,Poisson,NegB)

Binomial, negative binomial and geometric distributions depend on the assumption of independent Bernoulli trials.

Binomial distribution:
The number of trials is fixed, and the number of successes is the random variable.

Bernoulli distribution:
A special case of the binomial distribution in which the number of trials is fixed at 1; the number of successes (0 or 1) is the random variable.

Negative binomial distribution:
The number of successes is fixed, and the number of trials is the random variable.

Geometric distribution:
A special case of the negative binomial distribution in which the number of successes is fixed at 1; the number of trials is the random variable.
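The four definitions above can be written as probability mass functions in Python. The sketch counts trials, as the notes do. Some software counts failures instead, for example R's dnbinom and dgeom count the failures before the final success, so the formulas there look shifted. The parameter values in the usage lines are hypothetical.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) in a fixed number n of independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bernoulli_pmf(k: int, p: float) -> float:
    """Binomial with n fixed at 1, so k is 0 or 1."""
    return binomial_pmf(k, 1, p)


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(the r-th success arrives on trial k): successes fixed at r, trials k random (k >= r)."""
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


def geometric_pmf(k: int, p: float) -> float:
    """Negative binomial with r fixed at 1: P(first success on trial k)."""
    return negative_binomial_pmf(k, 1, p)


# Hypothetical success probability p = 0.3
print(bernoulli_pmf(1, 0.3))
print(binomial_pmf(3, 10, 0.3))
print(negative_binomial_pmf(5, 2, 0.3))
print(geometric_pmf(4, 0.3))
```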

The hypergeometric distribution depends on the assumption of non-independent trials. Items are drawn without replacement from a source that contains a certain number of successes and a certain number of failures.

Hypergeometric distribution:
As in the binomial distribution, the number of trials is fixed, and the number of successes is the random variable.

When objects are sampled without replacement from a large population, the dependence between draws has only a small effect, so the binomial distribution closely approximates the hypergeometric distribution.
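The approximation can be checked numerically in Python. The sketch fixes 10 draws and 3 successes, with successes making up 30% of the population. Only the population size changes. All numbers are hypothetical. The hypergeometric and binomial probabilities should converge as the population grows.

```python
from math import comb


def hypergeometric_pmf(k: int, population: int, successes_in_pop: int, draws: int) -> float:
    """P(k successes) in `draws` draws without replacement."""
    failures_in_pop = population - successes_in_pop
    return comb(successes_in_pop, k) * comb(failures_in_pop, draws - k) / comb(population, draws)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) in n independent trials (i.e. drawing with replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical setting: 30% successes in the population, 10 draws, k = 3
for population in (20, 200, 200_000):
    n_success = population * 3 // 10
    h = hypergeometric_pmf(3, population, n_success, 10)
    b = binomial_pmf(3, 10, n_success / population)
    print(population, round(h, 4), round(b, 4))
```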

Poisson distribution:
The Poisson distribution models the number of events (the random variable) in a given interval of time, length, area, volume, etc., when these events occur randomly and independently.

The Poisson distribution approximates the binomial distribution when the number of trials n is large and the probability of success p is very small, using the rate λ = np.
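The Python sketch below compares the two distributions side by side for many trials with a small success probability. The values n = 1000 and p = 0.003 are hypothetical, which gives λ = np = 3.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes) in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(k events) when events occur independently at average rate lam per interval."""
    return lam**k * exp(-lam) / factorial(k)


# Hypothetical: many trials (n = 1000), rare success (p = 0.003), so lam = n * p = 3
n, p = 1000, 0.003
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```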



Zero-Inflated Models Explained

Do We Really Need Zero-Inflated Models?

Zero-Inflated Poisson Regression An Introduction to ZIP Regression

Tuesday 3 November 2015

Differential ChIP-seq

ChIPComp: A novel statistical method for quantitative comparison of multiple ChIP-seq datasets

Tutorial of Downloading SRA Data with Aspera

Download SRA data with Aspera command line utility

BioMart Tutorial

Some basics of biomaRt

Bash Tutorial

Better Bash Scripting in 15 Minutes

The Accurate Estimation of RNA Concentration from RNA-Seq Data

Mix² – A software tool for the accurate estimation of RNA concentration from RNA-Seq data

Comparison of GENCODE and RefSeq Gene Annotation

Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction

Variant and Pathogenicity

SSCM: A method to analyze and predict the pathogenicity of sequence variants

Transposon Quantification in RNA-seq

TEtranscripts – A package for including transposable elements in differential expression analysis of RNA-Seq datasets

RPKM, FPKM and TPM

RPKM, FPKM and TPM, clearly explained

Analysis of RNA Methylation MeRIP-seq Data Overview

FET-HMM – for spatially enhanced detection of differentially methylated region from MeRIP-Seq data

Hadoop Tutorial

Hadoop Tutorial For Beginners

Apache Spark

Beginners Guide: Apache Spark Machine Learning Scenario With A Large Input Dataset

Omics Analysis of Time Course Data

A Linear Mixed Model Spline Framework for Analysing Time Course ‘Omics’ Data

FunPat – function-based pattern analysis on RNA-seq time series data

Hi-C Analysis Review

Analysis methods for studying the 3D architecture of the genome