Friday 29 September 2017

R graphics::plot()

Cited from the Book "Beginning Data Science in R"

Plotting with the basic graphics package usually follows this pattern. First, there is a call to plot() that sets up the canvas to plot on, possibly adjusting the axes to make sure that later points will fit on it. Then any additional data points are plotted. Finally, there might be some annotation, such as text labels or margin notes (see the text() and mtext() functions for this).

R plot() function with pipe operator

Cited from the Book "Beginning Data Science in R"

data(cars)

> cars %>% plot(speed, dist, data = .)
Error in plot(speed, dist, data = .) : object 'speed' not found

The following works.
 
cars %>% plot(dist ~ speed, data = .)

or

cars %$% plot(speed, dist)

Explanation:

The data argument of plot() is used only when the variables to plot are specified as a formula. That is why plot(speed, dist, data = .) fails while plot(dist ~ speed, data = .) works. With %$%, the columns of the data frame are exposed by name, so no data argument is needed.


R tidyr gather() Function

Cited from the Book "Beginning Data Science in R"

gather() transforms the data frame so that you get one column containing the name of your original columns and another column containing the values in those columns.

iris %>%
gather(key = Attribute, value = Measurement,
Sepal.Length, Sepal.Width)

This code tells gather() to make a column called Attribute that contains the names of columns from the input data frame and another called Measurement that will contain the values of the key columns. From the resulting data frame, you can see that the Attribute column contains the Sepal.Length and Sepal.Width names.



R Datasets

Cited from the Book "Beginning Data Science in R"

Distributed together with R is the package datasets. You can load the package into R using library(datasets) and get a list of the datasets in it, together with a short description of each, using library(help = "datasets").

To load an actual dataset into R’s memory, use the data() function, for example data(cars).


R Literate Programming

Cited from the Book "Beginning Data Science in R"

The idea in literate programming is that the documentation of a program (in the sense of the documentation of how the program works and how the algorithms and data structures in the program work) is written together with the code implementing the program.

Tools such as Javadoc and Roxygen (http://roxygen.org) do something similar. They have documentation of classes and methods written together with the code in the form of comments. Literate programming differs slightly from this. With Javadoc and Roxygen, the code is the primary document, and the documentation is comments added to it. With literate programming, the documentation is the primary text for humans to read and the code is part of this documentation, included where it falls naturally to have it. The computer code is extracted automatically from this document when the program runs.

The pipeline goes from R Markdown via knitr to Markdown and then via pandoc to the various output formats.

Thursday 28 September 2017

R Pipe

Cited from the Book "Beginning Data Science in R"

library(magrittr)

The %>% operator takes whatever is computed on the left side of it and inserts it as the first
argument to the function given on the right side, and it does this left to right.


If you are already providing parameters to a function in the pipeline, the left side of %>% is just inserted before those parameters in the pipeline.

Now, you cannot always be so lucky that all the functions you want to call in a pipeline take the left side of the %>% as their first parameter. If this is the case, you can still use the function, though, because "magrittr" interprets . in a special way. If you use . in a function call in a pipeline, then that is where the left side of the %>% operation goes, instead of becoming the default first parameter of the right side. So if you need the data to go as the second parameter, you put a . there, since x %>% f(y, .) is equivalent to f(y, x).

The magrittr package does more with . than just changing the order of parameters. You can use . more
than once when calling a function and you can use it in expressions or in function calls.


rnorm(4) %>% data.frame(x = ., is_negative = . < 0)

rnorm(4) %>% data.frame(x = ., y = abs(.))

There is one caveat: if . appears only inside nested function calls, it will still be given as the first argument to the function on the right side of %>%.

So by default, f(g(.),h(.)) gets translated into f(.,g(.),h(.)). If you want to avoid this behavior, you can put curly brackets around the function call, since {f(g(.),h(.))} is equivalent to f(g(.),h(.)).

While . is mainly used for providing parameters to functions in a pipeline, it can also be used as a short-hand for defining new functions.

. %>% f is equivalent to writing function(.) f(.)

f <- . %>% cos %>% sin is equivalent to f <- function(.) sin(cos(.))

"magrittr" has lambda expressions. This is a computer science term for anonymous functions, that is, functions that you do not give a name.
data.frame(x, y) %>% (function(d) {
  plot(y ~ x, data = d)
  abline(lm(y ~ x, data = d))
})


Using . and curly brackets, you can improve the readability (slightly) by just writing the body of the function and referring to the input of it—what was called d above—as '.'.

data.frame(x, y) %>% {
  plot(y ~ x, data = .)
  abline(lm(y ~ x, data = .))
}


If you use the operator %$% instead of %>%, you can get to the variables just by naming them instead.

d <- data.frame(x = rnorm(10), y = 4 + rnorm(10))
d %>% {data.frame(mean_x = mean(.$x),
  mean_y = mean(.$y))}

or

d %$% data.frame(mean_x = mean(x), mean_y = mean(y))

The %T>% (tee) operator works like the %>% operator, but where %>% passes on the result of the right side of the expression, %T>% passes on the result of the left side. The right side is computed but not passed on.

d <- data.frame(x = rnorm(10), y = rnorm(10))
d %T>% plot(y ~ x, data = .) %>% lm(y ~ x, data = .)

The %<>% operator assigns the result of a pipeline back to the variable on the left.

d <- read_my_data("/path/to/data")
d %<>% clean_data

Equivalent to

d <- read_my_data("/path/to/data") %>% clean_data
 
R Missing Values

Cited from the Book "Beginning Data Science in R"

Operations that involve NA are themselves NA. You cannot operate on missing data and get anything
but more missing values in return. This also means that if you compare two NAs, you get NA. Because NA is missing information, it is not even equal to itself.

If you want to check if a value is missing, you must use the function is.na.

 

R data.frame

Cited from the Book "Beginning Data Science in R"

By default (in R versions before 4.0.0), a data frame will convert a character vector to a factor, and you need to tell it explicitly not to if you want a character vector. Since R 4.0.0, the default is stringsAsFactors = FALSE.

df <- data.frame(a = 1:4, b = letters[1:4], stringsAsFactors = FALSE)

Functions for reading in data from various text formats will typically also convert string vectors to factors, and you need to prevent this explicitly. The readr package (see https://github.com/hadley/readr) is a notable exception where the default is to treat character vectors as character vectors.
 

R factors

Cited from the Book "Beginning Data Science in R"

f <- factor(c("small", "small", "medium", "large", "small", "large"), levels = c("small", "medium", "large"))

ordered(f, levels = c("small", "medium", "large"))
## [1] small small medium large small large
## Levels: small < medium < large

 

A factor is actually not stored as strings, even though we create it from a vector of strings. It is stored as a vector of integers where the integers are indices into the levels.

The easiest way to deal with a factor as the actual labels it has is to translate it into a vector of strings.

R seq_along()

Cited from the Book "Beginning Data Science in R"

The seq_along() function, when given a vector as input, returns a vector of indices.

R If Else Vectorized vs Non-Vectorized

Cited from the Book "Beginning Data Science in R"

You cannot use "if else" for vectorized expressions. If you give the Boolean expression a vector, R evaluates only the first element and issues a warning (since R 4.2.0 this is an error instead).

x <- 1:5
if (x > 3) "bar" else "baz"
## Warning in if (x > 3) "bar" else "baz": the
## condition has length > 1 and only the first
## element will be used
## [1] "baz"


If you want a vectorized version of if statements, you can instead use the ifelse function:

x <- 1:5
ifelse(x > 3, "bar", "baz")
## [1] "baz" "baz" "baz" "bar" "bar"


maybe_square <- function(x) {
  if (x %% 2 == 0) {
    x ** 2

  } else {
    x
  }
}
maybe_square(1:5)
## Warning in if (x%%2 == 0) {: the condition has
## length > 1 and only the first element will be used
## [1] 1 2 3 4 5


maybe_square <- Vectorize(maybe_square)
maybe_square(1:5)
## [1]  1  4  3 16  5
 

 




What Are Infix Operators?

Cited from Data Infix Operators in R

Infix refers to the placement of the arithmetic operator between variables. For example, an infix operation is given by (a+b), whereas prefix and postfix operators are given by (+ab) and (ab+), respectively.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Cited from the Book "Beginning Data Science in R"

If you want help on an infix operator, you need to quote it, and you do that using backquotes.

?`+`


R Power Operations ^ or **

Cited from the Book "Beginning Data Science in R"

^ and ** are equivalent.

> 2^4
[1] 16
> 2**4
[1] 16

R %/% vs / Division Operators

Cited from the Book "Beginning Data Science in R"

> 4/3
[1] 1.333333
> 4%/%3
[1] 1

Gene Regulatory Network from Single Cell RNA-seq

Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures

Monday 25 September 2017

Human Embryogenesis

Embryonic and Fetal Development

Difference Between Epithelial and Endothelial Cells

Difference Between Epithelial and Endothelial Cells

Lymphatic System

Clinical Overview of the Lymphatic System


Contact Inhibition

Cited from Cancer Cells in Culture

Normal cells: when placed on a tissue culture dish, they proliferate until the surface of the dish is covered by a single layer of cells just touching each other. Then mitosis ceases. This phenomenon is called contact inhibition.

Cancer cells show no contact inhibition. Once the surface of the dish is covered, the cells continue to divide, piling up into mounds.

Replicative Senescence

Cited from Cancer Cells in Culture

Normal cells pass through a limited number of cell divisions (70 is about the limit for cells harvested from young animals) before they decline in vigor and die. This is called replicative senescence. It may be caused by their inability to synthesize telomerase.

Cancer cells in culture produce telomerase.

Cell Junctions

Cited from Junctions Between Cells

In many animal tissues (e.g., connective tissue), each cell is separated from the next by an extracellular coating or matrix. However, in some tissues (e.g., epithelia), the plasma membranes of adjacent cells are pressed together. Four kinds of junctions occur in vertebrates:
  1. Tight junctions
  2. Adherens junctions
  3. Gap junctions
  4. Desmosomes
In many plant tissues, it turns out that the plasma membrane of each cell is continuous with that of the adjacent cells. The membranes contact each other through openings in the cell wall called
  • Plasmodesmata

Tight Junctions

Epithelia are sheets of cells that provide the interface between masses of cells and a cavity or space (a lumen). The portion of the cell exposed to the lumen is called its apical surface. The rest of the cell (i.e., its sides and base) makes up the basolateral surface.

Tight junctions seal adjacent epithelial cells in a narrow band just beneath their apical surface.

Tight junctions perform two vital functions:
  • They limit the passage of molecules and ions through the space between cells. So most materials must actually enter the cells (by diffusion or active transport) in order to pass through the tissue. This pathway provides tighter control over what substances are allowed through.
  • They block the movement of integral membrane proteins between the apical and basolateral surfaces of the cell. Thus the special functions of each surface, for example
    • receptor-mediated endocytosis at the apical surface
    • exocytosis at the basolateral surface
can be preserved.

Adherens Junctions
  1. Adherens junctions provide strong mechanical attachments between adjacent cells.
  2. They hold cardiac muscle cells tightly together as the heart expands and contracts.
  3. They hold epithelial cells together.
  4. They seem to be responsible for contact inhibition.
  5. Some adherens junctions are present in narrow bands connecting adjacent cells.
  6. Others are present in discrete patches holding the cells together.

Gap Junctions

Gap junctions are intercellular channels some 1.5–2 nm in diameter. These permit the free passage between the cells of ions and small molecules (up to a molecular weight of about 1000 daltons).

Desmosomes

Desmosomes are localized patches that hold two cells tightly together. They are common in epithelia (e.g., the skin). Desmosomes are attached to intermediate filaments of keratin in the cytoplasm.

Hemidesmosomes

These are similar to desmosomes but attach epithelial cells to the basal lamina ("basement membrane") instead of to each other.









Sunday 24 September 2017

Python Module, Package or Library

Cited from Modules vs Packages vs Libraries in Python

A module in Python is a .py file that defines one or more functions or classes that you intend to reuse in different parts of your program.

To reuse the functions of a given module you simply need to import the module using:

import <modulename> # to import the entire module

from <modulename> import <classname> # imports a class from a module

File path issues in R using Windows (“Hex digits in character string” error)

Cited from File path issues in R using Windows (“Hex digits in character string” error)

pathPrep <- function(path = "clipboard") {
    y <- if (path == "clipboard") {
        readClipboard()
    } else {
        cat("Please enter the path:\n\n")
        readline()
    }
    x <- chartr("\\", "/", y)
    writeClipboard(x)
    return(x)
}

Thursday 21 September 2017

DNA Methylation

DNA methylation is carried out by DNA methyltransferases (DNMTs).

Three major types: DNMT1, DNMT3a, and DNMT3b.

Following fertilization, DNMT3a and DNMT3b are responsible for de novo methylation, allowing embryonic stem cells to differentiate into a cell type.

DNMT1 is responsible for maintenance of DNA methylation following differentiation, and is active during cell division thereafter.

In a normal adult cell, most CpG sites are methylated. The exception is promoter CpG islands, whose CpG sites are typically unmethylated.

DNMT obtains the methyl group from a molecule called S-adenosylmethionine (SAM). The methyl group is added to the cytosine, forming 5-methylcytosine.

DNMT flips the cytosine 180 degrees out of base pairing. The enzyme then obtains the methyl group from SAM and transfers it to the cytosine. Finally, the methylated cytosine is flipped back.

TET (ten-eleven translocation) enzymes initially add a hydroxyl group to 5-methylcytosine, forming 5-hydroxymethylcytosine. TET enzymes can also convert 5-hydroxymethylcytosine back to cytosine through several pathways. Therefore, the TET enzymes are thought to be responsible for DNA demethylation.

In cancer cells, we see hyper-methylation of promoter CpG islands and this is associated with tumor suppressor gene inactivation. In contrast to the focal regions of hypermethylation, cancer DNA also undergoes widespread hypo-methylation across the entire genome. This bimodal deregulation of epigenetic landscape is found in every type of human tumor.

################################################
Cited from the paper "DNA methylation and healthy human aging"

The most common form of DNA methylation involves the addition of a methyl group to the 5′ cytosine of C‐G dinucleotides, referred to as CpGs. These nucleotide pairs are relatively sparse in the genome, and areas of comparatively high CpG density are referred to as CpG islands, identified as regions > 200 bp with a > 50% G+C content and 0.6 observed/expected ratio of CpGs (Saxonov et al., 2006; Illingworth & Bird, 2009). These islands tend to be less methylated compared to nonisland CpGs and are often associated with gene promoters, while the regions immediately surrounding CpG islands are referred to as ‘shores’, followed by ‘shelves’.

Approximately 60–70% of genes have a CpG island associated with their promoters, and promoters can be classified according to their CpG density (Saxonov et al., 2006; Weber et al., 2007).

Levels of DNA methylation at a promoter‐associated CpG island are generally negatively associated with gene expression, although some specific genes show the opposite effect (Weber et al., 2007; Lam et al., 2012; Gutierrez Arcelus et al., 2013). Interestingly, this negative correlation is not upheld when comparing expression and DNA methylation for a specific gene across individuals (van Eijk et al., 2012; Lam et al., 2012; Gutierrez Arcelus et al., 2013; Wagner et al., 2014). Conversely, DNA methylation in the gene body is often positively associated with levels of gene expression (Lister et al., 2009; Gutierrez Arcelus et al., 2013). DNA methylation also functions to repress repetitive elements, such as Alu and LINE‐1, which are generally highly methylated in the human genome. (Kochanek et al., 1993; Alves et al., 1996).





Tuesday 19 September 2017

Extract Cluster Membership and Order From ComplexHeatmap::Heatmap

library(ComplexHeatmap)
require(circlize)

#make data
#set random seed
mat = matrix(rnorm(80, 2), 8, 10)
mat = rbind(mat, matrix(rnorm(40, -2), 4, 10))
rownames(mat) = letters[1:12]
colnames(mat) = letters[1:10]

#make Heatmap object
obj1=Heatmap(mat,km=2)

#set random seed for plotting
set.seed(1234)
obj1

#let us see whether we can get row cluster. It is empty
obj1@row_order_list

#now let us do preparation
#set the same random seed
set.seed(1234)
obj2=prepare(obj1)

#Now we can get row cluster.
obj2@row_order_list

R Heatmap Color As in made4::heatplot Function

Color Scheme of made4::heatplot as in differences in heatmap/clustering defaults in R (heatplot versus heatmap.2)

Friday 15 September 2017

Java %b Format Specifier

Cited from the Book Java How to Program

The %b format specifier is used to display the word “true” or the word “false” based on a boolean expression’s value.

Java Boolean Logical Exclusive OR (^)

Cited from the Book Java How to Program

A simple condition containing the boolean logical exclusive OR (^) operator is true if and only if one of its operands is true and the other is false. If both are true or both are false, the entire condition is false.
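
The truth table described above can be printed with a few lines of code. This is only an illustrative sketch: the class name XorDemo and the helper exclusiveOr are made up for this note and do not come from the book. It also uses the %b format specifier to print each boolean value.

```java
class XorDemo {
    // Returns true only when exactly one of the two operands is true.
    static boolean exclusiveOr(boolean a, boolean b) {
        return a ^ b;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean a : values) {
            for (boolean b : values) {
                // %b prints the word "true" or "false"; %n ends the line
                System.out.printf("%b ^ %b = %b%n", a, b, exclusiveOr(a, b));
            }
        }
    }
}
```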

Java Conditional AND (&&)/OR (||) Operator vs. Boolean Logical AND (&)/Logical Inclusive OR (|) Operator

Cited from the Book Java How to Program

The boolean logical AND (&) and boolean logical inclusive OR (|) operators are identical to the && and || operators, except that the & and | operators always evaluate both of their operands (i.e., they do not perform short-circuit evaluation).

This is useful if the right operand of the boolean logical AND or boolean logical inclusive OR operator has a required side effect—a modification of a variable’s value.

For example, the expression

( birthday == true ) | ( ++age >= 65 )

guarantees that the condition ++age >= 65 will be evaluated. Thus, the variable age is incremented, regardless of whether the overall expression is true or false.
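
The difference can be observed by counting the side effect. The following is a minimal sketch, assuming a birthday flag and an age like those in the expression above; the class and method names are hypothetical.

```java
class ShortCircuitDemo {
    // Conditional OR: the right operand is skipped when the left one is true.
    static int ageAfterConditionalOr(boolean birthday, int age) {
        boolean ignored = birthday || (++age >= 65);
        return age;
    }

    // Boolean logical inclusive OR: both operands are always evaluated.
    static int ageAfterLogicalOr(boolean birthday, int age) {
        boolean ignored = birthday | (++age >= 65);
        return age;
    }

    public static void main(String[] args) {
        System.out.println(ageAfterConditionalOr(true, 30)); // age is not incremented
        System.out.println(ageAfterLogicalOr(true, 30));     // age is incremented
    }
}
```

Relying on a side effect inside a condition is easy to misread, so moving the increment to its own statement is usually the clearer choice.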

Vi on the Move

All the right moves

Java Scanner Method 'hasNext'

Cited from the Book Java How to Program

Scanner method hasNext determines whether there’s more data to input. This method returns the boolean value true if there’s more data; otherwise, it returns false.
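
A typical use is as the loop-continuation condition when reading input. The sketch below is an assumption-laden example rather than the book's code: it reads from a String instead of the keyboard so it can run anywhere, and the names HasNextDemo and sumTokens are invented here.

```java
import java.util.Scanner;

class HasNextDemo {
    // Adds up every integer token in the text; the loop ends when hasNext returns false.
    // Assumes every token is a valid integer (nextInt throws otherwise).
    static int sumTokens(String text) {
        Scanner input = new Scanner(text);
        int total = 0;
        while (input.hasNext()) {
            total += input.nextInt();
        }
        input.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTokens("90 85 100"));
    }
}
```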

Java Switch Statement

Cited from the Book Java How to Program

The switch multiple-selection statement performs different actions based on the possible values of a constant integral expression of type byte, short, int or char.

The switch statement compares the controlling expression’s value (which must evaluate to an integral value of type byte, char, short or int) with each case label.

Listing cases consecutively in this manner with no statements between them enables the cases to perform the same set of statements.

For example,

         case 9:  // grade was between 90
         case 10: // and 100, inclusive
            ++aCount; // increment aCount
            break; // necessary to exit switch

The switch statement does not provide a mechanism for testing ranges of values, so every value you need to test must be listed in a separate case label. Each case can have multiple statements. The switch statement differs from other control statements in that it does not require braces around multiple statements in a case.

Without break statements, each time a match occurs in the switch, the statements for that case and subsequent cases execute until a break statement or the end of the switch is encountered. This is often referred to as “falling through” to the statements in subsequent cases.

The break statement is not required for the switch’s last case (or the optional default case, when it appears last), because execution continues with the next statement after the switch.

The expression in each case can also be a constant variable—a variable containing a value which does not change for the entire program. Such a variable is declared with keyword final. Java has a feature called enumerations. Enumeration constants can also be used in case labels.

As of Java SE 7, you can use Strings in a switch statement’s controlling expression and in case labels.
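
The points above (consecutive case labels, break, the default case, constant variables, and String labels) can be seen together in one small sketch. The names SwitchDemo, letterFor, daysIn and TOP_SCORE are made up for illustration, and the grade boundaries are an assumption.

```java
class SwitchDemo {
    static final int TOP_SCORE = 10; // constant variable, allowed as a case label

    // Maps a 0-100 grade to a letter, using grade / 10 as the controlling expression.
    static char letterFor(int grade) {
        char letter;
        switch (grade / 10) {
            case 9:         // 90-99: no statements, so it falls through
            case TOP_SCORE: // 100
                letter = 'A';
                break;
            case 8:
                letter = 'B';
                break;
            case 7:
                letter = 'C';
                break;
            case 6:
                letter = 'D';
                break;
            default:
                letter = 'F'; // last case: no break required
        }
        return letter;
    }

    // String controlling expression and String case labels (Java SE 7 and later).
    static int daysIn(String month) {
        switch (month) {
            case "February":
                return 28; // leap years ignored in this sketch
            case "April":
            case "June":
            case "September":
            case "November":
                return 30;
            default:
                return 31;
        }
    }

    public static void main(String[] args) {
        System.out.println(letterFor(95));
        System.out.println(daysIn("June"));
    }
}
```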

Java Do While Loop

Cited from the Book Java How to Program

The do…while statement tests the loop-continuation condition after executing the loop’s body; therefore, the body always executes at least once.
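
The "at least once" behaviour shows up when the condition is false from the start. A small sketch, with hypothetical class and method names, comparing do…while with while:

```java
class DoWhileDemo {
    // Counts how many times a do...while body runs for counter = start up to limit.
    static int countDoWhile(int start, int limit) {
        int counter = start;
        int iterations = 0;
        do {
            iterations++;
            counter++;
        } while (counter <= limit); // tested after the body
        return iterations;
    }

    // The same loop written with while, which tests the condition first.
    static int countWhile(int start, int limit) {
        int counter = start;
        int iterations = 0;
        while (counter <= limit) {
            iterations++;
            counter++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        // The condition 11 <= 10 is false immediately.
        System.out.println(countDoWhile(11, 10)); // body still ran once
        System.out.println(countWhile(11, 10));   // body never ran
    }
}
```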

Java Monetary Calculations

Cited from the Book Java How to Program

Java also provides class java.math.BigDecimal to perform precise monetary calculations.
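
As a rough illustration of why this matters, the sketch below compares double arithmetic with BigDecimal arithmetic and computes compound interest rounded to cents. The class MoneyDemo, its method names, and the choice of RoundingMode.HALF_EVEN are assumptions made for this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class MoneyDemo {
    // Adds two decimal amounts exactly; the String constructor avoids binary rounding.
    static String exactSum(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b)).toPlainString();
    }

    // Compound interest: principal * (1 + rate)^years, rounded to two decimals.
    static BigDecimal amountAfter(String principal, String rate, int years) {
        BigDecimal growth = BigDecimal.ONE.add(new BigDecimal(rate));
        return new BigDecimal(principal)
                .multiply(growth.pow(years))
                .setScale(2, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);              // double: not exactly 0.3
        System.out.println(exactSum("0.1", "0.2")); // BigDecimal: exactly 0.3
        System.out.println(amountAfter("1000.00", "0.05", 2));
    }
}
```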

Java %,20.2f Explained

Cited from the Book Java How to Program

%,20.2f. The comma (,) formatting flag indicates that the floating-point value should be output with a grouping separator. The actual separator used is specific to the user’s locale (i.e., country). For example, in the United States, the number will be output using commas to separate every three digits and a decimal point to separate the fractional part of the number, as in 1,234.45. The number 20 in the format specification indicates that the value should be output right justified in a field width of 20 characters. The .2 specifies the formatted number’s precision—in this case, the number is rounded to the nearest hundredth and output with two digits to the right of the decimal point.
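
A short sketch of this specifier in use. It passes Locale.US explicitly so the separators are predictable on any machine; that choice, and the names GroupingDemo and formatAmount, are assumptions of this example.

```java
import java.util.Locale;

class GroupingDemo {
    // Formats with grouping separators, field width 20, and two decimals.
    static String formatAmount(double amount) {
        return String.format(Locale.US, "%,20.2f", amount);
    }

    public static void main(String[] args) {
        // Brackets make the right justification within the 20-character field visible.
        System.out.println("[" + formatAmount(1234567.891) + "]");
    }
}
```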

Java Math Class

Cited from the Book Java How to Program

Class Math is defined in package java.lang, so you do not need to import class Math to use it.

Java System.out object

Cited from the Book Java How to Program

System.out.printf: 'printf' is a method called on the System.out object.

Java Initialization and Increment Expressions

Cited from the Book Java How to Program

The initialization and increment expressions can be comma-separated lists that enable you to use multiple initialization expressions or multiple increment expressions. However, this is discouraged.

For example,

for ( int number = 2; number <= 20;
    total += number, number += 2 )
        ; // empty statement

Java Increment Expression in a 'for' Loop

Cited from the Book Java How to Program

The increment expression in a for acts as if it were a standalone statement after its body executes. Therefore, the expressions

counter = counter + 1
counter += 1
counter++
++counter

are equivalent increment expressions in a for statement.

Java Scope of a for Statement’s Control Variable

Cited from the Book Java How to Program

If the initialization expression in the for header declares the control variable (i.e., the control variable’s type is specified before the variable name), as in

 for ( int counter = 1; counter <= 10; counter++ )

the control variable (for example, counter) can be used only in that for statement; it will not exist outside it.

Thursday 14 September 2017

Java Class Inheritance

Cited from the Book Java How to Program

public class DrawPanel extends JPanel
{
  ...
}

The keyword extends represents a so-called inheritance relationship in which our new class DrawPanel begins with the existing members (data and methods) from class JPanel. 

In this inheritance relationship, JPanel is called the superclass and DrawPanel is called the subclass. This results in a DrawPanel class that has the attributes (data) and behaviors (methods) of class JPanel as well as the new features we’re adding in our DrawPanel class declaration.

Java Coordinate System

Cited from the Book Java How to Program

By default, the upper-left corner of a GUI component has the coordinates (0, 0).

The x-coordinate is the horizontal location moving from left to right. The y-coordinate is the vertical location moving from top to bottom.

Coordinates indicate where graphics should be displayed on a screen. Coordinate units are measured in pixels.

Prefix/Postfix Increment/Decrement Operator

Cited from the Book Java How to Program

An increment or decrement operator that’s prefixed to (placed before) a variable is referred to as the prefix increment or prefix decrement operator, respectively. An increment or decrement operator that’s postfixed to (placed after) a variable is referred to as the postfix increment or postfix decrement operator, respectively.

++a
    Increment a by 1, then use the new value of a in the expression in which a resides.
a++
    Use the current value of a in the expression in which a resides, then increment a by 1.
--b
    Decrement b by 1, then use the new value of b in the expression in which b resides.
b--
    Use the current value of b in the expression in which b resides, then decrement b by 1.

Attempting to use the increment or decrement operator on an expression other than one to which a value can be assigned is a syntax error. For example, writing ++(x + 1) is a syntax error, because (x + 1) is not a variable.
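
The difference between the prefix and postfix forms only matters when the value of the expression is used. A minimal sketch with hypothetical names; each method returns the value the expression produced, followed by the value of the variable afterwards.

```java
class IncrementDemo {
    // ++a: increment first, then use the new value.
    static int[] prefix(int a) {
        int used = ++a;
        return new int[] {used, a};
    }

    // a++: use the current value, then increment.
    static int[] postfix(int a) {
        int used = a++;
        return new int[] {used, a};
    }

    public static void main(String[] args) {
        int[] pre = prefix(5);
        int[] post = postfix(5);
        System.out.printf("prefix:  used %d, a is now %d%n", pre[0], pre[1]);
        System.out.printf("postfix: used %d, a is now %d%n", post[0], post[1]);
    }
}
```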

Java Data Type Casting

Cited from the Book Java How to Program

A cast operator is formed by placing parentheses around any type’s name. The operator is a unary operator (i.e., an operator that takes only one operand). Java also supports unary versions of the plus (+) and minus (–) operators, so you can write expressions like -7 or +5. Cast operators associate from right to left and have the same precedence as other unary operators, such as unary + and unary -. This precedence is one level higher than that of the multiplicative operators *, / and %.

Java Integer Division

Cited from the Book Java How to Program

Dividing two integers results in integer division—any fractional part of the calculation is lost (i.e., truncated).

To get a floating-point result when dividing two integers, cast one of the operands to double before the division, for example:

average = (double) total / gradeCounter;

Java provides the unary cast operator to accomplish this task. The (double) cast operator is a unary operator that creates a temporary floating-point copy of its operand total (which appears to the right of the operator). Using a cast operator in this manner is called explicit conversion or type casting. The value stored in total is still an integer.

The calculation now consists of a floating-point value (the temporary double version of total) divided by the integer gradeCounter. Java knows how to evaluate only arithmetic expressions in which the operands’ types are identical. To ensure that the operands are of the same type, Java performs an operation called promotion (or implicit conversion) on selected operands. For example, in an expression containing values of the types int and double, the int values are promoted to double values for use in the expression. In this example, the value of gradeCounter is promoted to type double, then the floating-point division is performed and the result of the calculation is assigned to average. As long as the (double) cast operator is applied to any variable in the calculation, the calculation will yield a double result.
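
The placement of the cast matters, which a small sketch can make concrete. The variable names total and gradeCounter follow the text above; the class and method names are made up here.

```java
class AverageDemo {
    // Integer division: the fractional part is truncated.
    static int truncatedAverage(int total, int gradeCounter) {
        return total / gradeCounter;
    }

    // The cast applies to total first; gradeCounter is then promoted to double.
    static double average(int total, int gradeCounter) {
        return (double) total / gradeCounter;
    }

    // Casting the finished integer quotient is too late: the fraction is already gone.
    static double castTooLate(int total, int gradeCounter) {
        return (double) (total / gradeCounter);
    }

    public static void main(String[] args) {
        System.out.println(truncatedAverage(257, 3));
        System.out.printf("%.2f%n", average(257, 3));
        System.out.println(castTooLate(257, 3));
    }
}
```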

Wednesday 13 September 2017

Number of Clusters

Kmeans without knowing the number of clusters?

R S4 callNextMethod

Cited from the Book Advanced R

In S4, it’s the callNextMethod that (surprise!) is used to call the next method. It figures out which method to call by pretending the current method doesn’t exist, and looking for the next closest match.

The most specific method is responsible for calling callNextMethod so that the more generic methods are also run.

R S4 Special Class: missing and ANY

Cited from the Book Advanced R

missing matches the case where the argument is not supplied, and ANY is used for setting up default methods. 

Monday 11 September 2017

Tuesday 5 September 2017

ERCC Spike-in Analysis

Using Synthetic Mouse Spike-In Transcripts to Evaluate RNA-Seq Analysis Tools

Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures

ERCC normalization (highly recommended)

The Overlooked Fact: Fundamental Need for Spike-In Control for Virtually All Genome-Wide Analyses (highly recommended)

Calibrating RNA-Seq Data with Drosophila Spike-ins

Cited from External calibration with Drosophila whole-cell spike-ins delivers absolute mRNA fold changes from human RNA-Seq and qPCR data

Calibrating RNA-Seq data with Drosophila spike-ins
  1. Concatenate the human and Drosophila genome (alternatively, concatenate the transcriptomes).
  2. Generate a genome (or transcriptome) index for your preferred alignment or counting algorithm if necessary.
  3. Align the reads to the concatenated genome/transcriptome.
  4. Generate read counts per gene/transcript/exon.
  5. Split the count table into two, one containing human genes, the other Drosophila genes. If you use DESeq2, you can also read in the complete count table and subset the resulting "DESeqDataSet" data object.
  6. Calculate sample scaling factors from the Drosophila read counts. When using DESeq2, use the function “sizeFactors” on the “DESeqDataSet” containing the Drosophila genes/transcripts/exons.
  7. Apply the size factors to the human read counts by dividing the counts of each individual gene of one sample by the respective sample size factor. When using DESeq2, overwrite the size factor slot of the human data object with the size factors estimated from the Drosophila data (pseudo-code: 'sizeFactors(HumanData) ← sizeFactors(DrosiData)').

Friday 1 September 2017

Java Convert String to Integer

Cited from the Book Java How to Program

The Integer class’s static method 'parseInt' takes a String argument representing an integer (e.g., the result of JOptionPane.showInputDialog) and returns the value as an int. Method parseInt is a static method of class Integer (from package java.lang). If the String does not contain a valid integer, the program will terminate with an error.
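
The error in question is a NumberFormatException, which a program can catch instead of terminating. A minimal sketch, assuming a fallback value is acceptable for invalid input; the names ParseIntDemo and parseOrDefault are hypothetical.

```java
class ParseIntDemo {
    // Returns the parsed int, or the fallback when text is not a valid integer.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", 0));
        System.out.println(parseOrDefault("forty-two", 0));
    }
}
```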

Java String.format vs System.out.printf

Cited from the Book Java How to Program

Static String method 'format' returns a String containing a greeting with the user’s name. Method format works like method System.out.printf, except that format returns the formatted String rather than displaying it in a command window.

Java Static Methods

Cited from the Book Java How to Program

Static methods often define frequently used tasks.

A static method is called by using its class name followed by a dot (.) and the method name, as in ClassName.methodName( arguments ).

Notice that you do not create an object of class JOptionPane to use its static method. 
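As a rough illustration of the ClassName.methodName( arguments ) pattern, here is a sketch with one library method (Math.max) and one hypothetical method of our own (StaticCallDemo.square). Neither call needs an object.

```java
// Hypothetical class for illustration; Math.max is a real static method of java.lang.Math.
class StaticCallDemo {
    // A static method: callable as StaticCallDemo.square(...) without creating an object.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        // ClassName.methodName(arguments), with no "new" anywhere
        System.out.println(Math.max(3, 7));
        System.out.println(StaticCallDemo.square(5));
    }
}
```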

Java Dialog Box

Cited from the Book Java How to Program

Class JOptionPane from package javax.swing provides prebuilt dialog boxes that enable programs to display windows containing messages -- such windows are called message dialogs.

The method showMessageDialog of the JOptionPane class displays a dialog box containing a message. The method requires two arguments. The first helps the Java application determine where to position the dialog box. A dialog is typically displayed from a GUI application with its own window. The first argument refers to that window (known as the parent window) and causes the dialog to appear centered over the application’s window. If the first argument is null, the dialog box is displayed at the center of your screen. The second argument is the String to display in the dialog box.

The method showInputDialog of the JOptionPane class displays an input dialog containing a prompt and a field (known as a text field) in which the user can enter text. If you press the dialog’s Cancel button or press the Esc key, the method returns null and the program displays the word “null” as the name.


Java Format Specifier %f

Cited from the Book Java How to Program

The format specifier %f is used to output values of type float or double.
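A short sketch of %f in use. The class and method names are hypothetical, and passing Locale.ROOT is my own assumption so that the decimal separator is always a dot regardless of system settings.

```java
import java.util.Locale;

// Hypothetical demo class for the %f format specifier.
class FloatFormatDemo {
    // Formats a double with exactly two digits after the decimal point.
    static String twoDecimals(double value) {
        // Locale.ROOT keeps the decimal separator a '.' on every machine
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(twoDecimals(3.14159));
        // Plain %f (no precision given) prints six digits after the decimal point
        System.out.println(String.format(Locale.ROOT, "%f", 2.5));
    }
}
```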

Java Float vs Double

Cited from the Book Java How to Program

Java provides two primitive types for storing floating-point numbers in memory—float and double. They differ primarily in that double variables can store numbers with larger magnitude and finer detail (i.e., more digits to the right of the decimal point—also known as the number’s precision) than float variables.

Variables of type float represent single-precision floating-point numbers and can represent up to seven significant digits. Variables of type double represent double-precision floating-point numbers. These require twice as much memory as float variables and provide 15 significant digits -- approximately double the precision of float variables.
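One way to see the difference is to store the same fraction in both types. This is only a sketch: PrecisionDemo and its helpers are hypothetical names, and Float.BYTES and Double.BYTES assume Java 8 or later.

```java
// Hypothetical demo class comparing float and double.
class PrecisionDemo {
    // How many times larger a double is than a float, measured in bytes.
    static int sizeRatio() {
        return Double.BYTES / Float.BYTES;
    }

    // True when the float and double approximations of 1/3 are different numbers.
    static boolean oneThirdDiffers() {
        float f = 1.0f / 3.0f;
        double d = 1.0 / 3.0;
        return (double) f != d; // the float version loses digits the double keeps
    }

    public static void main(String[] args) {
        System.out.println(1.0f / 3.0f); // fewer correct digits
        System.out.println(1.0 / 3.0);   // more correct digits
        System.out.println("double is " + sizeRatio() + "x the size of float");
    }
}
```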

Java Constructor

Cited from the Book Java How to Program

Each class you declare can provide a special method called a constructor that can be used to initialize an object of a class when the object is created. In fact, Java requires a constructor call for every object that’s created. 

Keyword new requests memory from the system to store an object, then calls the corresponding class’s constructor to initialize the object. The call is indicated by the parentheses after the class name. A constructor must have the same name as the class.

By default, the compiler provides a default constructor with no parameters in any class that does not explicitly include a constructor. When a class has only the default constructor, its instance variables are initialized to their default values.

When you declare a class, you can provide your own constructor to specify custom initialization for objects of your class.

An important difference between constructors and methods is that constructors cannot return values, so they cannot specify a return type (not even void). Normally, constructors are declared public. If a class does not include a constructor, the class’s instance variables are initialized to their default values. If you declare any constructors for a class, the Java compiler will not create a default constructor for that class.
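The points above can be sketched with a small class. SavingsAccount and its members are hypothetical names chosen for illustration, not code from the book.

```java
// Hypothetical class illustrating a programmer-declared constructor.
class SavingsAccount {
    private String name;    // set by the constructor
    private double balance; // never assigned, so it keeps its default value 0.0

    // Constructor: same name as the class, no return type (not even void)
    public SavingsAccount(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public double getBalance() {
        return balance;
    }
}

class SavingsAccountDemo {
    public static void main(String[] args) {
        // "new" requests memory, then calls the constructor with the argument in parentheses
        SavingsAccount account = new SavingsAccount("Jane Green");
        System.out.println(account.getName());
        System.out.println(account.getBalance());
        // new SavingsAccount() would not compile here: once any constructor is
        // declared, the compiler no longer supplies the default constructor.
    }
}
```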

Java Scope, Local Variable vs Instance Variable

Cited from the Book Java How to Program

Variables declared in the body of a particular method are known as local variables and can be used only in that method. When that method terminates, the values of its local variables are lost.

An object has attributes that are carried with it as it’s used in a program. Such attributes exist before a method is called on an object, while the method is executing and after the method completes execution.

Attributes are represented as variables in a class declaration. Such variables are called fields and are declared inside a class declaration but outside the bodies of the class’s method declarations.

When each object of a class maintains its own copy of an attribute, the field that represents the attribute is also known as an instance variable—each object (instance) of the class has a separate instance of the variable in memory.

Unlike local variables, which are not automatically initialized, every field has a default initial value—a value provided by Java when you do not specify the field’s initial value. Thus, fields are not required to be explicitly initialized before they’re used in a program—unless they must be initialized to values other than their default values.

Primitive-type instance variables are initialized by default -- variables of types byte, char, short, int, long, float and double are initialized to 0, and variables of type boolean are initialized to false.

Reference-type instance variables are initialized by default to the value null -- a reserved word that represents a “reference to nothing.”
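A sketch of fields versus local variables, using a hypothetical ScopeCounter class. None of the fields is given an explicit initial value, so each one shows its default.

```java
// Hypothetical class contrasting instance variables (fields) with a local variable.
class ScopeCounter {
    private int count;      // primitive field: defaults to 0
    private boolean active; // boolean field: defaults to false
    private String label;   // reference-type field: defaults to null

    void increment() {
        int step = 1;  // local variable: must be assigned before it is used
        count += step; // step is lost when increment returns; count persists in the object
    }

    int getCount() {
        return count;
    }

    boolean isActive() {
        return active;
    }

    String getLabel() {
        return label;
    }
}
```

Calling increment twice on the same object leaves count at 2, because the field lives as long as the object does, while step is created anew on each call.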

Java Scanner Class Methods

Cited from the Book Java How to Program

Method nextLine reads characters typed by the user until it encounters the newline character, then returns a String containing the characters up to, but not including, the newline.

Class Scanner also provides a similar method -- next -- that reads individual words. When the user presses Enter after typing input, method next reads characters until it encounters a white-space character (such as a space, tab or newline), then returns a String containing the characters up to, but not including, the white-space character (which is discarded).

nextDouble method returns a double value entered by the user.
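The three methods can be compared side by side. In this sketch the Scanner reads from a String instead of System.in, which is my own assumption so the example runs without keyboard input; ScannerDemo and its helpers are hypothetical names, and Locale.US is set so that "3.5" is parsed with a dot as the decimal separator.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; each helper reads from a String rather than the keyboard.
class ScannerDemo {
    // next: reads up to the first white-space character
    static String firstWord(String text) {
        try (Scanner input = new Scanner(text)) {
            return input.next();
        }
    }

    // nextLine: reads up to, but not including, the newline
    static String firstLine(String text) {
        try (Scanner input = new Scanner(text)) {
            return input.nextLine();
        }
    }

    // nextDouble: reads the next token and converts it to a double
    static double firstDouble(String text) {
        try (Scanner input = new Scanner(text).useLocale(Locale.US)) {
            return input.nextDouble();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Jane Green\n")); // only the first word
        System.out.println(firstLine("Jane Green\n")); // the whole line
        System.out.println(firstDouble("3.5\n"));
    }
}
```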

Java Static

Cited from the Book Java How to Program

A key part of enabling the JVM to locate and call method main to begin the application’s execution is the static keyword, which indicates that main is a static method. A static method is special, because you can call it without first creating an object of the class in which the method is declared.

Typically, you cannot call a method that belongs to another class until you create an object of that class.

Java java.lang

Cited from the Book Java How to Program

By default, package java.lang is imported in every Java program; thus, classes in java.lang are the only ones in the Java API that do not require an import declaration.

Java Data Type

Cited from the Book Java How to Program

Java’s types are divided into primitive types and reference types. The primitive types are boolean, byte, char, short, int, long, float and double. All nonprimitive types are reference types, so classes, which specify the types of objects, are reference types.

A primitive-type variable can store exactly one value of its declared type at a time. For example, an int variable can store one whole number (such as 7) at a time. When another value is assigned to that variable, its previous value is replaced. Primitive-type instance variables are initialized by default -- variables of types byte, char, short, int, long, float and double are initialized to 0, and variables of type boolean are initialized to false. An attempt to use an uninitialized local variable causes a compilation error.

Programs use variables of reference types (normally called references) to store the locations of objects in the computer’s memory. Such a variable is said to refer to an object in the program. Reference-type instance variables are initialized by default to the value null—a reserved word that represents a “reference to nothing.”

When you use an object of another class, a reference to the object is required to invoke (i.e., call) its methods. Primitive-type variables do not refer to objects, so such variables cannot be used to invoke methods.
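To make the distinction concrete, here is a sketch (hypothetical class and method names) in which assigning a primitive copies the value, while assigning a reference makes two variables refer to the same object.

```java
// Hypothetical demo class contrasting primitive-type and reference-type variables.
class ReferenceDemo {
    // Primitive: y receives a copy of x's value, so changing y leaves x alone.
    static int copying() {
        int x = 7;
        int y = x;
        y = 8;
        return x;
    }

    // Reference: second refers to the same StringBuilder object as first.
    static String aliasing() {
        StringBuilder first = new StringBuilder("a");
        StringBuilder second = first; // copies the reference, not the object
        second.append("b");           // a method is invoked through the reference
        return first.toString();      // the change is visible through first as well
    }

    public static void main(String[] args) {
        System.out.println(copying());
        System.out.println(aliasing());
    }
}
```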


Java Import Declaration

Cited from the Book Java How to Program

All import declarations must appear before the first class declaration in the file. Placing an import declaration inside or after a class declaration is a syntax error.

Classes System and String are in package java.lang, which is implicitly imported into every Java program, so all programs can use that package’s classes without explicitly importing them.

Classes in the same package are implicitly imported into the source-code files of other classes in the same package.

import java.util.Scanner; // program uses Scanner
....
Scanner input = new Scanner( System.in );
...

java.util.Scanner specifies the full package name and class name. This is known as the class’s fully qualified class name.

Without the import declaration, the same statement would have to be written as

java.util.Scanner input = new java.util.Scanner( System.in );

Java API

Cited from the Book Java How to Program

Java comes with a rich set of predefined classes that are grouped into packages. Packages are named groups of related classes, and are collectively referred to as the Java class library, or the Java Application Programming Interface (Java API).

A String == A Character String == A String Literal

Cited from the Book Java How to Program

Java Access Modifiers

Cited from the Book Java How to Program

The method declaration begins with keyword public to indicate that the method is “available to the public” -- it can be called from methods of other classes.

Java class declarations normally contain one or more methods. For a Java application, one of the methods must be called main.

If an application has multiple classes that contain main, the one that’s invoked is the one in the class named in the java command.

Most instance-variable declarations are preceded with the keyword private. Like public, keyword private is an access modifier. Variables or methods declared with access modifier private are accessible only to methods of the class in which they’re declared.

Declaring instance variables with access modifier private is known as data hiding or information hiding.
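A sketch of data hiding in practice. The Thermostat class, its range check, and the limits of 5 and 30 degrees are all assumptions made up for this example; the point is only that code outside the class must go through the public methods.

```java
// Hypothetical class: the private field can only be changed through a public method.
class Thermostat {
    private double targetCelsius; // hidden from methods of other classes

    // Public "set" method: the class itself decides which values are acceptable.
    public void setTargetCelsius(double value) {
        if (value < 5.0 || value > 30.0) {
            throw new IllegalArgumentException("target out of range: " + value);
        }
        targetCelsius = value;
    }

    // Public "get" method: read access without exposing the field itself.
    public double getTargetCelsius() {
        return targetCelsius;
    }
}
```

From another class, thermostat.targetCelsius = 100 would be a compilation error, so an invalid value cannot bypass the check.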

Public Class and Source File Name

Cited from the Book Java How to Program

A public class must be placed in a file that has the same name as the class (in terms of both spelling and capitalization) plus the .java extension; otherwise, a compilation error occurs. For example, public class Welcome must be placed in a file named Welcome.java.

Each class declaration that begins with keyword public must be stored in a file having the same name as the class and ending with the .java file-name extension.

Runtime Error vs. Compilation Error

Cited from the Book Java How to Program

Errors such as division by zero occur as a program runs, so they’re called runtime errors or execution-time errors. Fatal runtime errors cause programs to terminate immediately without having successfully performed their jobs. Nonfatal runtime errors allow programs to run to completion, often producing incorrect results.

Forgetting one of the delimiters of a traditional or Javadoc comment is a syntax error. A syntax error occurs when the compiler encounters code that violates Java’s language rules (i.e., its syntax). These rules are similar to a natural language’s grammar rules specifying sentence structure. Syntax errors are also called compiler errors, compile-time errors or compilation errors, because the compiler detects them during the compilation phase. The compiler responds by issuing an error message and preventing your program from compiling. 
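The two kinds of error can be put next to each other in a sketch. ErrorKindsDemo and divide are hypothetical names; the syntax error is left in a comment because, by definition, a file containing it would not compile.

```java
// Hypothetical demo class contrasting a runtime error with a compilation error.
class ErrorKindsDemo {
    // Integer division by zero compiles fine but fails while the program runs.
    static String divide(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            // Catching the exception turns a fatal runtime error into a message
            return "runtime error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(divide(6, 3));
        System.out.println(divide(1, 0));
        // int x = 5   <- a missing semicolon is a syntax error: the compiler
        //                reports it and no .class file is produced
    }
}
```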

Javadoc

Cited from the Book Java How to Program

Java provides comments of a third type, Javadoc comments. These are delimited by /** and */. The compiler ignores all text between the delimiters. Javadoc comments enable you to embed program documentation directly in your programs. Such comments are the preferred Java documenting format in industry. The javadoc utility program (part of the Java SE Development Kit) reads Javadoc comments and uses them to prepare your program’s documentation in HTML format.
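For illustration, a Javadoc comment on a class and on a method might look like the sketch below. Greeter and greet are hypothetical names; @param and @return are standard Javadoc tags.

```java
/**
 * Builds simple greetings (hypothetical example class).
 */
class Greeter {
    /**
     * Returns a greeting for the given person.
     *
     * @param name the person to greet
     * @return the greeting text
     */
    static String greet(String name) {
        return "Hello, " + name;
    }
}
```

Running the javadoc tool on the source file turns these comments into HTML pages; the compiler itself skips them.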

Java CLASSPATH

Cited from How to Set CLASSPATH for Java in Windows and Linux

Classpath in Java is the path to a directory, or a list of directories, that ClassLoaders use to find and load classes in a Java program. The classpath can be specified using the CLASSPATH environment variable (whose name is case-insensitive on Windows), the -cp or -classpath command-line option, or the Class-Path attribute in the MANIFEST.MF file inside a JAR file. CLASSPATH is an environment variable that the Java Virtual Machine uses to locate user-defined classes.

JAVA_HOME is another environment variable, used to find the Java binaries located in the JDK installation directory.

It's also worth noting that when you use the java -jar command-line option to run your Java program as an executable JAR, the CLASSPATH environment variable is ignored, and so are the -cp and -classpath switches. In this case, you can set your Java classpath in the META-INF/MANIFEST.MF file by using the Class-Path attribute. In short, the Class-Path attribute in the manifest file overrides the classpath specified by -cp, -classpath or the CLASSPATH environment variable.
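To check which classpath a running program actually ended up with, one option is to read the standard system property java.class.path. This is only a sketch: ClasspathDemo and entries are hypothetical names, and the empty-string default is my own precaution.

```java
import java.io.File;

// Hypothetical demo class that lists the entries of the effective classpath.
class ClasspathDemo {
    // Splits the effective classpath into its individual entries.
    static String[] entries() {
        String classpath = System.getProperty("java.class.path", "");
        // File.pathSeparator is ";" on Windows and ":" on Linux
        return classpath.split(File.pathSeparator);
    }

    public static void main(String[] args) {
        for (String entry : entries()) {
            System.out.println(entry);
        }
    }
}
```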