Shortcuts in Science: July 2014

Saturday 19 July 2014

Graduation Using Summation Formulae: Spencer 15-point rule

> spence.15
function (y)
{
    n <- length(y)
    y <- c(rep(y[1], 7), y, rep(y[n], 7))
    n <- length(y)
    k <- 3:(n - 2)
    a3 <- y[k - 1] + y[k] + y[k + 1]
    a2 <- y[k - 2] + y[k + 2]
    y1 <- y[k] + 3 * (a3 - a2)
    n <- length(y1)
    k <- 1:(n - 3)
    y2 <- y1[k] + y1[k + 1] + y1[k + 2] + y1[k + 3]
    n <- length(y2)
    k <- 1:(n - 3)
    y3 <- y2[k] + y2[k + 1] + y2[k + 2] + y2[k + 3]
    n <- length(y3)
    k <- 1:(n - 4)
    y4 <- y3[k] + y3[k + 1] + y3[k + 2] + y3[k + 3] + y3[k +
        4]
    y4/320
}
<environment: namespace:locfit>
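
As a cross-check, the chain of sums in spence.15 can be collapsed into a single 15-term weighted moving average. The sketch below is not part of locfit; conv(), w and spencer15() are names made up for illustration. It convolves the four component filters and then applies the resulting weights with the same end padding of 7 values.

```r
# Convolve two coefficient vectors (polynomial multiplication).
conv <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

# Component filters used in spence.15:
# (-3, 3, 4, 3, -3), two 4-term sums, one 5-term sum, divided by 320.
w <- conv(conv(conv(c(-3, 3, 4, 3, -3), rep(1, 4)), rep(1, 4)), rep(1, 5)) / 320

# Apply the weights directly, padding each end with 7 copies of the end value.
spencer15 <- function(y) {
  n <- length(y)
  padded <- c(rep(y[1], 7), y, rep(y[n], 7))
  vapply(seq_len(n), function(i) sum(w * padded[i:(i + 14)]), numeric(1))
}

print(w * 320)
# -3 -6 -5 3 21 46 67 74 67 46 21 3 -5 -6 -3
```

The weights sum to 1, so a constant series passes through unchanged.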

Thursday 17 July 2014

Binary Search Array vs. Binary Search Tree

What is difference between Array and Binary search tree in efficiency?

Loop Invariant

Loops and Invariants

Theta, Oh and Omega Notations

n: the size of input.
Theta of n: indicates that a running time is bounded from above by some linear function of n, and from below by some (possibly different) linear function of n.
Big-oh of n: indicates that, for sufficiently large n, a running time is never worse than a constant times n.
Big-omega of n: indicates that, for sufficiently large n, a running time is never better than a constant times n.
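
Stated formally, for the linear case described above. Here T(n) stands for the running time, and c, c_1, c_2 and n_0 are the usual constants of the textbook definitions; none of these symbols appear in the original notes.

```latex
\begin{align*}
T(n) = \Theta(n) &\iff \exists\, c_1, c_2 > 0,\ n_0 : \; c_1 n \le T(n) \le c_2 n \quad \text{for all } n \ge n_0,\\
T(n) = O(n)      &\iff \exists\, c > 0,\ n_0 : \; T(n) \le c\,n \quad \text{for all } n \ge n_0,\\
T(n) = \Omega(n) &\iff \exists\, c > 0,\ n_0 : \; T(n) \ge c\,n \quad \text{for all } n \ge n_0.
\end{align*}
```

So T(n) is Theta of n exactly when it is both Big-oh of n and Big-omega of n.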

Wednesday 16 July 2014

Interval Intersection Algorithms

Bedtools Intersect

Question: What Is The Quickest Algorithm For Range Overlap?

Question: Fast Interval Intersection Methodologies

efficient algorithm to intersect m ordered sets in memory?

Range overlap (with a lot of ranges)

Bedops: Memory Efficient Sort And Merge of Bed Files

BEDOPS offers two utilities for this: sort-bed --max-mem, which performs the sort within a fixed amount of system memory (for example, --max-mem 24G asks for 24 GB of your host's 32 GB of system memory), and bedops --merge, which calculates merged elements from sorted data.

Cited from Question: Memory Efficient Bedtools Sort And Merge With Millions Of Entries?
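
To see why merging is cheap once the data are sorted, here is a minimal R sketch of a single-pass merge. It is not how BEDOPS is implemented; merge_sorted() is a hypothetical helper, it handles one chromosome only, and treating touching intervals as mergeable is an assumption of the sketch.

```r
# Merge overlapping or touching intervals that are already sorted by start.
merge_sorted <- function(start, end) {
  stopifnot(length(start) == length(end), !is.unsorted(start))
  if (length(start) == 0) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  # Furthest end seen among all earlier intervals.
  reach <- c(-Inf, cummax(end)[-length(end)])
  # A new merged element begins wherever an interval starts beyond that reach.
  group <- cumsum(start > reach)
  data.frame(start = as.numeric(tapply(start, group, min)),
             end   = as.numeric(tapply(end, group, max)))
}

res <- merge_sorted(c(1, 5, 20, 22), c(10, 8, 25, 30))
print(res)
# two merged elements: 1-10 and 20-30
```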

Friday 11 July 2014

Bash: "Watch" Command

The watch Command

Bash: Floating Number Comparison

How to do float comparison in Bash?

Thursday 10 July 2014

Bash BC Calculator

Division in script and floating-point

Command line calculator, bc

Tuesday 8 July 2014

The find command matching multiple filename patterns

Linux find command - multiple filename patterns

Monday 7 July 2014

Perl: Import and Export

@EXPORT_OK
This array contains symbols that can be imported if they are specifically asked for.

In the module, for example,
@EXPORT_OK = qw (Op_Func %Table);

The user could load the module like so
use YourModule qw(Op_Func %Table F1);
# The F1 function was listed in the @EXPORT array.

Notice that this does not automatically import F2 or @List, even though they're in the @EXPORT array. To get everything in @EXPORT plus extras from @EXPORT_OK, use the special :DEFAULT tag, such as:
use YourModule qw(:DEFAULT %Table);

%EXPORT_TAGS
This hash is used by large modules like CGI or POSIX to create higher-level groupings of related import symbols. Its values are references to arrays of symbol names, all of which must be in either @EXPORT or @EXPORT_OK. Here's a sample initialization:
%EXPORT_TAGS=(
Functions=>[ qw (F1 F2 Op_Func) ],
Variables=>[ qw (@List %Table) ]
);

An import symbol with a leading colon means to import a whole group of symbols. Here's an example:
use YourModule qw(:Functions %Table);

That pulls in all symbols from:
@{ $YourModule::EXPORT_TAGS{Functions} } and then the %Table hash.

Thursday 3 July 2014

ggplot2: adjustment of plots

ggplot2: axis manipulation and themes

Wednesday 2 July 2014

R Package Namespaces

Cited from "Software for Data Analysis: Programming with R"

To apply the namespace mechanism, you must write a sequence of namespace directives in a file called "NAMESPACE" that resides in the top-level directory of your package's source. The directives look roughly like R expressions, but they are not evaluated by the R evaluator. Instead, the file is processed specially to define the objects that our package sees and the objects in our package that are seen by other software.

The namespace directives define two R environments, one for the objects that perform the computations inside the package and the other for the objects that users see when the package is attached in an R session. The first of these is referred to as the package's namespace. The second, the result of the export directives in the NAMESPACE file, is the environment attached in the search list.

When you access the two environments explicitly, they will print symbolically in a special form. For package SoDA, the environments would be <environment: namespace:SoDA> and <environment: package:SoDA>, respectively.

The package's namespace contains all the objects generated by installing the package, that is, all the objects created by evaluating the R source in the package's R subdirectory.

The parent of the namespace is an environment containing all the objects defined by the import command in the NAMESPACE file.
The parent of that environment is the namespace of R's base package.

Using a NAMESPACE file, computations in the package will see the explicitly imported objects and the base package, in that order, regardless of what packages are attached in the session.
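
The chain of environments can be inspected directly. A small sketch using the stats package as a stand-in, since SoDA may not be installed; the variable names ns, pkg and imports are only illustrative.

```r
library(stats)  # usually attached already; made explicit so the snippet is self-contained

ns  <- asNamespace("stats")             # the package's namespace
pkg <- as.environment("package:stats")  # the exported objects, as attached in the search list

print(ns)
print(environmentName(pkg))

# Parent of the namespace: the environment holding the imported objects.
imports <- parent.env(ns)
print(environmentName(imports))

# Parent of the imports environment: the namespace of the base package.
print(identical(parent.env(imports), .BaseNamespaceEnv))
```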

Environment Variable in R

Cited from Environments

Every environment has a parent, another environment. Only one environment doesn’t have a parent: the empty environment.

It’s rare to talk about the children of an environment because there are no back links: given an environment we have no way to find its children.

Generally, an environment is similar to a list, with four important exceptions:

Every object in an environment has a unique name.
The objects in an environment are not ordered (i.e. it doesn’t make sense to ask what the first object in an environment is).
An environment has a parent.
Environments have reference semantics.
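
The last point, reference semantics, is the one that most often surprises. A minimal sketch; modify() is a throwaway helper written for this note.

```r
modify <- function(x) {
  # For a list this changes a local copy; for an environment it changes the shared object.
  x$a <- 2
  invisible(NULL)
}

l <- list(a = 1)
e <- new.env()
e$a <- 1

modify(l)
modify(e)

print(l$a)  # 1: the list was copied inside modify()
print(e$a)  # 2: the environment was modified in place
```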

More technically, an environment is made up of two components, the frame, which contains the name-object bindings (and behaves much like a named list), and the parent environment. Unfortunately “frame” is used inconsistently in R. For example, parent.frame() doesn’t give you the parent frame of an environment, it gives you the calling environment.

There are four special environments:

The globalenv(), or global environment, is the interactive workspace. This is the environment in which you normally work. The parent of the global environment is the last package that you attached with library() or require().
The baseenv(), or base environment is the environment of the base package. Its parent is the empty environment.
The emptyenv(), or empty environment, is the ultimate ancestor of all environments, and the only environment without a parent.
The environment() is the current environment.

search() lists all parents of the global environment. This is called the search path because objects in these environments can be found from the top-level interactive workspace. It contains one environment for each attached package and any other objects that you’ve attach()ed. It also contains a special environment called Autoloads which is used to save memory by only loading package objects (like big datasets) when needed.

You can access any environment on the search list using as.environment().
For example, as.environment("package:stats").
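
A quick way to convince yourself that search() really is the chain of parents of the global environment is to walk that chain by hand. The loop below is only a sketch; the variable names are arbitrary.

```r
# Walk from the global environment up to (but not including) the empty environment.
ancestors <- character(0)
env <- globalenv()
while (!identical(env, emptyenv())) {
  ancestors <- c(ancestors, environmentName(env))
  env <- parent.env(env)
}

print(ancestors)
print(search())

# as.environment() also accepts a position on the search list.
print(identical(as.environment(1), globalenv()))
print(identical(as.environment("package:base"), baseenv()))
```

The two listings have the same length; only the labels of the first and last entries differ slightly.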

To create an environment manually, use new.env(). You can list the bindings in the environment’s frame with ls() and see its parent with parent.env().

Another useful way to view an environment is ls.str(). It is more useful than str() because it shows each object in the environment. Like ls(), it also has an all.names argument.
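
A short sketch of these functions on a freshly created environment; the bindings a, b and .hidden are made up for the example.

```r
e <- new.env()
e$a <- 1
e$b <- c(TRUE, FALSE)
e$.hidden <- "x"

print(ls(e))                    # names that do not start with a dot
print(ls(e, all.names = TRUE))  # also lists .hidden

# By default the parent is the environment from which new.env() was called.
print(identical(parent.env(e), environment()))

# One line per object, with its structure.
print(ls.str(e))
```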

Given a name, you can extract the value to which it is bound with $, [[, or get():

$ and [[ look only in one environment and return NULL if there is no binding associated with the name.
get() uses the regular scoping rules and throws an error if the binding is not found.

To compare environments, you must use identical() not ==.
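
The differences between $, [[ and get(), and between identical() and ==, are easiest to see side by side. A minimal sketch; no_such_binding is a deliberately missing name.

```r
e <- new.env()
e$x <- 10

print(e$x)
print(e[["x"]])
print(get("x", envir = e))

# Missing binding: $ returns NULL, get() signals an error.
print(is.null(e$no_such_binding))
res <- tryCatch(get("no_such_binding", envir = e), error = function(err) "error")
print(res)

# get() follows the parent chain, $ looks only in e itself.
print(is.null(e$sum))
print(is.function(get("sum", envir = e)))

# identical() works on environments; == does not.
print(identical(e, e))
eq <- tryCatch(e == e, error = function(err) "error")
print(eq)
```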

Given a name, where() finds the environment where that name is defined, using R’s regular scoping rules.

The definition of where() is straightforward. It has two arguments: the name to look for (as a string), and the environment in which to start the search.

where <- function(name, env = parent.frame())
{
    if (identical(env, emptyenv()))
    {
        # Base case
        stop("Can't find ", name, call. = FALSE)
    } else if (exists(name, envir = env, inherits = FALSE)) {
        # Success case
        env
    } else {
        # Recursive case
        where(name, parent.env(env))
    }
}
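
A few calls show the three branches in action. The definition is repeated in compact form only so that the snippet runs on its own; no_such_name_xyz is a deliberately undefined name.

```r
# where() as defined above, repeated so the snippet is self-contained.
where <- function(name, env = parent.frame()) {
  if (identical(env, emptyenv())) {
    stop("Can't find ", name, call. = FALSE)
  } else if (exists(name, envir = env, inherits = FALSE)) {
    env
  } else {
    where(name, parent.env(env))
  }
}

x <- 5
print(identical(where("x"), environment()))  # success case: found where x was just assigned
print(where("mean"))                          # recursive case: found in the base environment

# Base case: the search reaches the empty environment and stops with an error.
found <- tryCatch(where("no_such_name_xyz"), error = function(e) conditionMessage(e))
print(found)
```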

The four types of environments associated with a function are enclosing, binding, execution, and calling.

The enclosing environment is the environment where the function was created. Every function has one and only one enclosing environment. For the three other types of environment, there may be 0, 1 or many environments associated with each function:

Binding a function to a name with <- defines a binding environment.
Calling a function creates an ephemeral execution environment that stores variables created during execution.
Every execution environment is associated with a calling environment, which tells you where the function was called.

The enclosing environment
When a function is created, it gains a reference to the environment where it was made. This is the enclosing environment and is used for lexical scoping. You can determine the enclosing environment of a function by calling environment() with a function as its first argument.
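
The four kinds of environment can be told apart with a small closure. This is only a sketch; make_counter(), counter() and caller() are names invented for the example.

```r
make_counter <- function() {
  count <- 0              # lives in the execution environment of this make_counter() call
  function() {
    count <<- count + 1   # updates count in the enclosing environment
    count
  }
}

# Binding environment: wherever the name `counter` is assigned.
counter <- make_counter()

# Enclosing environment: the execution environment of the make_counter() call,
# which survives because the returned function still refers to it.
enc <- environment(counter)
print(ls(enc))

first  <- counter()
second <- counter()
print(get("count", envir = enc))

# Calling environment: parent.frame() reports where a function was called from.
caller <- function() parent.frame()
here <- environment()
called_from <- caller()
print(identical(called_from, here))
```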

Tuesday 1 July 2014

Metaprogramming in R

Cited from Metaprogramming

quote() returns an expression: an object that represents an action that can be performed by R. (Unfortunately expression() does not return an expression in this sense. Instead, it returns something more like a list of expressions.) For example,
z <- quote(y <- x * 10)

str() describes names as symbols and calls as language objects. For example,
str(quote(a))
str(quote(a + b))
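
A quoted expression does nothing until it is evaluated. The sketch below captures the assignment first and runs it later with eval(); the value 4 for x is arbitrary.

```r
z <- quote(y <- x * 10)  # captured, not evaluated: neither x nor y has to exist yet

print(is.call(z))             # the assignment is a call
print(is.name(quote(a)))      # a bare name is a symbol
print(is.call(quote(a + b)))

x <- 4
eval(z)    # now the assignment actually runs
print(y)   # 40

# expression() wraps one or more such objects in a list-like container.
ex <- expression(y <- x * 10, x + 1)
print(length(ex))
print(is.call(ex[[1]]))
```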

To create a new call from its components, you can use call() or as.call(). The first argument to call() is a string which gives a function name. The other arguments are expressions that represent the arguments of the call. For example,
call(":", 1, 10)
call("mean", quote(1:10), na.rm = TRUE)

as.call() is a minor variant of call() that takes a single list as input. The first element is a name or call. The subsequent elements are the arguments. For example,
as.call(list(quote(mean), quote(1:10)))
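
Calls built this way behave like any other quoted call: they can be printed, modified and evaluated. A short sketch using the three calls above; the trim argument is added only to show modification.

```r
c1 <- call(":", 1, 10)
c2 <- call("mean", quote(1:10), na.rm = TRUE)
c3 <- as.call(list(quote(mean), quote(1:10)))

print(c1)
print(c2)
print(c3)

print(eval(c1))
print(eval(c2))  # 5.5
print(eval(c3))  # 5.5

# A call can be modified like a list before it is evaluated.
c3$trim <- 0.1
print(c3)
```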

Many base R functions use the current call: the expression that caused the current function to be run. There are two ways to capture a current call:
sys.call() captures exactly what the user typed.
match.call() makes a call that only uses named arguments. It’s like automatically calling pryr::standardise_call() on the result of sys.call().
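
The difference shows up as soon as an argument is passed by position. A minimal sketch; f() is a throwaway function.

```r
f <- function(x, y, ...) {
  list(sys = sys.call(), matched = match.call())
}

res <- f(1, y = 2)
print(res$sys)      # f(1, y = 2)
print(res$matched)  # f(x = 1, y = 2)
```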

R package structure

Creating R Packages: A Tutorial

A generic function is a standard R function with a special body, usually containing only a call to UseMethod. For example,
linmod <- function(x, ...) UseMethod("linmod")
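
UseMethod() picks a method from the class of the first argument. The sketch below adds two methods to show the dispatch; linmod.default and linmod.data.frame here are placeholders written for this note, not the tutorial's actual implementations.

```r
# Generic: dispatches on the class of x.
linmod <- function(x, ...) UseMethod("linmod")

# Fallback used when no class-specific method exists.
linmod.default <- function(x, ...) {
  paste("default method for class", class(x)[1])
}

# Hypothetical method for data frames, for illustration only.
linmod.data.frame <- function(x, ...) {
  paste("data.frame method,", ncol(x), "columns")
}

print(linmod(1:3))
print(linmod(data.frame(a = 1, b = 2)))
```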

Thursday 31 July 2014

Format of SAM/BAM

GATK Analysis Pipeline Explained

Wednesday 30 July 2014

Mixture Model Parameter Estimation using Bayesian MCMC

EM algorithm and Mixture Models