Thursday 11 December 2014

C++: Preprocessor Directive

=Excerpt from "Sams Teach Yourself C++ in One Hour a Day"=

A preprocessor is a tool that runs before the actual compilation starts. Preprocessor directives are commands to the preprocessor and always start with a pound sign #.


Monday 8 December 2014

Tuesday 25 November 2014

VCF File Info and Format Field Abbreviations

Format Field:

FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">


Info Field:
INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">

INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">

INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">

INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">

INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">

INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">

INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">

INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">

INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">

INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">

INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">

INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">

INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">

INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">

INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">


Monday 10 November 2014

MicroRNA Analysis Software

miRseqViewer – Multi-panel visualization of sequence, structure and expression for analysis of microRNA sequencing data

IsomiRage: From Functional Classification to Differential Expression of miRNA Isoforms

TruSeq Exome Targeted Regions

The Illumina website (http://support.illumina.com/sequencing/sequencing_kits/truseq_exome_enrichment_kit/downloads.html) provides the file "TruSeq Exome Targeted Regions BED file"; the direct link is as follows:

http://support.illumina.com/content/dam/illumina-support/documents/myillumina/5dfd7e70-c4a5-405a-8131-33f683414fb7/truseq_exome_targeted_regions.hg19.bed.chr.gz

Stranded-RNA seq

Visualising stranded RNA-seq data with Gviz/Bioconductor

Perl: the Eval Function

Excerpts from Perl Eval Function Examples – Regex, Error Handling, Require, Timeout, Dynamic Code

Regular Expression Handling with Eval
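use strict;
use warnings;

my $line = <>;
chomp $line;    # remove the trailing newline from the input line
my %hash = ( number    => qr/^[0-9]+$/,
             alphabets => qr/^[a-zA-Z]+$/
);

while ( my ($key, $value) = each(%hash) )
{
    # Single quotes: eval compiles '$line =~ /$value/' itself, instead of
    # pasting the raw input text into the code string before compiling it.
    if ( eval '$line =~ /$value/' ) { print "$key\n"; }
}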

Trapping Errors

While a subroutine is running, the program might die because of an error or an explicit call to the die function. If that block of Perl code is executed inside an eval, the program keeps running after the die or error, and eval captures the error message in the special variable $@.

Zero Divide Error:
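eval { $average = $total / $count };    # dies with "Illegal division by zero" when $count is 0
print "Error captured : $@\n" if $@;    # $@ holds the error message, empty if nothing died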

Sunday 9 November 2014

RNA-seq Quality Control Software

RSeQC: An RNA-seq Quality Control Package

Reverse engineering of molecular regulatory networks

birta: the R Bioconductor package

qpgraph: the R Bioconductor package

Interface to BioMart databases

biomaRt: the R Bioconductor package

Target Enrichment Quality Control

TEQC: the R Bioconductor package

Infer miRNA-mRNA interactions using paired expression data from a single sample

Roleswitch: the R Bioconductor package

R Interface to David

DAVIDQuery: the R Bioconductor package

RDAVIDWebService: the R Bioconductor package

R Circos plots

OmicCircos: the R Bioconductor package

Gene Regulatory Network Inference Using Time Series

GRENITS: the R Bioconductor package

TDARACNE: the R Bioconductor package

Inference of differential exon usage in RNA-Seq

DEXSeq: the R Bioconductor package

ChIP-seq QC package

ChIPQC: the R Bioconductor package

ChIP-seq: calculate read-enrichment scores for each nucleotide position

CSAR: the R Bioconductor package

Allelic Imbalance: RNA-seq Strand-specific Analysis

AllelicImbalance: the R Bioconductor package

WASP: allele-specific software for robust discovery of molecular quantitative trait loci

The PeakAnnotator software: the Overlap Data Sets (ODS) subroutine

PeakAnalyzer main functions

ChIPpeakAnno: the R Bioconductor package

TSS plot using RNA-seq and ChIP-seq data

ngsplot

metaseq examples

ChIPseeker: the R Bioconductor package

the CEAS software

Thursday 6 November 2014

abs() and fabs()

The function abs() takes an argument of type int and returns its absolute value as an int. Its function prototype is in stdlib.h.

fabs() takes an argument of type double and returns its absolute value as a double. Its function prototype is in math.h.


The Use of typedef

An excerpt from "A book on C"

The C language provides the typedef mechanism, which allows the programmer to explicitly associate a type with an identifier.

typedef char uppercase;

typedef int INCHES, FEET;

uppercase u;
INCHES length, width;

Assignment Operators

An excerpt from "A book on C".

The semantics is specified by

variable op= expression

which is equivalent to

variable = variable op (expression)


Assignment operators
=, +=, -=, *=, /=, %=, >>=,  <<=, &=, ^=, |=.

Increment and Decrement Operators

An excerpt from "A book on C".

The expression ++i causes the stored value of i to be incremented first, with the expression then taking as its value the new stored value of i.

In contrast, the expression i++ has as its value the current value of i; then the expression causes the stored value of i to be incremented.

Files

Excerpt from "A book on C"
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* ..... */
}

argc: argument count; its value is the number of arguments in the command line that was used to execute the program.

argv: argument vector; it is an array of pointers to char.

Functions

Function prototypes:

double pow(double x, double y);
or equivalently,
double pow(double, double);

Identifiers such as x and y that occur in parameter type lists in function prototypes are not used by the compiler. Their purpose is to provide documentation to the programmer and other readers of the code.

In C, arguments to functions are always passed by value.
In C, to get the effect of call-by reference, pointers must be used.

scanf() and printf()

The function scanf() returns an int value that is the number of successful conversions accomplished, or the system-defined end-of-file value (EOF).

The function printf() returns an int value that is the number of characters printed or a negative value in case of an error.


Wednesday 5 November 2014

Book: "Bioinformatics Sequence and Genome Analysis"

Affine gap penalty is a gap penalty score that is a linear function of gap length, consisting of a gap opening penalty plus a gap extension penalty multiplied by the length of the gap.

Alignment score is a computed score based on the number of matches, substitutions, and insertions/deletions (gaps) within an alignment. For DNA sequences, a match and mismatch score is usually chosen along with a gap penalty that will produce the most reasonable alignment.

BLOSUM scoring matrices are commonly used to align protein sequences.

Convergent evolution refers to the evolution of two genes to the same biological function. However, because they have different starting points, the resulting sequences are not similar.

Distance score between aligned sequences is a measure of the evolutionary distance between the sequences.

Dynamic programming algorithm solves the problem of finding the optimal alignment between sequences by breaking the alignment down into a series of sequential sub-alignments that can be readily computed.

PAM scoring matrix, or percent accepted mutation scoring matrix, is a table or matrix that describes the odds that a sequence position, e.g., an amino acid, has changed into a second one during a period of evolutionary time.

Smith-Waterman algorithm is a dynamic programming algorithm for locating the highest-scoring local alignments of sequences. The key feature is that all negative scores calculated in the dynamic programming matrix are changed to zero to avoid extending poorly scoring alignments and to assist in identifying local alignments starting and stopping anywhere in the matrix.

In a local alignment, the alignment stops at the ends of regions of strong similarity, and a much higher priority is given to finding these local regions than to extending the alignment to include more neighboring amino acid pairs.

Odds ratio in sequence alignment is the ratio of the odds of obtaining the score between related sequences to the odds of obtaining the same score between unrelated sequences.


Tuesday 4 November 2014

Paper "Strand-Specific RNA-Seq Provides Greater Resolution of Transcriptome Profiling"

Strand-Specific RNA-Seq Provides Greater Resolution of Transcriptome Profiling

It seems that antisense transcriptional ‘hot spots’ are located around nucleosome-free regions such as those associated with promoters, indicating that it is likely that antisense transcripts carry out important regulatory functions.

Furthermore, antisense transcripts have been documented that partner with active promoter sites or those that are in close proximity of transcription start sites [17, 22, 23]. While antisense transcripts occur at lower abundances than their sense transcripts, all evidence points to non-coding antisense transcripts playing a pivotal role in regulation of the transcriptome [19].

There exist a variety of pathways in which antisense transcripts can act as regulatory elements. It is possible to divide these pathways into three broad categories: transcription modulation, hybridization of sense-antisense RNA partners, and chromatin modification.

The act of antisense transcription, rather than the asRNA molecule itself, can modulate gene expression levels. During transcription, RNA polymerase binds to the promoter region of the gene and proceeds along the strand. If transcription occurs on the DNA sense and antisense strands simultaneously, the RNA polymerases can collide.

Splicing is controlled by the presence of exonic splicing enhancers/silencers and intronic splicing enhancers/silencers, and the ratios of these elements affect the splicing pattern [27]. These elements contain motifs that recruit the splicing machinery to the site. If sections of the transcript containing these elements are masked by hybridization with an antisense transcript, the splicing pattern of the sense transcript will change.

RNA duplex formation in the cytoplasm may alter the ability of a transcript to be translated. It is possible that the duplex formation blocks the ability of the transcript to associate with the ribosome hence altering the efficiency of the translation machinery.

It has been suggested that long ncRNAs, such as those produced by antisense transcription, may interact with histone modifying enzymes via the formation of specific RNA secondary structures [36].

Monday 3 November 2014

Alignment Methods

Bowtie2

# specified parameters

--n-ceil L,0,0.03
L: linear
Sets a function governing the maximum number of ambiguous characters allowed in a read as a function of read length; specifying L,0,0.03 sets the N-ceiling function f to f(x) = 0 + 0.03 * x, where x is the read length.

--score-min C,-14,0
C: constant
Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report), as a function of read length. Here C,-14,0 sets a constant minimum score of -14, whatever the read length. For comparison, specifying L,0,-0.6 sets the minimum-score function f to f(x) = 0 + -0.6 * x, where x is the read length.

--phred33
Input qualities are ASCII chars equal to the Phred quality plus 33. This is also called the "Phred+33" encoding, which is used by the very latest Illumina pipelines.

-X 50000
The maximum fragment length for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates.

-N 1
Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.

-q/--quiet
bowtie2-build is verbose by default. With this option bowtie2-build will print only error messages.

# default parameters

--ma <int>
Sets the match bonus. In --local mode <int> is added to the alignment score for each position where a read character aligns to a reference character and the characters match. Not used in --end-to-end mode. Default: 2.

qalter

Torque:
qalter <jobid> -W queue=<new queue name>


Other (unspecified) job schedulers:
qalter -q <new queue name> <jobid>

Friday 31 October 2014

Plasmids

Plasmids:
  1. replicons
  2. stably inherited
  3. extrachromosomal
  4. double-stranded circular DNA (mostly)
  5. relaxed covalently closed circles (CCC DNA), open circles (OC DNA) or supercoiled DNA
  6. plasmids with unknown phenotypic traits are called cryptic plasmids
  7. conjugative or non-conjugative - depending upon whether or not they carry a set of transfer genes, called the tra genes, which promote bacterial conjugation. 
  8. multiple copies per cell (relaxed plasmids) or a limited number of copies per cell (stringent plasmids).

Host range
  1. Encoding only a few of the proteins required for their own replication; all the other proteins required for replication are provided by the host cell. 
  2. Plasmid-encoded replication proteins are located very close to the ori (origin of replication).
  3. Other parts of the plasmid can be deleted and foreign sequences added to the plasmid, and replication can still occur.
  4. The host range of a plasmid is determined by its ori region. Plasmids with a broad host range encode most, if not all, of the proteins required for replication.

Sunday 5 October 2014

GATK: the Deprecation of HaplotypeScore in VQSR

"Please note that we no longer include HaplotypeScore in our recommendations for VQSR. This does not mean you cannot use it, just that in our hands we have found it is not helpful." (Does callset derived from HaplotypeCaller need run through VariantAnnotator for VQSR step?)

Monday 22 September 2014

Reply to Reviewer Comments

Reply to Reviewer Comments (Example)

Papers to Read

Predicting the human epigenome from DNA motifs

SeqControl: process control for DNA sequencing

Accurate de novo and transmitted indel detection in exome-capture data using microassembly

Phen-Gen: combining phenotype and genotype to analyze rare disorders

Detecting and correcting systematic variation in large-scale RNA sequencing data

Normalization of RNA-seq data using factor analysis of control genes or samples

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

A community effort to assess and improve drug sensitivity prediction algorithms

A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data

A formal perturbation equation between genotype and phenotype determines the evolutionary action of protein coding variations on fitness

Decoding ChIP-seq with a double-binding signal refines binding peaks to single-nucleotides and predicts cooperative interaction

Genome-wide signals of positive selection in human evolution

Natural variation of histone modification and its impact on gene expression in the rat genome

Global translational reprogramming is a fundamental layer of immune regulation in plants

Architecture of the human interactome defines protein communities and disease networks

Haematopoietic stem and progenitor cells from human pluripotent stem cells

Chromatin states define tumour-specific T cell dysfunction and reprogramming

Friday 19 September 2014

Interplay among different forces of evolution

Migration drift equilibrium

Mutation drift equilibrium: The equilibrium value of diversity is known as the population mutation parameter (theta).

Recombination drift equilibrium: In finite populations, LD can reach an equilibrium value. This equilibrium value of LD is determined by the population recombination parameter (rho), which combines the effective population size and the recombination rate (c) through the equation rho = 4Ne*c. The precise relationship between measures of LD and rho is complex, but under certain conditions r^2 ≈ 1/(1 + rho), so that when rho is large, r^2 ≈ 1/rho.

If we are examining LD over a large genomic region containing many polymorphic markers, it is unclear how best to combine the information from measures of LD based on comparisons between individual pairs of markers (i.e., D, |D'| or r^2). Therefore recent attention has focused on estimating rho itself for these kinds of data, as this gives a single measure of LD for the entire region.

Mutation selection equilibrium:
Some deleterious mutational events have sufficiently high mutation rates that within a large population they occur several times within a single generation, and can be considered recurrent mutations.

The rate at which new mutations are generated can be balanced by the eventual elimination of each mutant by selection so that the average number of a given mutation reaches an equilibrium value within the population.

If we consider all deleterious alleles together, a balance between mutation and selection may operate over the genome as a whole, such that at equilibrium each genome contains a certain number of deleterious alleles.

A general rule for diploid loci is that for selection to be operating, the following relationship should hold:
s > 1/(2Ne)
For haploid loci, with one quarter the effective population size of diploid loci (Ne/4), the relevant rule is s > 1/(2 * Ne/4) = 2/Ne.


Migration from the Evolutionary Perspective

Colonization is the process of movement into previously unoccupied land, thus entailing a founder effect. By contrast, migration is the movement from one occupied area to another. Gene flow is the outcome when a migrant contributes to the next generation in their new location.

Selection from the Evolutionary Perspective

Balancing selection: A selective regime that favours more than one allele and thus prevents the fixation of any allele.

Genetic Drift

Effective population size (Ne) measures the magnitude of genetic drift: the smaller the effective population size, the greater the drift.

There are two ways of defining effective population sizes: one is based on the sampling variance of allele frequencies (i.e., how an allele's frequency might vary from one generation to the next), and the other utilizes the concept of inbreeding (i.e., the probability that the two alleles within an individual are identical by descent from a common ancestor).

Under most simple population models, the two definitions give identical values for Ne; in more complex situations this is not the case.

It is not easy to relate the effective population size (Ne) to the census size of a population (N), as there are many parameters that can affect this relationship, only some of which are relevant to humans.

In the absence of selection, neither outcome is favoured, so the fixation probability of an allele equals its current frequency; for a new mutation in a diploid population, this is 1/(2N).

The average time to fixation (t), in generations, of a new neutral mutation that does go to fixation has been shown to be:
t = 4Ne

The long-term effective population size has been shown to be approximately equal to the harmonic mean, rather than the arithmetic mean, of the population sizes over time. The harmonic mean is the reciprocal of the mean of the reciprocals: 1/Ne = (1/t) sum_{i=1}^{t}(1/Ni) for t generations.

Founder effects relate to the process of colonization and the genetic separation of a subset of the diversity present within the source population. In contrast, bottlenecks refer to the reduction in size of a single, previously large, population and a loss of prior diversity.

In general, the higher the reproductive variance, the lower the effective population size, because parental contributions become more and more unequal. It is worth noting that when reproductive variance is less than expected under a Poisson distribution, Ne can be greater than N.

Reproductive variance: The variance in number of offspring produced by a group of individuals.

Subpopulation: A randomly mating population that exchanges migrants with other populations to form a meta-population.

Meta-population: A group of populations connected by migration.

Fst: one of the fixation indices, which measure the deviation of observed heterozygote frequencies from those expected under the Hardy-Weinberg theorem. Fst compares the mean genetic diversity found within subpopulations (Hs: the expected heterozygosity) to the genetic diversity of the meta-population (Ht).
Fst = (Ht - Hs)/Ht

Recombination: the evolutionary perspective

The benefits of recombination:

Recombination makes combinations of alleles across two or more loci that may be advantageous. This is especially important when epistasis (interaction between loci) favours a specific combination of alleles at the two loci.

Recombination helps get rid of bad mutations to create mutation-free offspring. 

========================================

Recombination is capable of breaking up advantageous allelic combinations. This results in the theoretical possibility that by increasing the likelihood of disrupting a beneficial haplotype, outbreeding can result in a drop in fitness known as outbreeding depression.

========================================

An allele that rises to high frequency through positive selection at a linked locus is said to be "hitchhiking". The reduction in diversity at loci linked to a recently fixed allele is dubbed a selective sweep. Conversely, negative selection at a locus also reduces diversity at linked loci, albeit more slowly, through a process known as background selection.

========================================

If we know the recombination rate per generation (r) between the newly mutated locus and a given locus, we can track the decay of LD over time by relating the value of D after t generations (Dt) to the initial value of D (D0) using the equation:
Dt = (1 - r)^t × D0


The Glossary of Evolution

Back mutation: A mutation from the derived state back to the ancestral state.

Recurrent mutation: A mutation that independently generates a derived state previously observed within the population. 

Thursday 18 September 2014

Publications on Selection on Non-coding Sequences

"The second hypothesis proposes that natural selection operates differently on mutations in cis-regulatory sequences [6, 7, 8, 11]. This hypothesis is based on two properties of the organization and function of cis-regulatory regions. First, allele-specific measures of transcript abundance indicate that each allele in a diploid organism is transcribed largely independently [11, 12, 13, 14], suggesting that mutations in cis-regulatory regions are often co-dominant."

Natural selection operates far more efficiently on co-dominant mutations because they can have fitness consequences as heterozygotes: a new variant is visible to selection immediately rather than requiring drift to raise allele frequencies to the point at which homozygotes begin to appear in the population [11].

Review: the evolutionary significance of cis-regulatory mutations

Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences

Genome-wide inference of natural selection on human transcription factor binding sites

The high degree of protein sequence similarity between phenotypically diverged species has led some to propose that regulatory evolution may be of considerably more importance than protein evolution [4, 5].

Most of our direct knowledge regarding the evolution of regulatory elements comes from a handful of direct functional studies [5, 6]. A second, indirect approach is based on comparative genomics [7]. The rationale for this second approach is that if newly arising mutations are typically detrimental to gene function, functionally important parts of the genome are expected to evolve more slowly than those lacking function [8, 9, 10, 11].

There are some limitations to the comparative genomics approach. First, a given genomic region might be conserved owing simply to a lower mutation rate [12]. Second, known regulatory elements do not seem to be particularly well conserved as a class, at least in Drosophila [10]. This finding suggests that taking an approach based on sequence conservation alone may lead to a biased view of regulatory evolution. Functionality of DNA sequences implies that they can be subject to both negative and positive selection. If a significant fraction of divergence between species observed in non-coding DNA is positively selected rather than selectively neutral or constrained, this could lead to underestimates of the functional importance of non-coding DNA and cause researchers to overlook the contribution of arguably the most interesting class of mutations in genome evolution: those reflecting adaptive differences between populations and species.

Adaptive evolution of non-coding DNA in Drosophila


Friday 29 August 2014

C Programming Static Variable

"""
  1. A static variable inside a function keeps its value between invocations.
  2. A static global variable or a function is "seen" only in the file it's declared in. 
"""
Cited from What does “static” mean in a C program?

C Programming Preprocessor

The C Preprocessor

Lines that begin with a # are called preprocessing directives.

#define LIMIT 100
The identifier LIMIT is called a symbolic constant.

#include <stdio.h>
<> indicates that the file is to be found in the system path. The preprocessor looks for the file only in the system path and not in the current directory.

#include "my_file.h"
The quotes surrounding the name of the file are necessary. A search for the file is made first in the current directory and then in other system-dependent places.

C Programming Naming Conventions


"""The C++ standard library often uses an underscore as a word separator (e.g. out_of_range). Identifiers representing macros are, by convention, written using only upper case letters and underscores (this is related to the convention in many programming languages of using all-upper-case identifiers for constants). Names containing double underscore or beginning with an underscore and a capital letter are reserved for implementation (compiler, standard library) and should not be used (e.g. reserved__ or _Reserved)."""

Cited from Naming convention

C Programming Enumeration Data Type

C Programming Enumeration

Thursday 7 August 2014

Words for thoughts

stumbling blocks
pose a question
in the realm of statistics
in a twist on a similar theme
map nuclear pore architecture

an area of inquiry classically thought of as cell biology
the ability can be applauded
the raison d'être (the reason for existence) of a journal
disseminate developments across disciplines
the size of the modern scientific enterprise is vast

show a method in its best light
paradoxically
rapid developments
may put off potential users from taking the plunge for fear of investing in a new tool that will be obsolete
adopting new approaches

remain resolute in this approach to life
a foolhardy approach to things
the existing panoply of partially correlated and partially overlapping annotations
expanding these methods to genome-scale is arduous
the amounts of raw data produced are prodigious

these misaligned reads and inaccurate quality scores propagate into SNP discovery and genotyping
With the Herculean task of genome sequencing accomplished
some heat shock genes defy the current dogma of transcriptional regulation
expedite analyses

A large number of custom values can quickly make the file opaque to quick inspection
a welter of various genes
Nicola Sturgeon warns scaremongering 'remain' campaigners will squander support
a plethora of Git commands 
bloat the repository

the induction of phenotypic, expression or functional characteristics emblematic of the target cell type
heuristics: a way of solving problems by discovering things yourself and learning from your own experiences
they compare their predictions to cocktails that have previously been reported to enact cell fate changes
generally regulatory networks of multicellular organisms are plagued by both low sensitivity and high false positive rates

Future work will be required to develop new means of overcoming these remaining fundamental barriers to get engineered cells across the finish line.
Pluripotent stem cells can be coaxed to specific lineages through a combination of defined growth conditions and ectopic gene expression.
lack quantitative rigor
the fidelity of cell fate conversions
She has an encyclopedic knowledge of sports trivia.

The museum celebrates the trivia of everyday life.
CellNet is predicated on the discovery of GRNs
Expression repositories have accumulated a wide array of biological perturbations
the performance degrades substantially.

relationships that can be obscured when
merging gene expression profiles from disparate experimental contexts yields high quality biological models
an insoluble amalgam of proteins
I am bullish on the company’s ecommerce segment with a long-term horizon

UK economy stutters in first quarter
Bankruptcy Is Bellwether of New York’s Condo Market
With lenders risk averse 
Attempts by the European Union to stimulate innovation are stifled by bureaucracy.
Britain's property market is on the cusp of a crash.

China looks to rice cookers to pep up economy
ended up hobbling the once highflying economy
Cells become refractory to reprogramming
concurs with our findings
results in bifurcation of a single TAD into two distinct smaller TADs

anomalies
a pronounced shift
the embryo still encased in zona pellucida
attain the stage of independent feeding
obviate the need for

alleviating the need
The reactivation of these genes therefore appears not sufficient to endow cells with developmental pluripotency.
traditional neural networks require an exorbitant amount of training data
data acquisition technologies
Mention in New York Times clinches sale of Newport home

conferring pluripotency
pluripotent state dissolution
sustenance of this internal regulatory network
MACS compares favorably to existing ChIP-Seq peak-finding algorithms
potentially superior alternative to

poses challenges (or opportunities) in the analysis of data
gives results superior to 
in excess of what is warranted by the sequencing depth
removes all the redundancies
two discernible stages

which the authors dub initiation and maturation
underscores the power of
mechanisms that normally safeguard cell identity
A case in point is the finding that
insurmountable barriers

cells become quiescent
do not join the iPS craze 
premature to coin different terms for processes that we only understand incompletely
What questions will fuel stem cell science over the coming decades?
This conclusion dovetails with parallel work on

pre-RC proteins are assembled during G1 phase and evicted during S phase
provide an ensemble view of
there are two leading strands emanating bidirectionally from origins
making contamination a lingering concern
What has been most disconcerting is that

It should not be insurmountable to
is daunting
must transcend the basic need to
most synchronization schemes are cumbersome
these uncertainties in the gene regulatory milieu

due to incongruencies in their temporal coverage of transcriptional measurements
Is the mitosis–G1 transcriptional spike buffered by post-transcriptional regulation?
This issue plagues at least Excel 2008 and 2011
I am sheepish to admit
This is the orphan disease of line ending problems.

That solution does absolutely zilch if you are suffering from naked carriage returns as line endings.
has received high acclaim
the deluge of reads 
since inception in 1994
a factor or histone modification directly bestows a bookmarking function on an associated gene

Mice lacking GATA1 succumb to anemia due to the failure to produce mature viable erythroid cells
a significant step in resolving this conundrum
consider the overarching implications
As an orthogonal approach to
A corollary of the second prediction

Sox2 protein-protein interactions dependent on transcriptional activation are severely abrogated in mitosis
have a blast on the business side of science
to top it off, this person had to have a PhD
how to handle the backwoods of biotechnology
biotechnology companies haven't proven to be flexible about the turf in which they blossom

venting pent-up frustration 
companies will solicit your resume
potential job leads
tap contacts
shelve the book

many Ph.D.s must grovel for academic jobs 
others describe their discomfort with 'schmoozing' with senior professionals in their field
the stories about the academic job search are downbeat
the recruiting and hiring practices of most academic departments seem callous
newly minted college graduates

keep the door open just a crack
riding the next wave into
he is surfing toward tenure
sb's journey from there to the questions that xxx is sprinkled with crumbs of inspiration and a healthy measure of doubt
I managed to glean that much from them

traversed a similar path
splintered job market
unbridled enthusiasm
last winter was a reprieve from shoveling and high fuel bills
Oct4 is perhaps the only obligatory and general regulator of pluripotency

their activity must be promptly reinstated after each division
preserve their transcriptional status
we report the compendium of genomic locations bound by Esrrb during mitosis
monitored their mitotic behaviour
self-renewal efficiency increases in the presence of LIF and is rescued in its absence 
shows no or labile mitotic binding at other regions

might thereby obfuscate proportionally small but potentially critical inaccuracies in the estimation of expression differences across samples
it is critical to use a plurality of complementary benchmarks
hastening discovery
PHS-funded research runs the gamut from basic research to applied research and product development such as a diagnostic test or drug

make these cultured cells more authentic imitations of their in vivo counterparts
ostentatious wealth
the building has been maligned by some city residents
wipe your mouths; that drooling is unsightly
relish the thought of spending a great deal of time on the effort

gleaming, modern biotechnology palace
one of the most acclaimed young scientists
emerged victorious after
soft-spoken
right at this second

drawn to research opportunities 
elite institutions soon recognized his brilliance
lurking out there somewhere is the Nobel
winnowing a field of 96 dish racks across four categories
suitably enlightened compilers

other ailments
activate a battery of antioxidant response genes
an exorbitant amount of training data
underscoring the proclivity for TP53 mutations to arise that have the potential to cause cancer
More advanced tinkering may help solve this problem

For a neophyte,...
Without entering the meanders of the program
Both are juxtaposed within TADs
culminating in their aggregation to form chromosome territories
assuage privacy concerns

something figured heavily in someone's work
users inadvertently leave their ID card on their desk
The chromosome territories abut at their borders to create a continuous body of chromatin
eliminate extraneous typing
sth would all be pared back

sb strays from something
the enrichment scoring bias engendered by compositional extremes of some ChIP-Seq datasets
this does not bode as well for
scientists are a resourceful bunch
nor can the solutions be forcibly wed

If a user is able to escalate to root
cause denial of service to the host
a multitude of vectors
the enhancer-gene-specificity conundrum
molecular mechanisms engender specific enhancer-gene interactions within TADs

The incumbent will be responsible for
aesthetically pleasing
clutter listing
portray the outcome of this assignment in
this effect can be achieved with some sleight of hand

the R interpreter chugs along until it gets to the last line of the function
a jobbing programmer just trying to Get Stuff Done
With great fanfare, Sichuan Agricultural University held a ceremony
no one will retire in luxury from this
continue their promising research

The custom of rewarding researchers monetarily for single publications is deeply entrenched at Chinese scientific institutions.
For many, it is an official policy, written in the bylaws.
depend on locally distributed monetary awards as a lifebuoy
the road from bench to publication is long and winding
Information is everywhere, and out there on the world wide web, a lot of it is flippant, irresponsible, and one-sided, to say the least.

balancing the deluge of digital advances, trends, and mandates with the standards and respected traditions of academia is...
Bells and whistles are great for the masses
you need to tread carefully
enhance our primary purpose
it just so happens to fall within an exciting coalescence of events

The focus of this book is unabashedly on hypothesis generation
that’s a false dichotomy
answering our dumb questions
you will home in on a few particularly productive areas
even if the questions are handed to you on a platter

we need to take a little detour to
It’s not foolproof
readr uses a heuristic to figure out
used the pejorative term “messy” to refer to non-tidy data
a block of code elided here

A fib about owners and pointers
metaprogramming is used sparingly
She had an amazing memory and could recall verbatim quite complex conversations.
by and large, not that important if you’re only developing packages for yourself
Some stochastic changes will be inconsequential “passengers”; others will confer fitness and be selected as “drivers.”

a vast repertoire of gene expression patterns
Overly restrictive chromatin accentuates epigenetic barriers that prevent cell state transitions
Overly permissive chromatin lowers epigenetic barriers, allowing promiscuous sampling of alternate cell states.
The opposing activities of the H3K27 methyltransferase EZH2 and the H3K4 methyltransferase MLL
many tumors exhibit deranged developmental programs

The overriding premise is that
specific genetic, environmental, and metabolic stimuli disrupt the homeostatic balance of chromatin
may in fact suffice to satisfy every hallmark of cancer
ace a project
scroll through Facebook

how the core complex interacts with auxiliary complex components
thorny questions
She resigned when she was relegated to a desk job
distil some of the available information
the highly acclaimed Rcpp package

but, to be thoroughly pedantic,
Only true R wizards should even consider using this function
discern the programming strategies, idioms, and style of R programmers
I don’t understand all of the R idioms that are common in the advanced R programmers’ conversation, but I am getting some traction.
there is usually a harsh awakening

a burgeoning collection of blogs and Web forums
If there are any stanzas that look mostly the same
the usual "rambling" R script
She spends a lot of time poring over the historical records of the church.
While we’re heaping praise

A digression
A seasoned software developer
a script aficionado
Groovy, obviously, is too flexible to be pigeonholed
All the power of the Java platform is there to be harnessed.

Groovy performs a lot of work behind the scenes to achieve its agility and dynamic nature.
Groovy obediently accepted and executed it
Java is adamant that
Some people may look disparagingly on the text based on the exclusions
Scientific workflows have become the lingua franca of scientific research for orchestrating the execution of processes and tasks.

Universal IRF4 ablation potentiates neointima formation in both mice and rats.
exhibited the opposite phenotype
an effect was abrogated following
the cells continue to divide, piling up into mounds
Charles Babbage's mechanical calculating engines were the antecedents of the modern computer

Most people doing data analysis do this or variations thereof.
an ostensible ‘charity’ position
Heterochromatin was originally discerned cytologically by the intensity of dark staining with DNA dyes
A large fraction of mammalian genomes is taken up by repeat-rich sequences (including tandem-repeat satellites near centromeres and telomeres, retrotransposons, and endogenous retroviruses), which pose a risk to genome integrity through their potential for illicit recombination and self-duplication.

Seminal work in the 1950s established
at long latency (weeks to months)
the primacy of O, S, and K as pioneer factors at enhancers of genes that promote reprogramming
DNA methylation transgressed in pathogenesis
cumulative evidence expounds on the similarities of their function

which in turn necessitates revisiting the existing understanding of DNA methylation maintenance and homeostasis
In a stochastic model, twin forces of methylases and demethylases contend in an equilibrium
resolve the question as to whether
Methyl-CpG-binding domain protein 3 (MBD3) belongs to a family of nuclear proteins in close relation to DNA methylation, but exhibits elusive epigenomic association and functional identity

This corepressor (MBD3) thwarts reprogramming factor activity at sites they already bind.
If we see Mr. Klein's car pull in, we'll give you a heads-up
take the world of biomedical science by storm
through either disruptive or precise genome edits
writing or removing epigenetic marks on DNA and histones

Base editors have increased the efficiency of CRISPR-targeted base substitutions.
facile genome engineering
Watson-Crick base-pairing
singular advances

the field was primed for
viruses are constantly evolving to evade CRISPR-mediated attenuation
long diatribes of the million and one reasons why
unload on the person about how stressed-out and frustrated they made you
change your coworker into a paragon of productivity

you can be released from the negative emotional charge around it
if you dread discord
ruminating over what to say
the mental gymnastics of endlessly practicing conversations in your head
the subordinate who keeps underperforming

When the opportunity presents itself to provide unsolicited negative feedback
a less-than-positive performance evaluation
presiding over an unenthusiastic performance review
To keep tensions from blazing
slow your cadence

To steel oneself for the conversation, someone called on someone's years of experience...
Before broaching the subject
a mood-veering, thought-steering, pressure-packed interaction
this analysis paralysis occurs when your brain suddenly becomes overtaxed by worry or pressure
In a contentious moment

It goes unacknowledged or is tersely rejected.
Spoken with composure
gives you the upper hand
a couple other topics worth considering in tandem with this
eliminates oversharing

surprises him with an addition to his wardrobe
perpetually exhausted
sobering, and galvanizing
I usually gorged myself
The value of such breaks is grounded in our physiology

our remaining capacity burns down as the day wears on
While breaks are countercultural in most organizations and counterintuitive for many high achievers, their value is multifaceted
a mental and emotional breather
he found his work increasingly exhausting and dispiriting

is it being in charge that’s most invigorating or participating in a creative endeavor
he instituted a ritual in which
people unintentionally divulge what they stand for
took it upon herself to foster the excitement and commitment of her leadership team
Employees feel increasingly beleaguered

bring all energy wholeheartedly to work
Yet Pozen never comes across as overwhelmed, frazzled, or even all that busy.
completely botched up results
enhancer competence

pluripotency-supporting cytokine leukemia inhibitory factor (LIF)
overt differences
commencement of differentiation 
cessation of transgenic OKSM expression
In this day and age

dwindling ability
employees are jaded
this contemporary malaise 
Her peers weren’t quite as enthralled
unfettered ambition

tell a falsehood
stifle our dreams
a candidate’s scuffed shoes
commitment wanes
Overwork is seductive, because it is still lauded 


assuage our guilt
he'd be contrite
give someone an ultimatum
work may equal drudgery 
menial jobs

something is a grind
abiding enjoyment of daily activities
feel disdained
love founded on caring, concern, and camaraderie
Sheep's milk cheese is the quintessential Corsican cheese

it is prudent to...
It purports to be an exposition of ...
results presented herein
In order to ensure this capability
bleeding-edge releases

When you exchange pleasantries with a co-worker in the elevator
the threatening person wielding a gun
He still wields enormous influence in politics

communication norms included a rough-and-tumble banter
to express dissatisfaction with a colleague’s subpar work 

chastise someone
politics being stigmatized
high potential employees are your proverbial superstars
you don't want to create prima donnas
what you espouse as standards of performance and what you actually do

prod someone
blatantly bad behavior
successful careers are predicated on great relationships
Social etiquette dictates that men cannot sit while women are standing
long bouts of locking eyes

you might look for eye aversions
odd fidgeting
innovation often gets a lot of lip service 
unfortunately, lots of people have a myopic view of innovation
your job is to receive that coveted promotion, because you truly earned it.

tooting your own horn
your job is to groom and develop your employee
budgets are strained (difficult)
I don't want to sound too grandiose
forthright conversation

shuns difficult conversations
Certain people always want to maintain the status quo.
you have to be nimble
someone will have their feathers ruffled
increase camaraderie

cement the relationship
you went to the same alma mater
a rich vein of conversation
Far too many people neglect this because somehow we've come to associate competence with sternness.
look goofy

I am a big proponent of something
He was still standing on the mountain of evidence he'd accrued.
This scenario is quite tame in comparison to some I've witnessed.
There's usually more animosity
Livid

Nuanced
Mannerism
Torment someone
Mal-intent
My righteousness is shot down by the arrow of hypocrisy.

a brash young engineer
reprimand yourself for not keeping your emotions in check
crying in the workplace is especially fraught for women
narcissist
indigent

that vision came to fruition

Statistics Programming in C

C Programming Language Tips

Gibbs sampler in various languages

Excerpts from "Head First C"

=====
scanf("%79[^\n]", line);
It means read up to 79 characters, so long as they're not newlines; line needs room for 80 characters to hold the terminating '\0'.
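A minimal sketch of that scanset in action. It uses sscanf() on an in-memory string instead of stdin so the result is easy to check; the helper name first_line is hypothetical. Note that %[ does not skip leading whitespace, and the conversion fails (sscanf() does not return 1) if the very first character is a newline.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first line of `input` (without the
   newline) into `out`, which must have room for 80 chars.
   Returns 1 on success, 0 if nothing was matched (for example the
   input is empty or starts with '\n'). */
int first_line(const char *input, char out[80])
{
    /* %79[^\n]: read up to 79 characters that are not newlines */
    return sscanf(input, "%79[^\n]", out) == 1;
}
```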
=====
For free-text input, use fgets():

char line[80];
fgets(line, sizeof(line), stdin);  /* reads at most 79 chars; keeps the '\n' if it fits */
printf("Your quote was: %s", line);
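One detail worth knowing: fgets() keeps the trailing newline when it fits in the buffer. A small sketch of stripping it, assuming a hypothetical helper named read_line; strcspn() returns the index of the first '\n', or the string length if there is none, so the assignment is safe either way.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `in` into `buf` (capacity
   `size`) and drop the trailing newline that fgets() keeps.
   Returns buf, or NULL at end of file or on a read error. */
char *read_line(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';  /* cut at the first '\n', if any */
    return buf;
}
```

fgets() reads at most size - 1 characters, so unlike a bare %s it cannot overflow the buffer.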
===== 
The stack is the section of memory used for local variable storage. Every time you call a function, all of the function's local variables are created on the stack. Variables are added to the stack when you enter a function, and get taken off the stack when you leave.
===== 
The heap is for dynamic memory: pieces of data that get created while the program is running and then hang around a long time (until you free() them).
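A short sketch contrasting the heap with the stack: a local array would disappear when the function returns, but a block from malloc() stays valid until free() is called. heap_copy is a hypothetical name (POSIX systems already offer strdup(), which does the same job).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `s` into a new block on the heap.
   Unlike a local array on the stack, the copy survives after this
   function returns; the caller must release it with free(). */
char *heap_copy(const char *s)
{
    size_t len = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;                  /* NULL if malloc() failed */
}
```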
===== 
The code segment is the part of the memory where the actual assembled code gets loaded.
=====
printf("I like Turtles!"); is the simplified version of fprintf(stdout, "I like Turtles!");
=====
">" redirects the Standard Output. But "2>" redirects the Standard Error.
=====
i % 2 means "The remainder left when you divide by 2".
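A small sketch, assuming C99 or later, where integer division truncates toward zero: that makes -7 % 2 equal to -1 rather than 1, so an odd/even test should compare against zero. is_odd is a hypothetical name.

```c
/* i % 2 is 1 for positive odd i but -1 for negative odd i,
   so test for "not zero" rather than "equals 1". */
int is_odd(int i)
{
    return i % 2 != 0;
}
```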
=====
char sentence[80];
FILE* in_file = fopen("input.txt", "r");    /* open for reading */
FILE* out_file = fopen("output.txt", "w");  /* create or truncate for writing */
fprintf(out_file, "Don't wear %s with %s", "red", "green");
fscanf(in_file, "%79[^\n]\n", sentence);    /* read one line */
fclose(in_file);
fclose(out_file);
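A sketch of a common next step, copying one open stream to another with fgets() and fputs(). copy_stream is a hypothetical helper; in real code, check each fopen() result against NULL before passing it in.

```c
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out` one
   buffer-full at a time. Returns 0 on success, -1 if a write
   fails or `in` reports a read error. */
int copy_stream(FILE *in, FILE *out)
{
    char buf[80];
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (fputs(buf, out) == EOF)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```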
=====
int main(int argc, char* args[])
{
    /* ... do stuff ... */
    return 0;
}
argc is the number of command-line arguments, and args[0], the first string in the array, contains the name of the program as it was run by the user.
=====
file accessibility sanity check:

FILE* in;
if (!(in = fopen("dont_exist.txt", "r"))) {
    fprintf(stderr, "Can't open the file.\n");
    return 1;
}
=====
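After processing the options with getopt() and skipping past them (argc -= optind; argv += optind;), the 0th argument will no longer be the program name: argv[0] points to the first argument after the options.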
=====
To avoid ambiguity, you can separate your main arguments from the options using "--". So you would write "set_temperature -c -- -4". getopt() will stop reading options when it sees the "--", so the rest of the line will be read as simple arguments.
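A minimal sketch of the option loop, assuming a POSIX system where getopt() is declared in unistd.h (it is not part of ISO C). The -c flag and the helper name parse_temperature_args are hypothetical stand-ins for the set_temperature example.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>   /* getopt(), optind (POSIX) */

/* Hypothetical parser for "set_temperature [-c] [--] value".
   Sets *celsius to 1 if -c was given and returns the index of the
   first non-option argument, or -1 on an unknown option. */
int parse_temperature_args(int argc, char *argv[], int *celsius)
{
    int ch;
    *celsius = 0;
    while ((ch = getopt(argc, argv, "c")) != -1) {
        switch (ch) {
        case 'c':
            *celsius = 1;
            break;
        default:          /* getopt() returns '?' for unknown options */
            return -1;
        }
    }
    return optind;        /* getopt() stops at "--" and skips past it */
}
```

In main, the caller would then do argc -= optind; argv += optind; so that argv[0] is the first non-option argument ("-4" in the example above).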
=====
When we tell the compiler about a function it's called a function declaration. C allows you to take that whole set of declarations out of your code and put them in a header file.
=====
When the compiler sees an include line with angle brackets it assumes it will find the header file somewhere off in the directories where the library code lives. But our header file is in the same directory as our .c file. By wrapping the header file name in quotes we are telling the compiler to look for a local file.
=====
What if you want to share variables? Source code files normally contain their own separate variables to prevent a variable in one file affecting a variable in another file with the same name. But if you genuinely want to share variables, declare them in your header file with the keyword extern, and define each one (without extern) in exactly one .c file.
=====
Every file that make compiles is called a target.
For every target, make needs to be told two things:
  1. The dependencies: which files the target is going to be generated from.
  2. The recipe: the set of instructions it needs to run to generate the file.
Together the dependencies and the recipe form a rule.
=====
All the recipe lines MUST begin with a TAB character. If you just try to indent the recipe lines with spaces, the build won't work.
=====
Make takes away a lot of the pain of compiling files. But if you find that even make is not automatic enough, take a look at a tool called autoconf:
http://www.gnu.org/software/autoconf/. Autoconf is used to generate makefiles.
=====
typedef struct {
     int cell_no;
     const char * wallpaper;
     float minutes_of_charge;
} phone;
phone p = {5557879, "s.png", 1.35};
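A sketch of how that struct behaves when passed to a function, assuming the phone type from the note above (redeclared so the block compiles alone). recharged and recharge are hypothetical names. A struct argument is copied, just like an int, so changes only reach the caller through a pointer.

```c
/* The phone struct from the note above. */
typedef struct {
    int cell_no;
    const char *wallpaper;
    float minutes_of_charge;
} phone;

/* Structs are passed by value: `p` is a copy, so the caller's
   phone is not changed; the updated copy is returned instead. */
phone recharged(phone p, float extra_minutes)
{
    p.minutes_of_charge += extra_minutes;
    return p;
}

/* Passing a pointer lets the function update the original. */
void recharge(phone *p, float extra_minutes)
{
    p->minutes_of_charge += extra_minutes;
}
```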

Saturday 2 August 2014

Excerpts from "The C Programming Language"

"A character written between single quotes represents an integer value equal to the numerical value of the character in the machine’s character set. This is called a character constant, although it is just another way to write a small integer."
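A small illustration of character constants as integers. digit_value is a hypothetical helper; it relies only on the C standard's guarantee that '0' to '9' are contiguous, so it does not assume ASCII.

```c
/* A character constant is just a small integer, so it can take part
   in arithmetic: c - '0' turns a digit character into its value. */
int digit_value(char c)
{
    if (c < '0' || c > '9')
        return -1;            /* not a decimal digit */
    return c - '0';
}
```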

"Parameter names need not agree. Indeed, parameter names are optional in a function prototype, so for the prototype we could have written int power(int, int);" 
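A sketch of that prototype next to its definition, following K&R's power example (base raised to the n-th power, for n >= 0).

```c
/* Prototype with the parameter names omitted. */
int power(int, int);

/* Definition: the names here need not match anything in the prototype. */
int power(int base, int n)
{
    int p = 1;
    for (int i = 1; i <= n; ++i)
        p = p * base;
    return p;
}
```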

"One aspect of C functions may be unfamiliar to programmers who are used to some other languages, particularly Fortran. In C, all function arguments are passed 'by value.' This means that the called function is given the values of its arguments in temporary variables rather than the originals. This leads to some different properties than are seen with 'call by reference' languages like Fortran or with var parameters in Pascal, in which the called routine has access to the original argument, not a local copy."
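A sketch of both sides of call by value, modeled on K&R's examples: a power function that counts its own copy of n down to zero, and a swap that works only because it receives pointers. power_countdown is a hypothetical name.

```c
/* n is a private copy, so counting it down to zero inside the
   function does not affect the caller's variable. */
int power_countdown(int base, int n)
{
    int p;
    for (p = 1; n > 0; --n)
        p = p * base;
    return p;
}

/* To change the caller's variables, pass their addresses instead. */
void swap(int *px, int *py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}
```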

"The story is different for arrays. When the name of an array is used as an argument, the value passed to the function is the location or address of the beginning of the array; there is no copying of array elements. By subscripting this value, the function can access and alter any element of the array."
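A sketch of the array case: scale (a hypothetical name) changes the caller's array in place, because only the address of the first element is passed.

```c
/* `a` receives the address of the caller's first element, so these
   writes land in the caller's array, not in a copy. */
void scale(int a[], int len, int factor)
{
    for (int i = 0; i < len; ++i)
        a[i] *= factor;
}
```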


Saturday 19 July 2014

Graduation Using Summation Formulae: Spencer 15-point rule

# spence.15 as printed from the locfit package (<environment: namespace:locfit>)
spence.15 <- function(y)
{
    n <- length(y)
    # pad each end with 7 copies of the first/last value
    y <- c(rep(y[1], 7), y, rep(y[n], 7))
    n <- length(y)
    k <- 3:(n - 2)
    # 5-point weights (-3, 3, 4, 3, -3)
    a3 <- y[k - 1] + y[k] + y[k + 1]
    a2 <- y[k - 2] + y[k + 2]
    y1 <- y[k] + 3 * (a3 - a2)
    # running sum of 4 terms
    n <- length(y1)
    k <- 1:(n - 3)
    y2 <- y1[k] + y1[k + 1] + y1[k + 2] + y1[k + 3]
    # another running sum of 4 terms
    n <- length(y2)
    k <- 1:(n - 3)
    y3 <- y2[k] + y2[k + 1] + y2[k + 2] + y2[k + 3]
    # running sum of 5 terms, then divide by 320
    n <- length(y3)
    k <- 1:(n - 4)
    y4 <- y3[k] + y3[k + 1] + y3[k + 2] + y3[k + 3] + y3[k + 4]
    y4/320
}
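For comparison, a sketch of the same smoother in C, assuming the data are in a double array. Multiplying out the R cascade ((-3, 3, 4, 3, -3), then running sums of 4, 4 and 5 terms, then division by 320) gives Spencer's 15 weights, so this version applies them directly. Clamping out-of-range indices to the first or last value mimics the rep(y[1], 7) and rep(y[n], 7) padding. spencer15 and SPENCER15 are hypothetical names, not part of locfit.

```c
#include <stddef.h>

/* Spencer's 15-point weights; they sum to 320. */
static const int SPENCER15[15] = {
    -3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3
};

/* Smooth y[0..n-1] into out[0..n-1] (n >= 1). Indices that fall off
   either end are clamped to the first/last value, which matches
   padding the series with 7 copies of each endpoint. */
void spencer15(const double *y, size_t n, double *out)
{
    for (size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int j = -7; j <= 7; ++j) {
            ptrdiff_t k = (ptrdiff_t)i + j;
            if (k < 0)
                k = 0;
            else if (k >= (ptrdiff_t)n)
                k = (ptrdiff_t)n - 1;
            sum += SPENCER15[j + 7] * y[k];
        }
        out[i] = sum / 320.0;
    }
}
```

Because the weights are symmetric, sum to 320 and have a zero second moment, a cubic passes through unchanged wherever the window stays inside the data (indices 7 to n - 8 here).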

Thursday 17 July 2014

Binary Search Array vs. Binary Search Tree

What is difference between Array and Binary search tree in efficiency?

Loop Invariant

Loops and Invariants
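A sketch of binary search on a sorted array, with the loop invariant written out. It illustrates the array side of the comparison: lookups take O(log n) steps, but inserting into a sorted array means shifting elements (O(n)), which is where a balanced binary search tree, with O(log n) inserts, pays off. binary_search is a hypothetical name; the standard library's bsearch() is an alternative.

```c
#include <stddef.h>

/* Search sorted a[0..n-1] for key; return its index or -1.
   Loop invariant: if key is in a[0..n-1], it is in a[lo..hi-1].
   Each pass shrinks the half-open range [lo, hi), so the loop
   runs O(log n) times. */
ptrdiff_t binary_search(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;   /* avoids overflow of lo + hi */
        if (a[mid] < key)
            lo = mid + 1;                  /* key, if present, is right of mid */
        else if (a[mid] > key)
            hi = mid;                      /* key, if present, is left of mid */
        else
            return (ptrdiff_t)mid;         /* found */
    }
    return -1;                             /* range is empty: key is absent */
}
```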

Theta, Oh and Omega Notations

n: the size of input.
Theta of n: indicates that, once n is large enough, a running time is bounded from above by some linear function of n, and from below by some (possibly different) linear function of n.
Big-oh of f(n): indicates that, for large enough n, a running time is never worse than a constant times f(n).
Big-omega of f(n): indicates that, for large enough n, a running time is never better than a constant times f(n).
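The same three notations in the usual formal form, where f(n) is the running time, g(n) the comparison function, and c, c_1, c_2, n_0 positive constants; the last line is a worked example.

```latex
\begin{aligned}
f(n) \in O(g(n)) &\iff \exists\, c > 0,\ n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \\
f(n) \in \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \text{ such that } 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0 \\
f(n) \in \Theta(g(n)) &\iff \exists\, c_1, c_2 > 0,\ n_0 \text{ such that } 0 \le c_1\,g(n) \le f(n) \le c_2\,g(n) \text{ for all } n \ge n_0 \\
3n^2 + 5n \in \Theta(n^2) &\quad \text{since } 3n^2 \le 3n^2 + 5n \le 8n^2 \text{ for all } n \ge 1 \ (c_1 = 3,\ c_2 = 8,\ n_0 = 1)
\end{aligned}
```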


Monday 7 July 2014

Perl: Import and Export

@EXPORT_OK
This array contains symbols that can be imported if they are specifically asked for.

In the module, for example,
@EXPORT    = qw(F1 F2 @List);
@EXPORT_OK = qw(Op_Func %Table);

The user could load the module like so
use YourModule qw(Op_Func %Table F1);
# The F1 function was listed in the @EXPORT array. Notice that this does not automatically import F2 or @List, even though they're in the @EXPORT array. To get everything in @EXPORT plus extras from @EXPORT_OK, use the special :DEFAULT tag, such as:
use YourModule qw(:DEFAULT %Table);

%EXPORT_TAGS
This hash is used by large modules like CGI or POSIX to create higher-level groupings of related import symbols. Its values are references to arrays of symbol names, all of which must be in either @EXPORT or @EXPORT_OK. Here's a sample initialization:
%EXPORT_TAGS=(
    Functions=>[ qw (F1 F2 Op_Func) ],
    Variables=>[ qw (@List %Table) ]
);

An import symbol with a leading colon means to import a whole group of symbols. Here's an example:
use YourModule qw(:Functions %Table);

That pulls in all symbols from @{ $YourModule::EXPORT_TAGS{Functions} } and then the %Table hash.




Wednesday 2 July 2014

R Package Namespaces

Cited from "Software for Data Analysis: Programming with R"

To apply the namespace mechanism, you must write a sequence of namespace directives in a file called "NAMESPACE" that resides in the top-level directory of your package's source. The directives look roughly like R expressions, but they are not evaluated by the R evaluator. Instead, the file is processed specially to define the objects that our package sees and the objects in our package that are seen by other software.

The namespace directives define two R environments, one for the objects that perform the computations inside the package and the other for the objects that users see when the package is attached in an R session. The first of these is referred to as the package's namespace. The second, the result of the export directives in the NAMESPACE file, is the environment attached in the search list.

When you access the two environments explicitly, they will print symbolically in a special form. For package SoDA, the environments would be <environment: namespace:SoDA> and <environment: package:SoDA>, respectively.

The package's namespace contains all the objects generated by installing the package, that is, all the objects created by evaluating the R source in the package's R subdirectory.
  • The parent of the namespace is an environment containing all the objects defined by the import command in the NAMESPACE file.
  • The parent of that environment is the namespace of R's base package. 
Using a NAMESPACE file, computations in the package will see the explicitly imported objects and the base package, in that order, regardless of which packages are attached in the session.

Environment Variable in R

Cited from Environments

Every environment has a parent, another environment. Only one environment doesn’t have a parent: the empty environment.

It’s rare to talk about the children of an environment because there are no back links: given an environment we have no way to find its children.

Generally, an environment is similar to a list, with four important exceptions:
  1. Every object in an environment has a unique name.
  2. The objects in an environment are not ordered (i.e. it doesn’t make sense to ask what the first object in an environment is).
  3. An environment has a parent.
  4. Environments have reference semantics.
More technically, an environment is made up of two components, the frame, which contains the name-object bindings (and behaves much like a named list), and the parent environment. Unfortunately “frame” is used inconsistently in R. For example, parent.frame() doesn’t give you the parent frame of an environment, it gives you the calling environment.

There are four special environments:
  1. The globalenv(), or global environment, is the interactive workspace. This is the environment in which you normally work. The parent of the global environment is the last package that you attached with library() or require().
  2. The baseenv(), or base environment is the environment of the base package. Its parent is the empty environment.
  3. The emptyenv(), or empty environment, is the ultimate ancestor of all environments, and the only environment without a parent.
  4. The environment() is the current environment.

search() lists all parents of the global environment. This is called the search path because objects in these environments can be found from the top-level interactive workspace. It contains one environment for each attached package and any other objects that you’ve attach()ed. It also contains a special environment called Autoloads which is used to save memory by only loading package objects (like big datasets) when needed.

You can access any environment on the search list using as.environment().
For example, as.environment("package:stats").

To create an environment manually, use new.env(). You can list the bindings in the environment’s frame with ls() and see its parent with parent.env().

Another useful way to view an environment is ls.str(). It is more useful than str() because it shows each object in the environment. Like ls(), it also has an all.names argument.

Given a name, you can extract the value to which it is bound with $, [[, or get():
  1. $ and [[ look only in one environment and return NULL if there is no binding associated with the name.
  2. get() uses the regular scoping rules and throws an error if the binding is not found.
To compare environments, you must use identical(), not ==.

Given a name, where() finds the environment where that name is defined, using R’s regular scoping rules.

The definition of where() is straightforward. It has two arguments: the name to look for (as a string), and the environment in which to start the search.

where <- function(name, env = parent.frame())
{
    if (identical(env, emptyenv()))
    {
        # Base case
        stop("Can't find ", name, call. = FALSE)
    } else if (exists(name, envir = env, inherits = FALSE)) {
        # Success case
        env
    } else {
        # Recursive case
        where(name, parent.env(env))
    }
}

The four types of environments associated with a function are enclosing, binding, execution, and calling.

The enclosing environment is the environment where the function was created. Every function has one and only one enclosing environment. For the three other types of environment, there may be 0, 1 or many environments associated with each function:
  1. Binding a function to a name with <- defines a binding environment.
  2. Calling a function creates an ephemeral execution environment that stores variables created during execution.
  3. Every execution environment is associated with a calling environment, which tells you where the function was called.
The enclosing environment
When a function is created, it gains a reference to the environment where it was made. This is the enclosing environment and is used for lexical scoping. You can determine the enclosing environment of a function by calling environment() with a function as its first argument.