Wednesday, 25 February 2015

Multiple Testing

Per-comparison error rate (PCER)
Per-family error rate (PFER)
Family-wise error rate (FWER)
False discovery rate (FDR)
Positive false discovery rate (pFDR)
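
To make the five error rates above concrete, here is a small Monte Carlo sketch, not a definitive implementation. It uses the usual notation: m tests, V false rejections, R total rejections. It estimates PCER = E[V]/m, PFER = E[V], FWER = P(V >= 1), FDR = E[V / max(R, 1)] and pFDR = E[V/R | R > 0] when every test is run at an uncorrected level alpha. The function name, the default parameters, and the way non-null p-values are simulated (a uniform draw raised to the 4th power) are all assumptions made for illustration only.

```python
import random

def estimate_error_rates(m=20, m0=15, alpha=0.05, n_sim=2000, seed=0):
    """Estimate multiple-testing error rates by simulation.

    m     : number of hypotheses per family
    m0    : number of true null hypotheses (hypothetical choice)
    alpha : uncorrected per-test significance level
    """
    rng = random.Random(seed)
    sum_v = 0        # total false rejections over all simulations
    any_v = 0        # simulations with at least one false rejection
    sum_fdp = 0.0    # sum of V / max(R, 1)
    n_pos = 0        # simulations with R > 0
    for _ in range(n_sim):
        # True nulls: p-values are uniform on [0, 1].
        null_p = [rng.random() for _ in range(m0)]
        # Non-nulls: p-values skewed toward 0 (illustrative assumption).
        alt_p = [rng.random() ** 4 for _ in range(m - m0)]
        v = sum(1 for p in null_p if p <= alpha)      # false rejections
        r = v + sum(1 for p in alt_p if p <= alpha)   # all rejections
        sum_v += v
        if v >= 1:
            any_v += 1
        if r > 0:
            n_pos += 1
            sum_fdp += v / r   # false discovery proportion; 0 when R = 0
    return {
        "PCER": sum_v / n_sim / m,
        "PFER": sum_v / n_sim,
        "FWER": any_v / n_sim,
        "FDR": sum_fdp / n_sim,
        "pFDR": sum_fdp / n_pos if n_pos else float("nan"),
    }

if __name__ == "__main__":
    for name, value in estimate_error_rates().items():
        print(f"{name}: {value:.4f}")
```

In any such run PFER equals m times PCER, FWER cannot exceed PFER, and FDR cannot exceed pFDR, which is a quick sanity check on the definitions.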

Multiple Testing

Tuesday, 24 February 2015

Gene Set Enrichment Analysis

GSEA tutorial: Analysis of Microarray Data

Command line options for running GSEA: Syntax

Excerpts from GSEAPreranked Page
  1. When using the GSEAPreranked tool, we recommend you provide a ranked list that already has unique human gene symbols and set the parameter "Collapse data set to gene symbols" to false.
  2. In standard GSEA you can set the parameter "Permutation type" to phenotype (the default) or gene_set, but this option is not available in GSEAPreranked.
  3. In the case of GSEAPreranked, you should make sure that the weighted scoring scheme applies to your choice of ranking statistic. When in doubt, we recommend using a more conservative scoring approach by setting "Enrichment statistic" to classic.
  4. Select Tools > GseaPreranked.
  5. Gene sets database.
  6. Number of permutations. Specify the number of gene_set permutations to perform in assessing the statistical significance of the enrichment score. It is best to start with a small number, such as 10. After the analysis completes successfully, run it again with a full set of permutations. The GSEA documentation recommends 1000 gene_set permutations.
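
The excerpts above mention a classic versus weighted enrichment statistic and gene_set permutations. The sketch below illustrates both ideas on a pre-ranked list; it is not GSEA's own code. It assumes the commonly described running-sum statistic, in which each gene in the set contributes |score|^p, with p = 0 corresponding to "classic" and p = 1 to "weighted". The function names are hypothetical, and the one-sided p-value is a simplification of what the GSEA software reports.

```python
import random

def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for a pre-ranked list.

    ranked   : list of (gene, score) pairs, sorted by score, highest first
    gene_set : set of gene symbols
    p        : 0.0 for a classic (unweighted) score, 1.0 for a weighted one
    """
    weights = [abs(score) ** p if gene in gene_set else 0.0
               for gene, score in ranked]
    hit_total = sum(weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running = 0.0
    best = 0.0
    for (gene, _), weight in zip(ranked, weights):
        if gene in gene_set:
            running += weight / hit_total   # step up on a hit
        else:
            running -= 1.0 / n_miss         # step down on a miss
        if abs(running) > abs(best):
            best = running                  # keep the largest deviation from 0
    return best

def gene_set_permutation_pvalue(ranked, gene_set, n_perm=1000, p=1.0, seed=0):
    """Simplified gene_set permutation test: compare the observed score with
    scores of random gene sets of the same size drawn from the ranked list."""
    rng = random.Random(seed)
    genes = [gene for gene, _ in ranked]
    size = sum(1 for gene in genes if gene in gene_set)
    observed = enrichment_score(ranked, gene_set, p)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(genes, size))
        null_es = enrichment_score(ranked, random_set, p)
        if (observed >= 0 and null_es >= observed) or \
           (observed < 0 and null_es <= observed):
            extreme += 1
    return extreme / n_perm

if __name__ == "__main__":
    # Toy ranked list with made-up gene names and scores.
    toy = [(f"G{i}", 5.0 - i) for i in range(10)]
    toy_set = {"G0", "G1", "G2"}
    print(enrichment_score(toy, toy_set, p=0.0))
    print(gene_set_permutation_pvalue(toy, toy_set, n_perm=10, p=0.0))
```

Starting with n_perm=10 and then rerunning with 1000 mirrors the advice in item 6: the small run only checks that everything executes, and its p-value is too coarse to interpret.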

Monday, 16 February 2015

Batch Effect Visualisation


"Significant batch effects can be seen by the perfect separation of different batches on the PCA score plots for most data sets. Other visualization techniques can also be used to evaluate batch effects such as hierarchical clustering dendrogram, correlation heat-map and variance components pie chart from analysis of variance. The latter is a quantitative technique that gives the variances contributed by all factors when the class labels of all the samples are available. This allows the comparison of variances contributed by batch effects, biological effects and other effects. However, for cross-batch prediction in real applications, the class labels of the samples in the test set (future batch) are to be predicted and are unavailable, and thus analysis of variance cannot be applied for the endpoint factor. This approach is useful for evaluating the sources of variation and process control of sample handling and processing when all of these factors are recorded and reported."
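
As a rough illustration of the variance comparison described in the quote, the sketch below computes, for a single feature, the fraction of total variation that lies between the groups of a given factor (between-group sum of squares over total sum of squares). Running it once with batch labels and once with class labels gives a crude comparison of batch and biological effects. This is a simplified one-way calculation, not a full variance-components model, and the function name and toy values are hypothetical.

```python
from statistics import mean

def between_group_fraction(values, labels):
    """Fraction of total sum of squares explained by grouping `values`
    according to `labels` (one-way decomposition for one factor)."""
    grand_mean = mean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between_ss = sum(len(g) * (mean(g) - grand_mean) ** 2
                     for g in groups.values())
    return between_ss / total_ss

if __name__ == "__main__":
    # Made-up expression values for one gene across eight samples.
    expression = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
    batch = ["A", "A", "A", "A", "B", "B", "B", "B"]
    klass = ["ctrl", "case", "ctrl", "case", "ctrl", "case", "ctrl", "case"]
    print("batch:", between_group_fraction(expression, batch))
    print("class:", between_group_fraction(expression, klass))
```

In this toy data the batch factor accounts for almost all of the variation, which is the situation the quote describes as a significant batch effect. As the quote notes, the class-label calculation is only possible when those labels are known.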

Machine Learning Tutorial

Machine Learning Mastery