Thursday 12 June 2014

Bedtools: genome file

What is a “genome” file?

Some of the BEDTools (e.g., genomeCoverageBed, complementBed, slopBed) need to know the size of the chromosomes for the organism on which your BED files are based. When using the UCSC Genome Browser, Ensembl, or Galaxy, you typically indicate which species / genome build you are working with.

The way you do this for BEDTools is to create a “genome” file, which simply lists the names of the chromosomes (or scaffolds, etc.) and their sizes (in base pairs).

Genome files must be tab-delimited and are structured as follows (this is an example for C. elegans):
chrI 15072421
chrII 15279323
...
chrX 17718854
chrM 13794
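
Because a genome file with spaces instead of tabs is an easy mistake to make, it can help to check the file before passing it to BEDTools. The following is a minimal Python sketch, not part of BEDTools; the function name parse_genome is hypothetical, and it only uses the four C. elegans lines shown above (the elided lines are left out).

```python
def parse_genome(lines):
    """Parse tab-delimited genome-file lines into {chromosome_name: size_in_bp}.

    Hypothetical helper for illustration; BEDTools does not ship this function.
    """
    sizes = {}
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            # Most likely the columns are separated by spaces rather than a tab.
            raise ValueError(f"line {lineno}: expected <name><TAB><size>")
        name = fields[0]
        size = int(fields[1])  # raises ValueError if the size is not an integer
        if size <= 0:
            raise ValueError(f"line {lineno}: size must be positive")
        sizes[name] = size
    return sizes


# The four lines shown in the example above, written with real tab characters.
example = "chrI\t15072421\nchrII\t15279323\nchrX\t17718854\nchrM\t13794\n"
sizes = parse_genome(example.splitlines())
print(sizes["chrM"])
```

If a line is separated by spaces instead of a tab, the sketch raises a ValueError that names the offending line.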

BEDTools includes predefined genome files for human and mouse in the /genomes directory included
in the BEDTools distribution. Additionally, the “chromInfo” files/tables available from the UCSC
Genome Browser website are acceptable. For example, one can download the hg19 chromInfo file here: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz
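
The chromInfo table carries a third column (a file path) after the chromosome name and size. The manual says chromInfo files are acceptable as they are, so trimming is optional; if you prefer a plain two-column genome file, something like the following Python sketch could do it. The function names and the output file name are hypothetical, and the sample rows only illustrate the layout (the path in the third column is an assumption).

```python
import gzip


def chrominfo_to_genome(lines):
    """Keep only the first two columns (name, size) of chromInfo-style lines."""
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, size = fields[0], int(fields[1])
        rows.append(f"{name}\t{size}")
    return rows


def convert_file(src_gz, dest):
    """Read a gzipped chromInfo file and write a two-column genome file."""
    with gzip.open(src_gz, "rt") as fin, open(dest, "w") as fout:
        for row in chrominfo_to_genome(fin):
            fout.write(row + "\n")


# Illustrative rows in the chromInfo layout: name, size, file path.
sample = [
    "chr1\t249250621\t/gbdb/hg19/hg19.2bit\n",
    "chrM\t16571\t/gbdb/hg19/hg19.2bit\n",
]
genome_rows = chrominfo_to_genome(sample)
print("\n".join(genome_rows))
```

After downloading chromInfo.txt.gz from the URL above, calling convert_file("chromInfo.txt.gz", "hg19.genome") would write the trimmed file; hg19.genome is just an example name.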

Cited from http://bedtools.googlecode.com/files/BEDTools-User-Manual.v3.pdf
