Thursday, 12 June 2014

Naming conventions of Illumina sequence reads and fastq files

@HWUSI-EAS100R:6:73:941:1973#0/1

HWUSI-EAS100R the unique instrument name
6 flowcell lane
73 tile number within the flowcell lane
941 'x'-coordinate of the cluster within the tile
1973 'y'-coordinate of the cluster within the tile
#0 index number for a multiplexed sample (0 for no indexing)
/1 the member of a pair, /1 or /2 (paired-end or mate-pair reads only)

Cited from http://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers
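The field breakdown above can be turned into a small parser. This is only a sketch that assumes the older colon-separated identifier layout shown in the example; the names `IlluminaReadId` and `parse_read_id` are made up for illustration, and the index is kept as a string on the assumption that some files may carry a barcode sequence there instead of a number.

```python
import re
from typing import NamedTuple, Optional


class IlluminaReadId(NamedTuple):
    """Fields of an identifier such as @HWUSI-EAS100R:6:73:941:1973#0/1."""
    instrument: str
    lane: int
    tile: int
    x: int
    y: int
    index: Optional[str]  # kept as text; assumed to be a number or a barcode
    pair: Optional[int]   # 1 or 2, or None when the /1 or /2 suffix is absent


# Hypothetical pattern written from the field list above, not an official grammar.
READ_ID = re.compile(
    r"^@?(?P<instrument>[^:]+):(?P<lane>\d+):(?P<tile>\d+)"
    r":(?P<x>\d+):(?P<y>\d+)"
    r"(?:#(?P<index>[^/]+))?(?:/(?P<pair>[12]))?$"
)


def parse_read_id(header: str) -> IlluminaReadId:
    """Split a read identifier line into its named fields."""
    m = READ_ID.match(header.strip())
    if m is None:
        raise ValueError(f"not a recognised read identifier: {header!r}")
    pair = m.group("pair")
    return IlluminaReadId(
        instrument=m.group("instrument"),
        lane=int(m.group("lane")),
        tile=int(m.group("tile")),
        x=int(m.group("x")),
        y=int(m.group("y")),
        index=m.group("index"),
        pair=int(pair) if pair is not None else None,
    )


if __name__ == "__main__":
    print(parse_read_id("@HWUSI-EAS100R:6:73:941:1973#0/1"))
```

Identifiers written in a different layout will not match this pattern and raise a ValueError.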

The naming conventions for FASTQ files are not consistent across sources.

According to http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

Illumina FASTQ files use the following naming scheme:

<sample name>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz

For example, the following is a valid FASTQ file name:

NA10831_ATCACG_L002_R1_001.fastq.gz
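A file name in this scheme can be split back into its parts with a regular expression. The sketch below is an illustration, not Illumina's own parser: `parse_fastq_name` is a hypothetical helper, and it assumes the barcode field consists only of the letters A, C, G, T and N, so any other barcode value would need a looser pattern.

```python
import re

# Assumed pattern for:
#   <sample name>_<barcode sequence>_L<lane>_R<read number>_<set number>.fastq.gz
# The sample name may itself contain underscores, so it is matched greedily
# and the fixed-format fields are anchored at the end of the name.
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_(?P<barcode>[ACGTN]+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d+)_(?P<set>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the name fields as a dict, or raise ValueError if they do not fit."""
    m = FASTQ_NAME.match(filename)
    if m is None:
        raise ValueError(f"unexpected FASTQ file name: {filename!r}")
    return {
        "sample": m.group("sample"),
        "barcode": m.group("barcode"),
        "lane": int(m.group("lane")),  # "002" becomes 2
        "read": int(m.group("read")),
        "set": int(m.group("set")),    # "001" becomes 1
    }


if __name__ == "__main__":
    print(parse_fastq_name("NA10831_ATCACG_L002_R1_001.fastq.gz"))
```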

However, according to http://physiology.med.cornell.edu/faculty/mason/lab/r-make/usage.html

Illumina FASTQ files may also be named with an additional flowcell ID field:

<sample name>_<flowcell id>_<barcode sequence>_L<lane (0-padded to 3 digits)>_R<read number>_<set number (0-padded to 3 digits)>.fastq.gz
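The variant with a flowcell ID can be matched in a similar way. The following is a rough sketch under stated assumptions: the helper name `parse_fastq_name_with_flowcell` and the flowcell ID `FC0001` in the usage line are invented for illustration, the flowcell ID is assumed to contain only letters, digits and hyphens, and the barcode is assumed to contain only A, C, G, T and N.

```python
import re

# Assumed pattern for:
#   <sample name>_<flowcell id>_<barcode sequence>_L<lane>_R<read number>_<set number>.fastq.gz
FASTQ_NAME_WITH_FLOWCELL = re.compile(
    r"^(?P<sample>.+)_(?P<flowcell>[A-Za-z0-9-]+)_(?P<barcode>[ACGTN]+)"
    r"_L(?P<lane>\d{3})_R(?P<read>\d+)_(?P<set>\d{3})\.fastq\.gz$"
)


def parse_fastq_name_with_flowcell(filename):
    """Return the name fields as a dict, or None if the name does not fit."""
    m = FASTQ_NAME_WITH_FLOWCELL.match(filename)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the zero-padded numeric fields to integers.
    for key in ("lane", "read", "set"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    # "FC0001" is a made-up flowcell ID used only for this example.
    print(parse_fastq_name_with_flowcell("NA10831_FC0001_ATCACG_L002_R1_001.fastq.gz"))
```

The file name alone cannot always tell the two schemes apart: a name without a flowcell ID whose sample name contains an underscore, such as my_sample_ATCACG_L002_R1_001.fastq.gz, also fits this pattern, with "sample" read as the flowcell ID. Knowing which convention the data provider uses is therefore still necessary.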
