Thursday 12 June 2014

BWA and GATK: read groups

"My understanding is that a read group means, roughly, 'a set of reads that were all the product of a single sequencing run on one lane'. If you have multiplexed samples in a single lane, you will get multiple samples in a single read group. If you sequenced the same sample in several lanes, you will have multiple read groups for the same sample."

Cited from https://www.biostars.org/p/43897/ 

The meaning of the standard read group fields is explained at http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-bam-files
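The Python sketch below reads these fields back out of a header line. It splits an @RG line into a tag-to-value dict and attaches a one-line description of each standard tag used in this post; the descriptions are paraphrased from the SAM specification. The function parse_rg_line and the RG_TAGS table are hypothetical helpers, not part of BWA, GATK, or Picard. The sketch assumes fields are tab-separated, as in a real SAM header. It also accepts the escaped "\t" form and falls back to plain whitespace, so the space-separated examples further down also parse.

```python
# Hypothetical helper table: standard @RG tags used in this post,
# descriptions paraphrased from the SAM specification.
RG_TAGS = {
    "ID": "read group identifier (unique within the file)",
    "SM": "sample name",
    "PL": "sequencing platform, e.g. illumina",
    "LB": "DNA library identifier",
    "PU": "platform unit, e.g. flowcell barcode plus lane",
    "PI": "predicted median insert size",
}


def parse_rg_line(line: str) -> dict:
    """Hypothetical helper: split an @RG header line into a {tag: value} dict."""
    # Turn the escaped two-character form "\t" (as typed for bwa) into real tabs.
    line = line.strip().replace("\\t", "\t")
    # Real SAM headers separate fields with tabs; fall back to any whitespace
    # so space-separated examples parse too.
    fields = line.split("\t") if "\t" in line else line.split()
    if not fields or fields[0] != "@RG":
        raise ValueError("not an @RG header line: " + line)
    record = {}
    for field in fields[1:]:
        # partition keeps any further colons inside the value (e.g. PU values).
        tag, sep, value = field.partition(":")
        if not sep or len(tag) != 2:
            raise ValueError("malformed field: " + field)
        record[tag] = value
    return record


rg = parse_rg_line("@RG\\tID:group1\\tSM:sample1\\tPL:illumina\\tLB:lib1\\tPU:unit1")
for tag, value in rg.items():
    print(tag, value, "-", RG_TAGS.get(tag, "non-standard tag"))
```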

Compose the read group identifier in the following format:
@RG\tID:group1\tSM:sample1\tPL:illumina\tLB:lib1\tPU:unit1
where the \t stands for the tab character.

Cited from https://www.broadinstitute.org/gatk/guide/tagged?tag=bwa
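Here is a sketch of how this string is typically handed to BWA, assuming the bwa mem aligner. Its -R option takes the read group line with the two-character escape \t between fields and expands it to real tabs itself. The Python below builds the header line with real tabs, escapes them for the command line, and only prints the resulting command without running it. The helper names rg_line and bwa_mem_command and the file names ref.fa, reads_1.fq and reads_2.fq are hypothetical placeholders.

```python
import shlex


def rg_line(rg_id, sample, library, platform="illumina", unit=None):
    """Hypothetical helper: build an @RG header line with real tab characters."""
    # Field order follows the format shown above: ID, SM, PL, LB, PU.
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"PL:{platform}", f"LB:{library}"]
    if unit is not None:
        fields.append(f"PU:{unit}")
    return "\t".join(fields)


def bwa_mem_command(reference, reads1, reads2, rg):
    """Hypothetical helper: return an argument list for `bwa mem -R` (not executed).

    bwa expects the literal two-character sequence backslash-t in the -R
    string, so the real tabs are escaped here.
    """
    return ["bwa", "mem", "-R", rg.replace("\t", "\\t"), reference, reads1, reads2]


header = rg_line("group1", "sample1", "lib1", unit="unit1")
cmd = bwa_mem_command("ref.fa", "reads_1.fq", "reads_2.fq", header)
# shlex.join quotes the -R argument so it can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the argument list to subprocess.run instead of a shell string avoids quoting problems with the backslashes.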

Example of Read Group usage
Suppose we have a trio of samples: MOM, DAD, and KID. Each has two DNA libraries prepared, one with 400 bp inserts and another with 200 bp inserts. Each of these libraries is run on two lanes of an Illumina HiSeq, requiring 3 x 2 x 2 = 12 lanes of data. When the data come off the sequencer, we would create 12 BAM files, with the following @RG fields in the header:
Dad's data:
@RG ID:FLOWCELL1.LANE1 PL:illumina LB:LIB-DAD-1 SM:DAD PI:200
@RG ID:FLOWCELL1.LANE2 PL:illumina LB:LIB-DAD-1 SM:DAD PI:200
@RG ID:FLOWCELL1.LANE3 PL:illumina LB:LIB-DAD-2 SM:DAD PI:400
@RG ID:FLOWCELL1.LANE4 PL:illumina LB:LIB-DAD-2 SM:DAD PI:400
Mom's data:
@RG ID:FLOWCELL1.LANE5 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200
@RG ID:FLOWCELL1.LANE6 PL:illumina LB:LIB-MOM-1 SM:MOM PI:200
@RG ID:FLOWCELL1.LANE7 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400
@RG ID:FLOWCELL1.LANE8 PL:illumina LB:LIB-MOM-2 SM:MOM PI:400
Kid's data:
@RG ID:FLOWCELL2.LANE1 PL:illumina LB:LIB-KID-1 SM:KID PI:200
@RG ID:FLOWCELL2.LANE2 PL:illumina LB:LIB-KID-1 SM:KID PI:200
@RG ID:FLOWCELL2.LANE3 PL:illumina LB:LIB-KID-2 SM:KID PI:400
@RG ID:FLOWCELL2.LANE4 PL:illumina LB:LIB-KID-2 SM:KID PI:400

Cited from http://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=c45d6c51a4fcfc6c&tool_config=database%2Fcommunity_files%2F000%2Frepo_259%2Fpicard_AddOrReplaceReadGroups.xml&changeset_revision=bf1c3f9f8282
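The same 12 read groups can be generated from the experimental design, which makes the structure explicit: one read group per lane, two read groups per library, two libraries per sample. This Python sketch assumes 8 lanes per flowcell, which is what the lane numbering above implies (FLOWCELL1 lanes 1 to 8, then FLOWCELL2 lanes 1 to 4). The naming scheme is copied from the example; the function name trio_read_groups is hypothetical.

```python
from collections import defaultdict

SAMPLES = ["DAD", "MOM", "KID"]
INSERT_SIZES = [200, 400]  # one library per insert size
LANES_PER_LIBRARY = 2
LANES_PER_FLOWCELL = 8  # assumption, inferred from the example's lane numbering


def trio_read_groups():
    """Hypothetical helper: yield one @RG dict per lane, numbered as in the example."""
    lane_index = 0
    for sample in SAMPLES:
        for lib_number, insert_size in enumerate(INSERT_SIZES, start=1):
            for _ in range(LANES_PER_LIBRARY):
                flowcell = lane_index // LANES_PER_FLOWCELL + 1
                lane = lane_index % LANES_PER_FLOWCELL + 1
                lane_index += 1
                yield {
                    "ID": f"FLOWCELL{flowcell}.LANE{lane}",
                    "PL": "illumina",
                    "LB": f"LIB-{sample}-{lib_number}",
                    "SM": sample,
                    "PI": str(insert_size),
                }


groups = list(trio_read_groups())
for rg in groups:
    # Real SAM headers use tabs between fields.
    print("@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in rg.items()))

# Collect the distinct libraries seen for each sample.
libraries_per_sample = defaultdict(set)
for rg in groups:
    libraries_per_sample[rg["SM"]].add(rg["LB"])
n_libraries = sum(len(libs) for libs in libraries_per_sample.values())
print(len(groups), "read groups,", n_libraries, "libraries,", len(libraries_per_sample), "samples")
```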
