Bottlenose dolphin population genomic analysis
Introduction
Used for population genetics studies.
Stages
Quality trimming
Using trimmomatic
MITOREFLOC=/storage/home/users/ml228/Genomics/NEA_BGI_data_ALL/NEA_BGI_clean_raw_data/Refs/mitogenome/gi_557468684_gb_KF570351.1_.fasta
- FQN1, the forward paired output from Trimmomatic
- FQN2, the reverse paired output from Trimmomatic
- UBAM, the unmapped bam reflecting exclusion of mitochondiral mappings
MTBAM=${SSD1}/${MDN}/${SSD1}_${SRN1}_${SLN1}_to_mt_sorted.bam # the alignment to mito, sorted and now in bam format. MTSAM=${SSD1}/${MDN}/mito_aln-pe${SSD1}_${SRN1}_${SLN1}.sam # the raw sam alignment to mito, unsorted TFQ1=${SSD1}/${SDN}/${SSD1}_${SRN1}_${SLN1}_forward_paired.fq.gz # trimmed fq data file, forward paired reads TFQ2=${SSD2}/${SDN}/${SSD2}_${SRN2}_${SLN2}_reverse_paired.fq.gz # trimmed fq data file, reverse paired reads ACPTSZ=1000000 # acceptable size of a bam, a somewhat (only) risky way of seeing that it's OK.
REABAM=${SSD1}/${MOD}/aln-pe${SSD1}_${SRN1}_${SLN1}_sorted.bam REASAM=${SSD1}/${MOD}/aln-pe${SSD1}_${SRN1}_${SLN1}.sam
First we align to the mitochondrial reference:
bwa mem -t $NSLOTS $MITOREFLOC $TFQ1 $TFQ2 > $MTSAM
The resulting sam is sorted and converted to bam format:
samtools view -bSh $MTSAM | samtools sort - ${MTBAM%.*} -@ $NSLOTS
Actually we're only interested in the reads that did not map to the mitochondrial, and we can extract them into the their own bam with:
samtools view -f4 -bh $MTBAM > $UBAM -@ $NSLOTS
Explanation:
- -f4 specifies that only reads with a SAM FLAG setting of 1 on the third bit will be output, This is the setting that indicates the map did not not align. Essentially, this is the key part of this whole stage, it enables us obtain the reads which did not map to the mitichondrial reference, and also avoid those that did.
We can convert this extracted alignment back into FASTQ files with picard-tools and then compress as follows:
java -Xmx16g -jar $PICARDJARFILE SamToFastq I=${UBAM} F=${FQN1%.*} F2=${FQN2%.*} pigz -f -p $NSLOTS ${FQN1%.*} pigz -f -p $NSLOTS ${FQN2%.*}
These new reads can be aligned to the true reference
bwa mem -t $NSLOTS $REFLOC $FQN1 $FQN2 > $REASAM
The raw sam can be sorted and converted to bam to produce the alignment with the reads mapping to teh mitochondiral reference excluded as follows:
samtools view -bSh ${REASAM} | samtools sort - ${REABAM%.*} -@ $NSLOTS