Bottlenose dolphin population genomic analysis
Introduction
Used for population genetics studies.
Stages
Quality trimming
Using trimmomatic
MITOREFLOC=/storage/home/users/ml228/Genomics/NEA_BGI_data_ALL/NEA_BGI_clean_raw_data/Refs/mitogenome/gi_557468684_gb_KF570351.1_.fasta
- FQN1, the forward paired output from Trimmomatic
- FQN2, the reverse paired output from Trimmomatic
- UBAM, the unmapped bam reflecting exclusion of mitochondiral mappings
MTBAM=${SSD1}/${MDN}/${SSD1}_${SRN1}_${SLN1}_to_mt_sorted.bam # the alignment to mito, sorted and now in bam format. MTSAM=${SSD1}/${MDN}/mito_aln-pe${SSD1}_${SRN1}_${SLN1}.sam # the raw sam alignment to mito, unsorted TFQ1=${SSD1}/${SDN}/${SSD1}_${SRN1}_${SLN1}_forward_paired.fq.gz # trimmed fq data file, forward paired reads TFQ2=${SSD2}/${SDN}/${SSD2}_${SRN2}_${SLN2}_reverse_paired.fq.gz # trimmed fq data file, reverse paired reads ACPTSZ=1000000 # acceptable size of a bam, a somewhat (only) risky way of seeing that it's OK.
REABAM=${SSD1}/${MOD}/aln-pe${SSD1}_${SRN1}_${SLN1}_sorted.bam REASAM=${SSD1}/${MOD}/aln-pe${SSD1}_${SRN1}_${SLN1}.sam
First we align to the mitochondrial reference:
bwa mem -t $NSLOTS $MITOREFLOC $TFQ1 $TFQ2 > $MTSAM
The resulting sam is sorted and converted to bam format:
samtools view -bSh $MTSAM | samtools sort - ${MTBAM%.*} -@ $NSLOTS
Actually we're only interested in the reads that did not map to the mitochondrial, and we can extract them into the their own bam with:
samtools view -f4 -bh $MTBAM > $UBAM -@ $NSLOTS
We can convert this extracted alignment back into FASTQ files with picard-tools and then compress as follows:
java -Xmx16g -jar $PICARDJARFILE SamToFastq I=${UBAM} F=${FQN1%.*} F2=${FQN2%.*} pigz -f -p $NSLOTS ${FQN1%.*} pigz -f -p $NSLOTS ${FQN2%.*}
These new reads can be aligned to the true reference
bwa mem -t $NSLOTS $REFLOC $FQN1 $FQN2 > $REASAM
The raw sam can be sorted and converted to bam to produce the alignment with the reads mapping to teh mitochondiral reference excluded as follows:
samtools view -bSh ${REASAM} | samtools sort - ${REABAM%.*} -@ $NSLOTS