Bwa
Introduction
BWA (Burrows-Wheeler Aligner) is Heng Li's aligner for mapping sequencing reads against a reference genome.
Usage
As with samtools, bwa has gone through some restructuring. It has an old-style two-step usage (aln followed by samse or sampe), characterised by the following typical sequence of commands:
bwa index reference.fa
bwa aln -I -t 8 reference.fa s_1.txt > out.sai
bwa samse reference.fa out.sai s_1.txt > out.sam
samtools view -bSu out.sam | samtools sort - out.sorted
There is also a more modern usage that consists of a single step: bwa mem.
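For comparison with the two-step sequence, the one-step usage could be sketched as below. This is only an illustration: align_sorted is a hypothetical helper name, the thread count is carried over from the aln example, and the samtools sort -o form assumes a reasonably recent samtools (1.3 or later) rather than the old-style call shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: align single-end reads with bwa mem and write a
# coordinate-sorted BAM in one pipeline, with no intermediate .sai or .sam file.
# Assumes the reference has already been indexed with "bwa index".
align_sorted() {
    ref=$1      # index prefix (by default the reference filename)
    reads=$2    # FASTQ file with the reads
    out=$3      # name of the sorted BAM file to write

    # bwa mem writes SAM to standard output; samtools sort reads it from "-".
    bwa mem -t 8 "$ref" "$reads" | samtools sort -o "$out" -
}
```

It would be called as: align_sorted reference.fa s_1.txt out.sorted.bam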
Indexing
Before alignment, the reference must be indexed. By default bwa uses the whole reference filename as the prefix and writes index files with extensions appended to it. A different prefix can be set with the -p option:
bwa index -p index_prefix input_reference.fasta
This index_prefix is then used for the actual alignment step.
Indexing can take a long time with large reference files. Here is some example output from a run that took about 90 minutes:
[bwa_index] Pack FASTA... 150.69 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=4745806978, availableWord=345932252
[BWTIncConstructFromPacked] 10 iterations done. 99999986 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999986 characters processed.
.
.
.
[BWTIncConstructFromPacked] 530 iterations done. 4719199682 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 4741303618 characters processed.
[bwt_gen] Finished constructing BWT in 543 iterations.
[bwa_index] 2470.36 seconds elapse.
[bwa_index] Update BWT... 129.37 sec
[bwa_index] Pack forward-only FASTA... 302.45 sec
[bwa_index] Construct SA from BWT and Occ... 1220.18 sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa index unplaced.scaf.fa
[main] Real time: 5398.987 sec; CPU: 4273.064 sec
The output files will be, in this case:
- unplaced.scaf.fa.amb, a text file
- unplaced.scaf.fa.ann, a text file
- unplaced.scaf.fa.bwt, a binary file
- unplaced.scaf.fa.pac, a binary file
- unplaced.scaf.fa.sa, a binary file
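Since indexing is slow, it can be worth checking that all five files listed above are present before deciding whether to index again. A minimal sketch, assuming the extensions produced by the bwa version shown in the log; index_complete is a hypothetical helper name, not part of bwa.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if all five index files exist for the given
# prefix, otherwise report the first missing one on stderr and return 1.
index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing index file: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}
```

For example, to index only when needed: index_complete unplaced.scaf.fa || bwa index unplaced.scaf.fa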
Actual alignment step
bwa mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq
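Here the reads are paired, so two read files are given. For single-end reads only one file name is required. bwa mem writes its alignments in SAM format to standard output, so the output is normally redirected to a file or piped into samtools.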
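The single-end and paired-end cases differ only in the number of read files, so one wrapper can cover both. The following is a sketch under that assumption; run_bwa_mem is a hypothetical name introduced for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around bwa mem that accepts either one read file
# (single-end) or two (paired-end) and rejects anything else.
run_bwa_mem() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "usage: run_bwa_mem index_prefix reads_1.fastq [reads_2.fastq]" >&2
        return 2
    fi
    # Arguments are passed through unchanged: prefix first, then read file(s).
    bwa mem "$@"
}
```

Paired-end use would look like: run_bwa_mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq > out.sam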