Difference between revisions of "Bwa"
Revision as of 16:21, 30 November 2016
Introduction
bwa (Burrows-Wheeler Aligner) is Heng Li's aligner for mapping sequencing reads against a reference.
Usage
As with samtools, bwa went through some restructuring. It has an old-style two-step usage (aln followed by samse or sampe), characterised by the following typical sequence of commands:
bwa index reference.fa
bwa aln -I -t 8 reference.fa s_1.txt > out.sai
bwa samse reference.fa out.sai s_1.txt > out.sam
samtools view -bSu out.sam | samtools sort - out.sorted
The more modern usage consists of a single step: bwa mem.
Indexing
Before alignment, the reference must be indexed. By default bwa uses the whole filename of the reference as the index prefix and writes the index files with extensions appended to this name. A different prefix can be set with the -p option:
bwa index -p index_prefix input_reference.fasta
This index_prefix (or the reference filename, if -p was not given) is then used for the actual alignment step.
Indexing may take a long time with large reference files; the run shown here took about 90 minutes of real time. Here is some example output:
[bwa_index] Pack FASTA... 150.69 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=4745806978, availableWord=345932252
[BWTIncConstructFromPacked] 10 iterations done. 99999986 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999986 characters processed.
.
.
.
[BWTIncConstructFromPacked] 530 iterations done. 4719199682 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 4741303618 characters processed.
[bwt_gen] Finished constructing BWT in 543 iterations.
[bwa_index] 2470.36 seconds elapse.
[bwa_index] Update BWT... 129.37 sec
[bwa_index] Pack forward-only FASTA... 302.45 sec
[bwa_index] Construct SA from BWT and Occ... 1220.18 sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa index unplaced.scaf.fa
[main] Real time: 5398.987 sec; CPU: 4273.064 sec
The output files will be, in this case:
- unplaced.scaf.fa.amb, a text file
- unplaced.scaf.fa.ann, a text file
- unplaced.scaf.fa.bwt, a binary file
- unplaced.scaf.fa.pac, a binary file
- unplaced.scaf.fa.sa, a binary file
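Because indexing is slow, it can be worth checking that all five files are present before starting an alignment. The following is a minimal sketch, not part of bwa itself: the function name index_complete is hypothetical, and it only tests that the files exist, not that they are valid.

```shell
# Hypothetical helper: succeed only if all five bwa index files
# (.amb, .ann, .bwt, .pac, .sa) exist for the given index prefix.
index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        [ -f "$prefix.$ext" ] || return 1
    done
    return 0
}

# Example call, using the prefix from the output above.
if index_complete unplaced.scaf.fa; then
    echo "index complete"
else
    echo "index incomplete, run bwa index first"
fi
```

The check is by existence only, so an index that was interrupted while a file was being written could still pass.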
Actual alignment step
bwa mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq
In this case the reads are paired, so two read files are given. With single-end reads only one file name is required.
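bwa mem writes its alignments in SAM format to standard output, so in practice the output is usually redirected to a file. As a sketch of the two cases (paired and single reads), the hypothetical helper below assembles the command line and prints it rather than running it; the names bwa_mem_cmd, paired.sam, single.sam and input_reads.fastq are assumptions made for illustration.

```shell
# Hypothetical helper: print a bwa mem command line without running it.
# Usage: bwa_mem_cmd out.sam index_prefix reads_1.fastq [reads_2.fastq]
bwa_mem_cmd() {
    if [ "$#" -lt 3 ] || [ "$#" -gt 4 ]; then
        echo "usage: bwa_mem_cmd out.sam index_prefix reads_1.fastq [reads_2.fastq]" >&2
        return 1
    fi
    out=$1
    shift
    # After the shift, "$*" is the index prefix followed by one or two read files.
    echo "bwa mem $* > $out"
}

# Paired reads: two read files follow the index prefix.
bwa_mem_cmd paired.sam index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq
# Single reads: only one read file.
bwa_mem_cmd single.sam index_prefix input_reads.fastq
```

Printing the command first makes it easy to inspect before running it by hand.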