Difference between revisions of "Bwa"


Revision as of 17:21, 30 November 2016

Introduction

bwa (Burrows-Wheeler Aligner) is Heng Li's aligner for mapping sequencing reads against a reference.

Usage

As with samtools, bwa also went through some restructuring. It has an old-style two-step usage (aln, then samse or sampe), characterised by the following typical sequence of commands:

bwa index reference.fa
bwa aln -I -t 8 reference.fa s_1.txt > out.sai
bwa samse reference.fa out.sai s_1.txt > out.sam
samtools view -bSu out.sam | samtools sort -  out.sorted
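The sequence above is for single-end reads (samse). For paired-end reads the old-style usage runs aln once per read file and then combines the results with sampe. A minimal sketch, assuming two read files; the function name old_style_paired and the BWA variable (which only lets the bwa executable be swapped out) are hypothetical names for this example, and the -I option is left out because it only applies to reads in the Illumina 1.3+ quality encoding:

```shell
#!/bin/sh
# Sketch: old-style paired-end alignment (aln on each read file, then sampe).
# old_style_paired and BWA are hypothetical names, not part of bwa itself.
old_style_paired() {
    ref=$1
    reads1=$2
    reads2=$3
    out=$4
    bwa_cmd=${BWA:-bwa}

    # One aln run per read file, each producing its own .sai file.
    "$bwa_cmd" aln -t 8 "$ref" "$reads1" > "$out.1.sai" || return 1
    "$bwa_cmd" aln -t 8 "$ref" "$reads2" > "$out.2.sai" || return 1

    # sampe combines both .sai files with the original reads into one SAM file.
    "$bwa_cmd" sampe "$ref" "$out.1.sai" "$out.2.sai" "$reads1" "$reads2" > "$out.sam"
}

# Example call (writes out.1.sai, out.2.sai and out.sam):
#   old_style_paired reference.fa s_1.txt s_2.txt out
```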

And then a more modern usage which consists of just one step: bwa mem.

Indexing

Before alignment, the reference must be indexed. By default, bwa uses the whole reference filename as the index prefix and generates index files with extensions appended to that name. A different prefix can be chosen with the -p option:

bwa index -p index_prefix input_reference.fasta

This index_prefix (or the reference filename itself, if -p was not given) is then used for the actual alignment step.

Indexing may take a long time with large reference files. Here is some example output from a run that needed about 90 minutes of real time:

[bwa_index] Pack FASTA... 150.69 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=4745806978, availableWord=345932252
[BWTIncConstructFromPacked] 10 iterations done. 99999986 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999986 characters processed.
.
.
.
[BWTIncConstructFromPacked] 530 iterations done. 4719199682 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 4741303618 characters processed.
[bwt_gen] Finished constructing BWT in 543 iterations.
[bwa_index] 2470.36 seconds elapse.
[bwa_index] Update BWT... 129.37 sec
[bwa_index] Pack forward-only FASTA... 302.45 sec
[bwa_index] Construct SA from BWT and Occ... 1220.18 sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa index unplaced.scaf.fa
[main] Real time: 5398.987 sec; CPU: 4273.064 sec

The output files will be, in this case:

  • unplaced.scaf.fa.amb, a text file
  • unplaced.scaf.fa.ann, a text file
  • unplaced.scaf.fa.bwt, a binary file
  • unplaced.scaf.fa.pac, a binary file
  • unplaced.scaf.fa.sa, a binary file
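Since the alignment step needs all five of these files, it can be useful to check for them before starting a long run. A minimal sketch, assuming the five extensions listed above; the function name bwa_index_complete is hypothetical and not part of bwa. It only checks that each file is present and non-empty, so it cannot detect a truncated file:

```shell
#!/bin/sh
# Sketch: check that all five bwa index files exist for a given prefix.
# bwa_index_complete is a hypothetical helper name, not part of bwa.
bwa_index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            return 1
        fi
    done
    return 0
}

# Example call:
#   bwa_index_complete unplaced.scaf.fa && echo "index ready"
```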

Actual alignment step

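bwa mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq > out.sam

In this case we have paired-end read files. With single-end reads, only one read file name is required. bwa mem writes its alignments in SAM format to standard output, so the output is redirected to a file.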
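The single-end and paired-end cases differ only in the number of read files passed to bwa mem, so one small wrapper can cover both. A minimal sketch; the function name run_bwa_mem, its argument order and the BWA variable (which only lets the bwa executable be swapped out, for example for a full path) are assumptions made for this example:

```shell
#!/bin/sh
# Sketch: run bwa mem for single-end (one read file) or paired-end (two read files).
# run_bwa_mem and BWA are hypothetical names, not part of bwa itself.
run_bwa_mem() {
    if [ "$#" -lt 3 ] || [ "$#" -gt 4 ]; then
        echo "usage: run_bwa_mem index_prefix out.sam reads_1.fastq [reads_2.fastq]" >&2
        return 2
    fi
    prefix=$1
    out=$2
    shift 2
    # "$@" now holds one read file (single-end) or two (paired-end).
    # bwa mem writes SAM to standard output, hence the redirection.
    "${BWA:-bwa}" mem "$prefix" "$@" > "$out"
}

# Example calls:
#   run_bwa_mem index_prefix out.sam input_reads_pair_1.fastq input_reads_pair_2.fastq
#   run_bwa_mem index_prefix out.sam input_reads.fastq
```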