Bwa

Revision as of 16:21, 30 November 2016 by Rf (talk | contribs)

Introduction

bwa (the Burrows-Wheeler Aligner) is Heng Li's aligner for mapping sequencing reads against a reference genome.

Usage

As with samtools, bwa went through some restructuring. Its old-style usage has two steps (aln, followed by samse for single-end or sampe for paired-end reads) and is characterised by the following typical sequence of commands:

bwa index reference.fa
bwa aln -I -t 8 reference.fa s_1.txt > out.sai
bwa samse reference.fa out.sai s_1.txt > out.sam
samtools view -bSu out.sam | samtools sort - out.sorted

The more modern usage consists of a single alignment step: bwa mem.
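The one-step usage can be sketched as a single pipeline that replaces the aln, samse and samtools view steps above. This is a minimal sketch, not a recommended setting: the function name bwa_mem_sorted and its arguments are made up for illustration, the thread count is simply carried over from the old-style example, and the "sort -o" form assumes samtools 1.3 or later.

```shell
# Sketch: align single-end reads with bwa mem and write a sorted BAM in one pipeline.
# Assumes samtools 1.3 or later ("sort -o"); the function name and arguments are illustrative.
bwa_mem_sorted() {
    ref=$1      # reference FASTA that has already been indexed with "bwa index"
    reads=$2    # file of single-end reads
    out=$3      # name of the sorted BAM file to write
    # bwa mem writes SAM to standard output; samtools sort reads it from "-".
    bwa mem -t 8 "$ref" "$reads" | samtools sort -o "$out" -
}
```

With the file names of the old-style example, a call would look like: bwa_mem_sorted reference.fa s_1.txt out.sorted.bam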

Indexing

Before alignment, the reference must be indexed. By default, bwa index uses the whole filename of the reference as the index prefix and generates output index files with extensions added onto this name. A different prefix can be set with the -p option:

bwa index -p index_prefix input_reference.fasta

This index_prefix is then used for the actual alignment step.

Indexing may take a long time with large reference files. Here is some example output, from a run that took about 90 minutes of real time:

[bwa_index] Pack FASTA... 150.69 sec
[bwa_index] Construct BWT for the packed sequence...
[BWTIncCreate] textLength=4745806978, availableWord=345932252
[BWTIncConstructFromPacked] 10 iterations done. 99999986 characters processed.
[BWTIncConstructFromPacked] 20 iterations done. 199999986 characters processed.
.
.
.
[BWTIncConstructFromPacked] 530 iterations done. 4719199682 characters processed.
[BWTIncConstructFromPacked] 540 iterations done. 4741303618 characters processed.
[bwt_gen] Finished constructing BWT in 543 iterations.
[bwa_index] 2470.36 seconds elapse.
[bwa_index] Update BWT... 129.37 sec
[bwa_index] Pack forward-only FASTA... 302.45 sec
[bwa_index] Construct SA from BWT and Occ... 1220.18 sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa index unplaced.scaf.fa
[main] Real time: 5398.987 sec; CPU: 4273.064 sec

The output files will be, in this case:

  • unplaced.scaf.fa.amb, a text file
  • unplaced.scaf.fa.ann, a text file
  • unplaced.scaf.fa.bwt, a binary file
  • unplaced.scaf.fa.pac, a binary file
  • unplaced.scaf.fa.sa, a binary file
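Because indexing is slow, it can be worth checking that all five files are present before starting an alignment. The following is a minimal sketch of such a check; the function name bwa_index_complete is hypothetical and not part of bwa, and the extension list is taken from the files listed above.

```shell
# Hypothetical helper: succeed only if every index file for the given prefix
# exists and is non-empty. The extensions are those produced in the run above.
bwa_index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    return 0
}
```

It could be used to index only when needed, for example: bwa_index_complete unplaced.scaf.fa || bwa index unplaced.scaf.fa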

Actual alignment step

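bwa mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq > out.sam

In this case we have paired read files, one per mate. With single-end reads, only one read file would be given. bwa mem writes the alignments in SAM format to standard output, hence the redirection into a file.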
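The difference between single and paired reads can be captured in a small wrapper that accepts either one or two read files. This is only a sketch under stated assumptions: the function name run_bwa_mem and the BWA variable are hypothetical conveniences, not part of bwa.

```shell
# Hypothetical wrapper: run "bwa mem" for single-end (one file) or paired-end
# (two files) reads. Setting BWA="echo bwa" previews the command instead of
# running it (assumption: BWA is a variable invented for this sketch).
run_bwa_mem() {
    if [ "$#" -lt 2 ] || [ "$#" -gt 3 ]; then
        echo "usage: run_bwa_mem index_prefix reads_1.fastq [reads_2.fastq]" >&2
        return 2
    fi
    # $1 is the index prefix; $2 and the optional $3 are the read files.
    ${BWA:-bwa} mem "$@"
}
```

As with a direct call, the SAM output goes to standard output, so a paired-end run would be: run_bwa_mem index_prefix input_reads_pair_1.fastq input_reads_pair_2.fastq > out.sam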