Difference between revisions of "Samtools"


Revision as of 14:49, 9 November 2016

Introduction

Samtools hardly needs an introduction: it is one of the cornerstones of bioinformatics processing and is at the heart of the business of sequence mapping / aligning.

Despite that introduction, note that samtools does not actually carry out alignments itself. Rather, it offers utilities that complement real aligners such as bwa and bowtie.

Its importance is primarily due to the various tools it provides (available as subcommands), centred on a well-defined sequence alignment format, sam, and its binary, compressed equivalent, bam.

This wiki page just offers some tips on how to use it, as there is plenty of documentation elsewhere, some of which is mentioned in the links.

Tips

  1. samtools commands follow the subcommand style, i.e. they always start with "samtools" followed by a subcommand (e.g. "sort", "view") which identifies the exact operation samtools will perform. This is because samtools is a suite of tools for handling alignment files in sam/bam format.
  2. Two input files are commonly needed: a sam/bam file and the reference file.
  3. view is a commonly used subcommand. Its name, however, refers to internal viewing, because user-facing visualisation is not samtools' strong point, though it has some capabilities in this regard ("tview").

Common commands

A bam file is the compressed binary version of a sam file. It exists purely for efficiency and contains the same information. Because it is not human-readable, converting out of bam, and also back into it, is a very frequent activity. Conversion from sam to bam is done via the view subcommand like so:

samtools view -S -b -o <your_chosen_bam_filename> <input_sam_filename>

Explanation:

  • -S says the input is SAM. In recent versions of samtools the input format is detected automatically, so this option may be left out.
  • -b says the output should be BAM.
  • -o gives the output file name.
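
The reverse conversion, from bam back to human-readable sam, uses the same subcommand. The following is a minimal sketch, assuming the hypothetical file names aln.bam and aln.sam. The -h option asks view to include the header lines, and leaving out -b gives plain-text SAM output. The command is only run when samtools and the input file are actually present.

```shell
# Hypothetical file names, chosen for illustration only.
in_bam="aln.bam"
out_sam="aln.sam"

# -h: include the header lines; without -b the output is plain-text SAM.
cmd="samtools view -h -o $out_sam $in_bam"
echo "$cmd"

# Only run the command when samtools is installed and the input exists.
if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    $cmd
fi
```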

Sorting is often as easy as the following, and this is typically the command line you'll need:

samtools sort aln.bam aln.sorted

Note that there are no options: after "samtools sort" come the input filename and then the output prefix, to which samtools appends ".bam".
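
The prefix form above belongs to older releases. From samtools 1.3 onwards the trailing output prefix is no longer accepted and the output file is named with -o instead. The following is a minimal sketch of sorting and then indexing under that newer syntax, reusing the aln.bam name from above; aln.sorted.bam is an assumed output name.

```shell
# File names are illustrative; aln.sorted.bam is an assumed output name.
in_bam="aln.bam"
sorted_bam="aln.sorted.bam"

# Newer syntax: the output file is given with -o, the input comes last.
sort_cmd="samtools sort -o $sorted_bam $in_bam"
# Indexing a coordinate-sorted bam writes aln.sorted.bam.bai next to it.
index_cmd="samtools index $sorted_bam"

echo "$sort_cmd"
echo "$index_cmd"

# Only run the commands when samtools is installed and the input exists.
if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    $sort_cmd && $index_cmd
fi
```

The index step is included because subcommands such as tview expect a sorted, indexed bam file.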

tview

While samtools does not prioritise visualisation, it is still able to perform it. The tview subcommand provides a raw view of the alignment that a bam file contains. Sometimes such raw, less pretty representations of the alignment can be useful.

It requires the bam file, which must be sorted and indexed, and the reference sequence, and it is run like so:

samtools tview alignments/sim_reads_aligned.sorted.bam genomes/NC_008253.fna

As Heng Li, the developer of samtools, is an avid vi fan, the keybindings all reflect vi's keys so that l move you

Explanation:

  • NC_008253.fna is the reference file here; the original read file is not needed.
  • You can see here the point made in tip no. 2 above: the two arguments are the bam file (first argument) and its associated reference (second argument), with no option switches.
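
Since tview needs an index alongside the bam file, a small wrapper can check for it first. This is a sketch only, using the paths from the command above and assuming the index follows the default naming of the bam file name plus ".bai". It starts tview only when samtools and both files are present and the output is a terminal, since tview is interactive.

```shell
# Paths taken from the tview command above.
bam="alignments/sim_reads_aligned.sorted.bam"
ref="genomes/NC_008253.fna"

tview_cmd="samtools tview $bam $ref"
echo "$tview_cmd"

if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ] && [ -f "$ref" ]; then
    # Build the .bai index if it is not there yet (assumed default name).
    [ -f "$bam.bai" ] || samtools index "$bam"
    # tview is interactive, so only start it when attached to a terminal.
    if [ -t 1 ]; then
        $tview_cmd
    fi
fi
```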

Links

Installation Notes

Since version 1.2, the samtools build structure has changed. Before, it was monolithic; since 1.2 it is split into a library, htslib, and samtools proper. A third package, bcftools, is often mentioned in the same breath, though