Srst2

Revision as of 14:11, 28 April 2016

Introduction

This Python-based tool, with two main dependencies (SAMtools and Bowtie2), maps next-generation sequencing (NGS) short reads from whole-genome sequencing (WGS) data to detect three kinds of target:

  1. genes
  2. alleles
  3. sequence types (STs) from multilocus sequence typing (MLST)
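SRST2's actual scoring model is not described on this page. The sketch below swaps in a much simpler, hypothetical rule to illustrate the general idea of turning per-allele mapping statistics into an allele call: keep only alleles whose read coverage and divergence pass fixed thresholds, then pick the best-supported one. The class AlleleHit, the function call_allele and both threshold values are illustrative assumptions, not SRST2's API or its defaults.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not SRST2's defaults.
MIN_COVERAGE = 0.90    # fraction of the allele length covered by reads
MAX_DIVERGENCE = 0.10  # fraction of covered bases that mismatch the allele


@dataclass
class AlleleHit:
    name: str
    length: int      # allele length in bp
    covered: int     # bases covered by at least one read
    mismatches: int  # covered bases that disagree with the allele sequence

    @property
    def coverage(self) -> float:
        return self.covered / self.length

    @property
    def divergence(self) -> float:
        # Treat an allele with no coverage as maximally divergent.
        return self.mismatches / self.covered if self.covered else 1.0


def call_allele(hits):
    """Return the name of the best-supported allele, or None if none passes."""
    passing = [
        h for h in hits
        if h.coverage >= MIN_COVERAGE and h.divergence <= MAX_DIVERGENCE
    ]
    if not passing:
        return None
    # Prefer higher coverage first, then lower divergence.
    best = max(passing, key=lambda h: (h.coverage, -h.divergence))
    return best.name
```

For example, given three candidate alleles of the same locus where one is fully covered with no mismatches, one is fully covered with a few mismatches and one is only partly covered, this rule returns the first: the partly covered allele fails the coverage threshold and the mismatching one loses on divergence.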

MLST divides a bacterial population into sequence types using a small number of loci, typically seven.

These loci are housekeeping genes (in practice, internal fragments of them), which every isolate of the species is expected to carry.

This read-mapping approach is robust, unlike the alternative of assembling each genome de novo, which becomes a major burden when dealing with hundreds or thousands of bacterial genomes (a typical workload in bacterial genomics).
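Once an allele has been called at each locus, the sequence type follows from looking the whole allele profile up in a definitions table (the kind of table passed to SRST2 via --mlst_definitions). The minimal Python sketch below shows that lookup. The locus names are those of the seven-gene E. coli scheme, but the allele numbers and ST values in PROFILES are made-up toy data, and sequence_type is a hypothetical helper, not part of SRST2.

```python
# Loci of the seven-gene E. coli MLST scheme, in profile order.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical definitions table: allele profile -> sequence type.
# These numbers are toy values for illustration, NOT real MLST data.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): 100,
    (1, 1, 2, 1, 1, 1, 1): 101,
}


def sequence_type(alleles):
    """Map a {locus: allele_number} dict to an ST, or None if no exact match."""
    try:
        profile = tuple(alleles[locus] for locus in LOCI)
    except KeyError:
        # A locus was not called, so no ST can be assigned.
        return None
    return PROFILES.get(profile)
```

Because an ST is defined by the complete profile, a single differing or missing allele means no exact match; real tools report such cases separately (for example as a novel or incomplete profile) rather than forcing an ST.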

Example command lines

srst2 --input_pe ERR024070*.fastq.gz --output shigella1 --log --save_scores --mlst_db Escherichia_coli#1.fasta --mlst_definitions ecoli.txt --gene_db ARGannot.fasta
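When processing hundreds of samples it can help to generate this command programmatically. The Python sketch below rebuilds the example command as an argument list using only the flags shown above; the function name build_srst2_command is hypothetical. Since no shell is involved, the sketch expands the ERR024070*.fastq.gz pattern itself with glob and sorts the result so the read files are passed in a stable order.

```python
import glob
import shlex


def build_srst2_command(reads, output, mlst_db, mlst_definitions, gene_db):
    """Assemble the srst2 argument list from the example command line."""
    return [
        "srst2",
        "--input_pe", *reads,
        "--output", output,
        "--log",
        "--save_scores",
        "--mlst_db", mlst_db,
        "--mlst_definitions", mlst_definitions,
        "--gene_db", gene_db,
    ]


if __name__ == "__main__":
    # Expand the read-file pattern that the shell would otherwise expand.
    reads = sorted(glob.glob("ERR024070*.fastq.gz"))
    cmd = build_srst2_command(
        reads, "shigella1", "Escherichia_coli#1.fasta", "ecoli.txt", "ARGannot.fasta"
    )
    # Print a shell-safe version; to execute it, pass cmd to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting issues, for instance with the # in Escherichia_coli#1.fasta, which shlex.join quotes when printing.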

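The options used above:

  • --input_pe: the paired-end read files (here, the files matching ERR024070*.fastq.gz).
  • --output: the prefix for the output files.
  • --log: write a log file for the run.
  • --save_scores: save the scores computed for each allele.
  • --mlst_db: the MLST allele database, usually a FASTA file.
  • --mlst_definitions: the table that defines each sequence type by its allele profile.
  • --gene_db: a FASTA database of genes to screen for (here ARGannot, a database of antibiotic resistance genes).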
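Before a run, it can be useful to check what an MLST database contains. The Python sketch below groups the FASTA record names by locus. It assumes headers of the form >locus_allele (for example >adk_1); the separator differs between schemes, so the delimiter parameter is an assumption to adjust for your database, and alleles_per_locus is a hypothetical helper, not an SRST2 function.

```python
from collections import defaultdict


def alleles_per_locus(fasta_text, delimiter="_"):
    """Group allele IDs by locus, assuming FASTA headers like '>adk_1'."""
    loci = defaultdict(list)
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are not needed here
        # Keep only the record name, dropping any description after whitespace.
        name = line[1:].split()[0]
        # Split on the LAST delimiter so locus names may contain it.
        locus, sep, allele = name.rpartition(delimiter)
        if not sep:
            continue  # header does not follow the assumed format; skip it
        loci[locus].append(allele)
    return dict(loci)
```

Splitting on the last delimiter keeps locus names intact even if they contain the separator themselves.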


Glossary

  • MLST: multilocus sequence typing