Difference between revisions of "Prokka"


Revision as of 14:09, 10 August 2016

Introduction

Prokka is a genome annotator for circular bacterial genomes.


Usage

Prokka's manual is here: http://bioinformatics.net.au/prokka-manual.html

Example jobscript for prokka

#!/bin/bash 
#$ -cwd 
#$ -j y
#$ -S /bin/bash 
#$ -V
#$ -q marvin.q
#$ -pe multi 16
DIR=$
prokka --fast --cpus $NSLOTS --outdir 
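
As a rough sketch of how the jobscript above might be filled in for a single assembly, the following keeps the same Grid Engine directives and the same `--fast --cpus $NSLOTS --outdir` call. The helper name `prokka_cmd`, the `_prokka` output-folder suffix and the input path `assembly/sample1.fasta` are all hypothetical and not taken from this page; `--prefix` comes from the help text further down. The helper only prints the command (a dry run), so it can be checked before submitting.

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -q marvin.q
#$ -pe multi 16

# Hypothetical helper: print the prokka command for one contigs file (dry run).
prokka_cmd() {
    local contigs=$1
    local sample
    # Strip the directory and the file extension to get a sample name (assumed naming scheme).
    sample=$(basename "${contigs%.*}")
    # NSLOTS is set by Grid Engine inside a job; fall back to 1 when it is unset.
    echo "prokka --fast --cpus ${NSLOTS:-1} --outdir ${sample}_prokka --prefix ${sample} ${contigs}"
}

# Example call with a hypothetical input file.
prokka_cmd assembly/sample1.fasta
```

To actually run the annotation rather than print the command, the `echo` line could be replaced by the prokka call itself. According to the help text below, prokka will not overwrite an existing output folder unless `--force` is given.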

Prokka's standard help output

Name:
  Prokka 1.12-beta by Torsten Seemann <torsten.seemann@gmail.com>
Synopsis:
  rapid bacterial genome annotation
Usage:
  prokka [options] <contigs.fasta>
General:
  --help            This help
  --version         Print version and exit
  --docs            Show full manual/documentation
  --citation        Print citation for referencing Prokka
  --quiet           No screen output (default OFF)
  --debug           Debug mode: keep all temporary files (default OFF)
Setup:
  --listdb          List all configured databases
  --setupdb         Index all installed databases
  --cleandb         Remove all database indices
  --depends         List all software dependencies
Outputs:
  --outdir [X]      Output folder [auto] (default '')
  --force           Force overwriting existing output folder (default OFF)
  --prefix [X]      Filename output prefix [auto] (default '')
  --addgenes        Add 'gene' features for each 'CDS' feature (default OFF)
  --addmrna         Add 'mRNA' features for each 'CDS' feature (default OFF)
  --locustag [X]    Locus tag prefix (default 'PROKKA')
  --increment [N]   Locus tag counter increment (default '1')
  --gffver [N]      GFF version (default '3')
  --compliant       Force Genbank/ENA/DDJB compliance: --addgenes --mincontiglen 200 --centre XXX (default OFF)
  --centre [X]      Sequencing centre ID. (default '')
Organism details:
  --genus [X]       Genus name (default 'Genus')
  --species [X]     Species name (default 'species')
  --strain [X]      Strain name (default 'strain')
  --plasmid [X]     Plasmid name or identifier (default '')
Annotations:
  --kingdom [X]     Annotation mode: Archaea|Bacteria|Mitochondria|Viruses (default 'Bacteria')
  --gcode [N]       Genetic code / Translation table (set if --kingdom is set) (default '0')
  --gram [X]        Gram: -/neg +/pos (default '')
  --usegenus        Use genus-specific BLAST databases (needs --genus) (default OFF)
  --proteins [X]    FASTA or GBK file to use as 1st priority (default '')
  --hmms [X]        Trusted HMM to first annotate from (default '')
  --metagenome      Improve gene predictions for highly fragmented genomes (default OFF)
  --rawproduct      Do not clean up /product annotation (default OFF)
  --cdsrnaolap      Allow [tr]RNA to overlap CDS (default OFF)
Computation:
  --cpus [N]        Number of CPUs to use [0=all] (default '8')
  --fast            Fast mode - only use basic BLASTP databases (default OFF)
  --noanno          For CDS just set /product="unannotated protein" (default OFF)
  --mincontiglen [N] Minimum contig size [NCBI needs 200] (default '1')
  --evalue [n.n]    Similarity e-value cut-off (default '1e-06')
  --rfam            Enable searching for ncRNAs with Infernal+Rfam (SLOW!) (default '0')
  --norrna          Don't run rRNA search (default OFF)
  --notrna          Don't run tRNA search (default OFF)
  --rnammer         Prefer RNAmmer over Barrnap for rRNA prediction (default OFF)

Output files

  • If given a fragmented scaffold file (typically from a de novo assembler), prokka refers to each scaffold or contig as a "node".


Installation issues (sysadmins only)

Prokka can be cloned from GitHub. The first step after cloning is to set up its databases, like so:

> ./prokka --setupdb
[16:54:57] Appending to PATH: /home/nutria/gitrepos/prokka/bin/../binaries/linux
[16:54:57] Appending to PATH: /home/nutria/gitrepos/prokka/bin/../binaries/linux/../common
[16:54:57] Appending to PATH: /home/nutria/gitrepos/prokka/bin
[16:54:57] Cleaning databases in /home/nutria/gitrepos/prokka/bin/../db
[16:54:57] Cleaning complete.
[16:54:57] Looking for 'makeblastdb' - found /usr/bin/makeblastdb
[16:54:57] Determined makeblastdb version is 2.2
[16:54:57] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Archaea/sprot
[16:54:57] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Archaea\/sprot -logfile /dev/null
[16:54:58] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Bacteria/sprot
[16:54:58] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Bacteria\/sprot -logfile /dev/null
[16:54:59] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Mitochondria/sprot
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Mitochondria\/sprot -logfile /dev/null
[16:54:59] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Viruses/sprot
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Viruses\/sprot -logfile /dev/null
[16:54:59] Making genus BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/genus/Enterococcus
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/genus\/Enterococcus -logfile /dev/null
[16:54:59] Making genus BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/genus/Escherichia
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/genus\/Escherichia -logfile /dev/null
[16:55:00] Making genus BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/genus/Staphylococcus
[16:55:00] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/genus\/Staphylococcus -logfile /dev/null
[16:55:00] Looking for 'hmmpress' - found /usr/bin/hmmpress
[16:55:00] Determined hmmpress version is 3.1
[16:55:00] Pressing HMM database: /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm
[16:55:00] Running: hmmpress \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/hmm\/HAMAP\.hmm
Working...    done.
Pressed and indexed 1463 HMMs (1463 names).
Models pressed into binary file:   /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3m
SSI index for binary model file:   /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3i
Profiles (MSV part) pressed into:  /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3f
Profiles (remainder) pressed into: /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3p
[16:55:01] Looking for 'cmpress' - found /usr/bin/cmpress
[16:55:01] Determined cmpress version is 1.1
[16:55:01] Pressing CM database: /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses
[16:55:01] Running: cmpress \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/cm\/Viruses
Working...    done.
Pressed and indexed 142 CMs and p7 HMM filters (142 names and 142 accessions).
Covariance models and p7 filters pressed into binary file:  /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1m
SSI index for binary covariance model file:                 /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1i
Optimized p7 filter profiles (MSV part)  pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1f
Optimized p7 filter profiles (remainder) pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1p
[16:55:01] Pressing CM database: /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria
[16:55:01] Running: cmpress \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/cm\/Bacteria
Working...    done.
Pressed and indexed 564 CMs and p7 HMM filters (564 names and 564 accessions).
Covariance models and p7 filters pressed into binary file:  /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1m
SSI index for binary covariance model file:                 /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1i
Optimized p7 filter profiles (MSV part)  pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1f
Optimized p7 filter profiles (remainder) pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1p
[16:55:01] Looking for databases in: /home/nutria/gitrepos/prokka/bin/../db
[16:55:01] * Kingdoms: Archaea Bacteria Mitochondria Viruses
[16:55:01] * Genera: Enterococcus Escherichia Staphylococcus
[16:55:01] * HMMs: HAMAP
[16:55:01] * CMs: Bacteria Viruses

Prokka seems to set its own paths

When prokka is invoked with no arguments, it prints the following:

[ramon@marvin ~]$ prokka
[13:52:03] Appending to PATH: /usr/local/Modules/modulefiles/tools/prokka/gitv1_8f07048/bin/../binaries/linux
[13:52:03] Appending to PATH: /usr/local/Modules/modulefiles/tools/prokka/gitv1_8f07048/bin/../binaries/linux/../common
[13:52:03] Appending to PATH: /usr/local/Modules/modulefiles/tools/prokka/gitv1_8f07048/bin