Revision as of 15:00, 10 August 2016

Introduction

Prokka is a rapid genome annotator for bacterial circular genomes.

Usage

Prokka's full manual can be displayed with prokka --docs.

Example jobscript for prokka

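The following script takes the path and name of the assembly file as its single argument and runs prokka on it in "fast" mode (only the basic BLASTP databases are searched). It requests 16 slots from the scheduler and passes that number to prokka through $NSLOTS. All output files are sent to a directory named after the directory that holds the fasta file, with the suffix "_prokka" added. If the argument contains no directory component, the file name itself is used as the stem.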

#!/bin/bash 
#$ -cwd 
#$ -j y
#$ -S /bin/bash 
#$ -V
#$ -q marvin.q
#$ -pe multi 16

# some quick "argument accounting"
EXPECTED_ARGS=1
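if [ $# -ne $EXPECTED_ARGS ]; then
    # report the problem on stderr and leave with a non-zero status so the scheduler sees the failure
    echo "error, this script should be fed with one argument: the path and name of the contigs or scaffolds fasta file you want to annotate" >&2
    exit 1
fi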

module load prokka

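# output directory: the fasta file's directory with "_prokka" appended
DIR="${1%/*}_prokka"
# quote the variables so that paths containing spaces survive
prokka "$1" --fast --cpus "$NSLOTS" --outdir "$DIR"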
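
The output directory name in the jobscript comes from the shell parameter expansion ${1%/*}, which strips the shortest trailing "/..." component from the argument. The following is a minimal sketch of that behaviour only; the function name prokka_outdir and the sample paths are hypothetical and used purely for illustration.

```shell
#!/bin/bash
# Hypothetical helper mirroring the DIR=${1%/*}_prokka line of the jobscript.
prokka_outdir() {
    # ${1%/*} removes the shortest suffix matching "/*", i.e. the file name.
    # If the argument contains no slash, nothing is removed.
    printf '%s\n' "${1%/*}_prokka"
}

prokka_outdir "assemblies/sample1/contigs.fasta"   # prints assemblies/sample1_prokka
prokka_outdir "contigs.fasta"                      # prints contigs.fasta_prokka
```

Assuming the jobscript is saved under a name such as prokka_job.sh (a made-up name), it would typically be submitted to the scheduler with something like: qsub prokka_job.sh assemblies/sample1/contigs.fasta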

prokka's standard help file

Name:
  Prokka 1.12-beta by Torsten Seemann <torsten.seemann@gmail.com>
Synopsis:
  rapid bacterial genome annotation
Usage:
  prokka [options] <contigs.fasta>
General:
  --help            This help
  --version         Print version and exit
  --docs            Show full manual/documentation
  --citation        Print citation for referencing Prokka
  --quiet           No screen output (default OFF)
  --debug           Debug mode: keep all temporary files (default OFF)
Setup:
  --listdb          List all configured databases
  --setupdb         Index all installed databases
  --cleandb         Remove all database indices
  --depends         List all software dependencies
Outputs:
  --outdir [X]      Output folder [auto] (default )
  --force           Force overwriting existing output folder (default OFF)
  --prefix [X]      Filename output prefix [auto] (default )
  --addgenes        Add 'gene' features for each 'CDS' feature (default OFF)
  --addmrna         Add 'mRNA' features for each 'CDS' feature (default OFF)
  --locustag [X]    Locus tag prefix (default 'PROKKA')
  --increment [N]   Locus tag counter increment (default '1')
  --gffver [N]      GFF version (default '3')
  --compliant       Force Genbank/ENA/DDJB compliance: --addgenes --mincontiglen 200 --centre XXX (default OFF)
  --centre [X]      Sequencing centre ID. (default )
Organism details:
  --genus [X]       Genus name (default 'Genus')
  --species [X]     Species name (default 'species')
  --strain [X]      Strain name (default 'strain')
  --plasmid [X]     Plasmid name or identifier (default )
Annotations:
  --kingdom [X]     Annotation mode: Archaea|Bacteria|Mitochondria|Viruses (default 'Bacteria')
  --gcode [N]       Genetic code / Translation table (set if --kingdom is set) (default '0')
  --gram [X]        Gram: -/neg +/pos (default )
  --usegenus        Use genus-specific BLAST databases (needs --genus) (default OFF)
  --proteins [X]    FASTA or GBK file to use as 1st priority (default )
  --hmms [X]        Trusted HMM to first annotate from (default )
  --metagenome      Improve gene predictions for highly fragmented genomes (default OFF)
  --rawproduct      Do not clean up /product annotation (default OFF)
  --cdsrnaolap      Allow [tr]RNA to overlap CDS (default OFF)
Computation:
  --cpus [N]        Number of CPUs to use [0=all] (default '8')
  --fast            Fast mode - only use basic BLASTP databases (default OFF)
  --noanno          For CDS just set /product="unannotated protein" (default OFF)
  --mincontiglen [N] Minimum contig size [NCBI needs 200] (default '1')
  --evalue [n.n]    Similarity e-value cut-off (default '1e-06')
  --rfam            Enable searching for ncRNAs with Infernal+Rfam (SLOW!) (default '0')
  --norrna          Don't run rRNA search (default OFF)
  --notrna          Don't run tRNA search (default OFF)
  --rnammer         Prefer RNAmmer over Barrnap for rRNA prediction (default OFF)
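
As an illustration of how the options above combine, the sketch below assembles a fuller command line than the jobscript uses. It only prints the command and does not run prokka. The function name build_prokka_cmd, the organism names, and the choice to reuse the strain name for --locustag, --prefix and --outdir are assumptions made for the example, not recommendations from the Prokka documentation.

```shell
#!/bin/bash
# Hypothetical helper: assemble a prokka command line from options listed in
# the help text. Dry run only: the command is printed, never executed.
build_prokka_cmd() {
    local fasta=$1 genus=$2 species=$3 strain=$4
    local cmd=(prokka
        --kingdom Bacteria
        --genus "$genus" --species "$species" --strain "$strain"
        --locustag "$strain"
        --prefix "$strain"
        --outdir "${strain}_prokka"
        --cpus 8
        "$fasta")
    # "${cmd[*]}" joins the array elements with single spaces
    printf '%s\n' "${cmd[*]}"
}

build_prokka_cmd contigs.fasta Escherichia coli K12
```

The argument order follows the synopsis in the help text (options first, fasta file last).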

Output files

If given a fragmented scaffold file (typically from a de novo assembler), prokka will refer to each scaffold or contig as a "node".

Installation issues (sysadmins only)

Prokka can be cloned from GitHub. The first step after cloning is to set up its databases, like so:

> ./prokka --setupdb
[16:54:57] Appending to PATH: /home/nutria/gitrepos/prokka/bin/../binaries/linux
[16:54:57] Appending to PATH: /home/nutria/gitrepos/prokka/bin/../binaries/linux/../common
[16:54:57] Appending to PATH: /home/nutria/gitrepos/prokka/bin
[16:54:57] Cleaning databases in /home/nutria/gitrepos/prokka/bin/../db
[16:54:57] Cleaning complete.
[16:54:57] Looking for 'makeblastdb' - found /usr/bin/makeblastdb
[16:54:57] Determined makeblastdb version is 2.2
[16:54:57] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Archaea/sprot
[16:54:57] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Archaea\/sprot -logfile /dev/null
[16:54:58] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Bacteria/sprot
[16:54:58] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Bacteria\/sprot -logfile /dev/null
[16:54:59] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Mitochondria/sprot
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Mitochondria\/sprot -logfile /dev/null
[16:54:59] Making kingdom BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/kingdom/Viruses/sprot
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/kingdom\/Viruses\/sprot -logfile /dev/null
[16:54:59] Making genus BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/genus/Enterococcus
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/genus\/Enterococcus -logfile /dev/null
[16:54:59] Making genus BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/genus/Escherichia
[16:54:59] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/genus\/Escherichia -logfile /dev/null
[16:55:00] Making genus BLASTP database: /home/nutria/gitrepos/prokka/bin/../db/genus/Staphylococcus
[16:55:00] Running: makeblastdb -hash_index -dbtype prot -in \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/genus\/Staphylococcus -logfile /dev/null
[16:55:00] Looking for 'hmmpress' - found /usr/bin/hmmpress
[16:55:00] Determined hmmpress version is 3.1
[16:55:00] Pressing HMM database: /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm
[16:55:00] Running: hmmpress \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/hmm\/HAMAP\.hmm
Working...    done.
Pressed and indexed 1463 HMMs (1463 names).
Models pressed into binary file:   /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3m
SSI index for binary model file:   /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3i
Profiles (MSV part) pressed into:  /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3f
Profiles (remainder) pressed into: /home/nutria/gitrepos/prokka/bin/../db/hmm/HAMAP.hmm.h3p
[16:55:01] Looking for 'cmpress' - found /usr/bin/cmpress
[16:55:01] Determined cmpress version is 1.1
[16:55:01] Pressing CM database: /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses
[16:55:01] Running: cmpress \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/cm\/Viruses
Working...    done.
Pressed and indexed 142 CMs and p7 HMM filters (142 names and 142 accessions).
Covariance models and p7 filters pressed into binary file:  /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1m
SSI index for binary covariance model file:                 /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1i
Optimized p7 filter profiles (MSV part)  pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1f
Optimized p7 filter profiles (remainder) pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Viruses.i1p
[16:55:01] Pressing CM database: /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria
[16:55:01] Running: cmpress \/home\/nutria\/gitrepos\/prokka\/bin\/\.\.\/db\/cm\/Bacteria
Working...    done.
Pressed and indexed 564 CMs and p7 HMM filters (564 names and 564 accessions).
Covariance models and p7 filters pressed into binary file:  /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1m
SSI index for binary covariance model file:                 /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1i
Optimized p7 filter profiles (MSV part)  pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1f
Optimized p7 filter profiles (remainder) pressed into:      /home/nutria/gitrepos/prokka/bin/../db/cm/Bacteria.i1p
[16:55:01] Looking for databases in: /home/nutria/gitrepos/prokka/bin/../db
[16:55:01] * Kingdoms: Archaea Bacteria Mitochondria Viruses
[16:55:01] * Genera: Enterococcus Escherichia Staphylococcus
[16:55:01] * HMMs: HAMAP
[16:55:01] * CMs: Bacteria Viruses
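
The last lines of the transcript summarise which databases were indexed. A small filter can pull just that summary out of a log; the sketch below assumes the log format shown in the transcript (a bracketed timestamp followed by "* "), and the function name db_summary is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: keep only the database summary lines ("* Kingdoms: ...")
# from a Prokka log read on stdin, dropping the [hh:mm:ss] timestamp and the bullet.
db_summary() {
    sed -n 's/^\[[0-9:]*] \* //p'
}

# Demonstration on lines copied from the transcript.
db_summary <<'EOF'
[16:55:01] Looking for databases in: /home/nutria/gitrepos/prokka/bin/../db
[16:55:01] * Kingdoms: Archaea Bacteria Mitochondria Viruses
[16:55:01] * Genera: Enterococcus Escherichia Staphylococcus
[16:55:01] * HMMs: HAMAP
[16:55:01] * CMs: Bacteria Viruses
EOF
```

On a live installation the same filter could be applied to the --listdb option from the help text, for example: prokka --listdb 2>&1 | db_summary (the 2>&1 assumes the log is written to standard error).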

it seems to set its own paths

When invoking prokka with no arguments, one sees this:

[ramon@marvin ~]$ prokka
[13:52:03] Appending to PATH: /usr/local/Modules/modulefiles/tools/prokka/gitv1_8f07048/bin/../binaries/linux
[13:52:03] Appending to PATH: /usr/local/Modules/modulefiles/tools/prokka/gitv1_8f07048/bin/../binaries/linux/../common
[13:52:03] Appending to PATH: /usr/local/Modules/modulefiles/tools/prokka/gitv1_8f07048/bin
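
The three log lines show directories derived from the location of the prokka executable being appended to PATH. The sketch below is not Prokka's own code (Prokka is a Perl script); it is a shell illustration of the resulting PATH value, with a hypothetical function name appended_path and a made-up install location.

```shell
#!/bin/bash
# Hypothetical illustration: build the PATH value that results from appending
# the three directories shown in the log to an existing PATH.
appended_path() {
    local current=$1 bindir=$2
    printf '%s\n' "$current:$bindir/../binaries/linux:$bindir/../binaries/linux/../common:$bindir"
}

# /opt/prokka/bin is a made-up install location.
appended_path "/usr/bin" "/opt/prokka/bin"
```

Because the directories are appended rather than prepended, executables already on PATH are found first, which is consistent with the setup log locating makeblastdb in /usr/bin. A modulefile therefore only needs to make the prokka executable itself reachable.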
