Introduction

SPAdes, Pavel Pevzner's de novo assembler, is for both prokaryotes and eukaryotes. It even assembles plasmids. It may at one time have been especially associated with prokaryote assembly.

It uses BayesHammer^[1] to correct errors.

Usage

A basic SPAdes run for a pair of fastq files uses its Python wrapper script, spades.py, in the following manner:

spades.py -o <output_directoryname> --pe1-1 <first_of_pair_fastq> --pe1-2 <second_of_pair_fastq>

NOTE: SPAdes' output files have generic names, so when running on several fastq samples it is essential that the output be directed to uniquely named directories.
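One way to get a unique directory name per sample is to derive it from the fastq file name. The following is a minimal sketch, assuming that the sample name is the part of the file name before the first underscore (for example sampleA_R1.fastq.gz); the function name outdir_for is purely illustrative.

```shell
# outdir_for FASTQ: print an output directory name derived from a fastq path.
# Assumption: the sample name precedes the first underscore in the file name.
outdir_for() {
    local base
    base=${1##*/}                 # drop any leading directory components
    printf '%s\n' "${base%%_*}"   # drop everything from the first underscore onwards
}
```

It could then be used as spades.py -o "$(outdir_for "$R1")" --pe1-1 "$R1" --pe1-2 "$R2", where R1 and R2 hold the two fastq file names.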

Note that:

SPAdes has a nanopore option.
metaspades.py is for metagenome assembly. It is the same as spades.py --meta.
SPAdes uses Python only as a wrapper and needs no special libraries, so it runs with the system Python rather than the bulked-up Python in the modules system.

SPAdes can do error correction using BayesHammer^[2], though you might like to skip this if your data have already been trimmed.
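Skipping the error correction stage can be sketched as follows, using the --only-assembler option of spades.py, which runs the assembly stage without read error correction. The function name and its argument order are assumptions made for illustration only.

```shell
# run_assembly_only OUTDIR R1 R2: assemble one read pair without error correction.
# Assumption: the reads have already been cleaned up by other means.
run_assembly_only() {
    spades.py --only-assembler -o "$1" --pe1-1 "$2" --pe1-2 "$3"
}
```

With this variant the corrected subdirectory is not expected in the output, since no reads are corrected.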

Outputs

SPAdes produces no summary statistics for its final assembly, which is the scaffolds.fasta file. The outputs of a SPAdes run are:

Files:

assembly_graph.fastg
assembly_graph.gfa
before_rr.fasta
contigs.paths
dataset.info, simply refers to the yaml file.
first_pe_contigs.fasta
input_dataset.yaml, simply
params.txt
scaffolds.paths
spades.log
contigs.fasta, final contigs
scaffolds.fasta, final scaffolds. In most cases this is the final assembly, and it is preferred over contigs.fasta.

Folders:

corrected
K21
K33
K55
tmp
misc
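Because SPAdes reports no summary statistics itself, a few basic ones can be computed from scaffolds.fasta afterwards. The following is a rough sketch using awk and sort rather than any SPAdes feature; the function name fasta_stats and its output format are illustrative assumptions, and a dedicated assembly evaluation tool will give a fuller picture.

```shell
# fasta_stats FILE: print sequence count, total length, longest sequence and N50.
fasta_stats() {
    # First pass: emit one length per sequence (sequences may span several lines).
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (len > 0) print len }
    ' "$1" | sort -rn | awk '
        # Second pass: lengths arrive longest first.
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "no sequences found"; exit 1 }
            half = total / 2; run = 0
            # N50: length of the sequence at which the running sum reaches half the total.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_bp=%d longest=%d N50=%d\n", NR, total, lens[1], n50
        }'
}
```

A typical call would be fasta_stats <output_directoryname>/scaffolds.fasta.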

Example qsub job script

#!/bin/bash
#$ -cwd 
#$ -j y
#$ -S /bin/bash 
#$ -V
#$ -q unstable.q
#$ -pe multi 16

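# Check that exactly one argument (the file list) was given.
EXPECTED_ARGS=1 # change value to suit!
if [ $# -ne $EXPECTED_ARGS ]; then
    echo "error, this script should be fed with one argument: a filelist of fastq(.gz) files" >&2
    exit 1
fi
module load SPAdes
# Read the file list into an array; entries alternate R1, R2 for each sample.
N=( $(cat "$1") )
NSZ=${#N[@]}
# Step through the list one read pair at a time, starting from the first pair.
for ((i=0; i<NSZ; i+=2)); do
    R1=${N[$i]}
    R2=${N[$((i+1))]}
    # Output directory: the R1 file name up to its first underscore.
    ON=${R1%%_*}
    # echo "spades.py -t $NSLOTS -o $ON --pe1-1 $R1 --pe1-2 $R2"
    spades.py -t "$NSLOTS" -o "$ON" --pe1-1 "$R1" --pe1-2 "$R2"
done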
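The job script expects a file list in which the two fastq files of each pair sit on consecutive lines. A sketch of one way to build such a list follows, assuming that paired files differ only in an R1/R2 tag so that a plain sort puts them next to each other; the function name make_filelist is hypothetical.

```shell
# make_filelist DIR: print the fastq(.gz) files in DIR, sorted so pairs are adjacent.
# Assumption: names such as sampleA_R1.fastq.gz and sampleA_R2.fastq.gz.
make_filelist() {
    local dir="$1" files n
    files=$(find "$dir" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | LC_ALL=C sort)
    # Count non-empty lines; grep exits non-zero when there are none.
    n=$(printf '%s\n' "$files" | grep -c . || true)
    if [ "$n" -eq 0 ] || [ $((n % 2)) -ne 0 ]; then
        echo "expected an even, non-zero number of fastq files, found $n" >&2
        return 1
    fi
    printf '%s\n' "$files"
}
```

The list could be written with make_filelist <fastq_directory> > filelist.txt and then passed to the job script as its single argument when submitting with qsub.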

Output

The output directory defined in the SPAdes command line will contain the following key elements:

the corrected subdirectory containing fastq reads corrected by BayesHammer.
the contigs.fasta file containing the resulting contigs.
the scaffolds.fasta file containing the resulting scaffolds.
the assembly_graph.fastg file containing the SPAdes assembly graph in FASTG format.
the contigs.paths file containing the paths in the assembly graph that correspond to the contigs.fasta file mentioned above.
the scaffolds.paths file: similar to contigs.paths, except that it holds the scaffold paths, as its name suggests.
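When many samples are assembled in one job, it can help to confirm that each output directory holds these key files. The check below is a small sketch, not part of SPAdes; the function name check_spades_outputs is an assumption, and the file list is taken from the elements named above.

```shell
# check_spades_outputs DIR: report key SPAdes output files that are missing or empty.
check_spades_outputs() {
    local dir="$1" f status=0
    for f in contigs.fasta scaffolds.fasta assembly_graph.fastg contigs.paths scaffolds.paths; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

The function returns non-zero if any file is absent, so it can be used in a loop over output directories to flag failed runs.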

Installation (Sysadmin notes)

Initially, version 3.7.0 was installed using the specially compiled gcc/4.9.3 compiler (available as a module). However, the -b version of the module now uses Red Hat's devtoolset-2, so this compiler is no longer necessary.

Boost, however, is necessary. The cluster has the latest version, 1.60, possibly compiled (well, the bits that can be compiled) with g++ 4.4.7. In any case, the location of Boost is a problem, although the boost module on the cluster does create some useful environment variables, and the given stacks_compile script does recognise them.

In any case, the configure system is cmake, so a "build" subdirectory should be created. Inside that, a short compile script containing something like the following should be created:

module load boost
cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=.. -DBoost_NO_BOOST_CMAKE=TRUE -DBoost_NO_SYSTEM_PATHS=TRUE -DBOOST_ROOT:PATHNAME=${BOOST_ROOT} -DBoost_INCLUDE_DIRS:FILEPATH=${BOOST_INCLUDEDIR} -DBoost_LIBRARY_DIRS:FILEPATH=${BOOST_LIBRARYDIR} ../src

There is no make test or make check before installation. Post-installation, however, there is a test script in the installation (not the source) directory, which can be invoked as follows:

<spades installation dir>/spades.py --test

or

<spades installation dir>/truspades.py --test

for the truSPAdes mode.

Links

Official manual

↑ http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-S1-S7

[1]

[2]

SPAdes
