

SPAdes, Pavel Pevzner's de novo assembler, is for prokaryotes and eukaryotes. It even assembles plasmids. It may at one time have been especially associated with prokaryote assembly.

It uses BayesHammer[1] to correct errors.


A basic SPAdes run for a pair of fastq files would use its Python script (extension .py) in the following manner: -o <output_directoryname> --pe1-1 <first_of_pair_fastq> --pe1-2 <second_of_pair_fastq>

NOTE: SPAdes' output files have generic names, so when running on several fastq samples it is essential that the output be directed to uniquely named directories.
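One way to get unique directory names is to derive them from the read file name. The following is a minimal sketch, assuming paired files named like sampleA_R1.fastq.gz or sampleA_1.fq; the helper name outdir_for and the "_spades" suffix are illustrative and not part of SPAdes:

```bash
# Derive an output directory name from the first fastq of a pair.
# Hypothetical helper: the name and the "_spades" suffix are illustrative only.
outdir_for() {
    base=$(basename "$1")      # drop any leading path
    base=${base%%.*}           # drop .fastq, .fastq.gz, .fq ...
    case $base in              # drop one trailing read tag, if present
        *_R1) base=${base%_R1} ;;
        *_1)  base=${base%_1} ;;
    esac
    echo "${base}_spades"
}

outdir_for /data/run7/sampleA_R1.fastq.gz   # prints sampleA_spades
```

The result can then be passed to the -o option, so that each sample writes into its own directory.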

Note that:

  • SPAdes has a nanopore option.
  • A dedicated variant is for metagenome assembly. It is the same as --meta
  • SPAdes uses Python only as a wrapper, so it runs with the system Python rather than the bulked-up Python in the modules system, since it needs no special libraries.

SPAdes can do error correction using BayesHammer[2], though you might like to skip this if you have already trimmed your data.
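If the reads have already been cleaned, the error correction stage can be switched off with the --only-assembler option. The sketch below assumes the wrapper is invoked as spades.py (the script name is not spelled out above) and uses a hypothetical helper, spades_cmd, which only prints the command so that it can be inspected before running:

```bash
# Print a SPAdes command line, optionally skipping read error correction.
# Assumptions: the wrapper is called spades.py; spades_cmd is a hypothetical helper.
spades_cmd() {
    # usage: spades_cmd <outdir> <R1> <R2> [skip_ec]
    cmd="spades.py -o $1 --pe1-1 $2 --pe1-2 $3"
    if [ "${4:-}" = "skip_ec" ]; then
        cmd="$cmd --only-assembler"   # assembly only, no BayesHammer pass
    fi
    echo "$cmd"
}

spades_cmd sampleA_spades sampleA_R1.fastq.gz sampleA_R2.fastq.gz skip_ec
```

Without the fourth argument the helper prints the plain command, which runs error correction followed by assembly.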


SPAdes produces no summary stats for its final assembly, which is the scaffolds.fasta file. The outputs of a SPAdes run are:


  • assembly_graph.fastg
  • assembly_graph.gfa
  • before_rr.fasta
  • contigs.paths
  •, simply refers to the yaml file.
  • first_pe_contigs.fasta
  • input_dataset.yaml, simply
  • params.txt
  • scaffolds.paths
  • spades.log
  • contigs.fasta, final contigs
  • scaffolds.fasta, final scaffolds; for most purposes this is the final assembly. Preferred over contigs.fasta.


A run also creates subdirectories such as:

  • corrected
  • K21
  • K33
  • K55
  • tmp
  • misc
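Since SPAdes reports no summary statistics itself, a few basic numbers can be computed directly from the FASTA output. The following is a sketch using awk and sort; fasta_stats is a hypothetical helper, not something shipped with SPAdes, and N50 is taken here as the length of the sequence at which the running total (longest first) reaches half of the assembly size:

```bash
# Sequence count, total length and N50 for a FASTA file.
# fasta_stats is a hypothetical helper, not part of SPAdes.
fasta_stats() {
    awk '
        /^>/ { if (len) print len; len = 0; next }   # header: emit previous record length
             { len += length($0) }                    # sequence line: accumulate length
        END  { if (len) print len }                   # emit the last record
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            printf "sequences=%d total_bp=%d N50=%d\n", NR, total, n50
        }'
}
```

It would be called as fasta_stats <output_directoryname>/scaffolds.fasta, and works equally on contigs.fasta.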

Example qsub job script

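#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -q unstable.q
#$ -pe multi 16

# some quick "argument accounting"
EXPECTED_ARGS=1 # change value to suit!
if [ $# -ne $EXPECTED_ARGS ]; then
    echo "error, this script should be fed with one argument: a filelist of fastq(.gz) files"
    exit 1
fi

module load SPAdes
# filelist: the two files of each pair on consecutive entries
N=( $(cat "$1") )
NSZ=${#N[@]}
for((i=0; i<NSZ; i+=2)); do
    R1=${N[$i]}
    R2=${N[$((i+1))]}
    # unique output directory per sample (naming scheme is an assumption)
    B=$(basename "$R1")
    ON="${B%%.*}_spades"
    # assumption: the SPAdes wrapper is spades.py, on the PATH once the module is loaded
    # echo "spades.py -t 6 -o $ON --pe1-1 $R1 --pe1-2 $R2"
    spades.py -t "$NSLOTS" -o "$ON" --pe1-1 "$R1" --pe1-2 "$R2"
done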


The output directory defined in the SPAdes command line will contain the following key elements:

  • the corrected subdirectory containing fastq reads corrected by BayesHammer.
  • the contigs.fasta file containing the resulting contigs.
  • the scaffolds.fasta file containing the resulting scaffolds.
  • the assembly_graph.fastg file containing the SPAdes assembly graph in FASTG format.
  • the contigs.paths file containing the paths in the assembly graph that correspond to the contigs.fasta file.
  • the scaffolds.paths file: similar to contigs.paths, but holding the scaffold paths, as its name suggests.
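After a batch of runs it can be handy to confirm that each output directory holds these key elements. The following is a minimal sketch; check_spades_outputs is a hypothetical helper, and the list of names simply mirrors the items above:

```bash
# Report which of the key SPAdes outputs are missing from a run directory.
# check_spades_outputs is a hypothetical helper; the list mirrors the items above.
check_spades_outputs() {
    missing=0
    for f in corrected contigs.fasta scaffolds.fasta \
             assembly_graph.fastg contigs.paths scaffolds.paths; do
        if [ ! -e "$1/$f" ]; then
            echo "missing: $1/$f"
            missing=1
        fi
    done
    return $missing   # 0 when everything is present, 1 otherwise
}
```

A call such as check_spades_outputs <output_directoryname> prints one line per missing item and returns a non-zero status if anything is absent. A run made with error correction skipped would be expected to lack the corrected subdirectory.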

Installation (Sysadmin notes)

Initially version 3.7.0 was installed using the specially compiled gcc/4.9.3 compiler (available as a module). However, the -b version of the module now uses Red Hat's devtoolset-2, so this compiler is no longer necessary.

Boost, however, is necessary. The cluster has the latest version, 1.60, possibly compiled (well, the bits that can be compiled) with g++ 4.4.7. In any case, the location of Boost is a problem: although the boost module on the cluster does create some useful environment variables, the given stacks_compile script does not recognise them.

The configure system is CMake, so a "build" subdirectory should be created. Inside it, create a short compile script containing something like the following:

module load boost
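For orientation, a generic out-of-source CMake build that points CMake at a specific Boost location might look like the sketch below. This is not the original compile script: build_spades is a hypothetical helper, the three paths are placeholders, and BOOST_ROOT is the usual hint read by CMake's FindBoost module. By default the helper only prints the commands; setting DRY_RUN=0 would execute them from inside the build subdirectory.

```bash
# Out-of-source CMake build, printed as a dry run by default.
# Assumptions: build_spades and all three paths are hypothetical;
# set DRY_RUN=0 to actually execute the commands.
build_spades() {
    src=$1; prefix=$2; boost=$3
    # run: echo the command in dry-run mode, otherwise execute it
    run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }
    run cmake -DCMAKE_INSTALL_PREFIX="$prefix" -DBOOST_ROOT="$boost" "$src"
    run make -j 4
    run make install
}

build_spades ../src /opt/spades /opt/boost
```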

There is no make test or make check before installation. Post-installation, however, there is a test script in the installation (not the source) directory, which can be invoked as follows:

<spades installation dir>/ --test


For the truspades modality:

<spades installation dir>/ --test

