Difference between revisions of "SPAdes"
Revision as of 15:16, 10 February 2017
Introduction
Pavel Pevzner's de-novo assembler, primarily for - but not restricted to - prokaryotes.
It uses BayesHammer[1] to correct errors.
Usage
A basic SPAdes run on a pair of FASTQ files uses the spades.py Python wrapper script in the following manner:
spades.py -o <output_directoryname> --pe1-1 <first_of_pair_fastq> --pe1-2 <second_of_pair_fastq>
NOTE: SPAdes' output files have generic names, so when running on several fastq samples it is essential that the output be directed to uniquely named directories.
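As an illustration of the note above, here is a minimal sketch of one way to derive a distinct output directory for each sample. It assumes the sample name is the part of the file name before the first underscore; the function names and the "_spades" suffix are hypothetical choices, not part of SPAdes. The commands are only printed, not run.

```shell
#!/bin/bash
# Hypothetical helper: derive a per-sample output directory from a read file name.
# Assumption: the sample name is everything before the first underscore.
outdir_for() {
    local base
    base=$(basename "$1")
    printf '%s_spades\n' "${base%%_*}"
}

# Print (but do not run) the spades.py command for one read pair.
spades_cmd() {
    local r1=$1 r2=$2
    printf 'spades.py -o %s --pe1-1 %s --pe1-2 %s\n' "$(outdir_for "$r1")" "$r1" "$r2"
}

# Two samples end up in two different directories.
spades_cmd sampleA_R1.fastq.gz sampleA_R2.fastq.gz
spades_cmd sampleB_R1.fastq.gz sampleB_R2.fastq.gz
```

If two samples share the same prefix before the first underscore they would still collide, so the naming rule should be adapted to the file names actually in use.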
Note that:
- SPAdes has a nanopore option.
- metaspades.py is for metagenome assembly. It is the same as spades.py --meta
- SPAdes uses Python only as a wrapper, so it runs with the system Python rather than the bulked-up Python in the modules system: it does not need any special libraries.
SPAdes can perform read error correction using BayesHammer[2], though you might like to skip this step if your data have already been trimmed.
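Skipping the error correction step is done with the spades.py option --only-assembler. The sketch below shows one way to add it conditionally; the function name and its "yes/no" first argument are hypothetical, and the function only prints the argument list rather than running SPAdes.

```shell
#!/bin/bash
# Hypothetical helper: build the spades.py argument list for one read pair.
# First argument: "yes" if the reads were already trimmed/corrected elsewhere.
spades_args() {
    local trimmed=$1 outdir=$2 r1=$3 r2=$4
    local args=(-o "$outdir" --pe1-1 "$r1" --pe1-2 "$r2")
    if [ "$trimmed" = "yes" ]; then
        # --only-assembler skips read error correction and runs assembly only
        args+=(--only-assembler)
    fi
    printf '%s\n' "${args[*]}"
}

# Print the arguments for an already-trimmed pair.
spades_args yes sampleA sampleA_R1.fastq.gz sampleA_R2.fastq.gz
```

Whether trimmed reads still benefit from error correction depends on the data, so this is a convenience, not a recommendation.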
Outputs
The outputs from a SPAdes run:
Files:
- assembly_graph.fastg
- assembly_graph.gfa
- before_rr.fasta
- contigs.paths
- dataset.info, which simply refers to the YAML file.
- first_pe_contigs.fasta
- input_dataset.yaml, simply
- params.txt
- scaffolds.paths
- spades.log
- contigs.fasta, final contigs
- scaffolds.fasta, the final scaffolds. For most purposes this is the final assembly, and it is preferred over contigs.fasta.
Folders:
- corrected
- K21
- K33
- K55
- tmp
- misc
Example qsub job script
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -q unstable.q
#$ -pe multi 16

# some quick "argument accounting"
EXPECTED_ARGS=1 # change value to suit!
if [ $# -ne $EXPECTED_ARGS ]; then
    echo "error, this script should be fed with one argument: a filelist of fastq(.gz) files"
    exit 1
fi

module load SPAdes

# read the filelist into an array: entries alternate first-of-pair, second-of-pair
N=( $(cat "$1") )
NSZ=${#N[@]}
# start at index 0 so that the first pair in the list is assembled too
for ((i=0; i<NSZ; i+=2)); do
    R1=${N[$i]}
    R2=${N[$(($i+1))]}
    ON=${R1%%_*} # output directory: the file name up to the first underscore
    # echo "spades.py -t 6 -o $ON --pe1-1 $R1 --pe1-2 $R2"
    spades.py -t $NSLOTS -o "$ON" --pe1-1 "$R1" --pe1-2 "$R2"
done
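Because the job script consumes the filelist two entries at a time, a list with an odd number of entries would leave the last file without a mate. The following is a small sketch of a check that could be run before submitting; the function name and the file names in the usage comment are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: check that a filelist holds an even number of entries,
# i.e. complete first-of-pair/second-of-pair couples.
check_filelist() {
    local n
    # count whitespace-separated entries, as the job script's array does
    n=$(wc -w < "$1")
    if [ $((n % 2)) -ne 0 ]; then
        echo "error: $1 has an odd number of entries" >&2
        return 1
    fi
    echo "$((n / 2)) pairs"
}

# Possible usage, assuming the job script was saved as spades_job.sh:
# check_filelist filelist.txt && qsub spades_job.sh filelist.txt
```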
Output
The output directory defined in the SPAdes command line will contain the following key elements:
- the corrected subdirectory containing fastq reads corrected by BayesHammer.
- the contigs.fasta file containing the resulting contigs.
- the scaffolds.fasta file containing the resulting scaffolds.
- the assembly_graph.fastg file containing the SPAdes assembly graph in FASTG format.
- the contigs.paths file containing the paths in the assembly graph that correspond to the contigs in contigs.fasta.
- the scaffolds.paths file: similar to contigs.paths, except that it holds the paths for the scaffolds, as its name suggests.
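For a quick look at the contigs.fasta or scaffolds.fasta listed above, a short sketch such as the following counts the sequences and the total number of bases. The function name and the path in the usage comment are hypothetical, and the count relies only on the general FASTA layout, not on anything specific to SPAdes.

```shell
#!/bin/bash
# Hypothetical helper: count sequences and total bases in a FASTA file.
fasta_summary() {
    awk '
        /^>/ { n++; next }            # header line: one more sequence
        { bases += length($0) }       # sequence line: add its length
        END { printf "%d sequences, %d bases\n", n, bases }
    ' "$1"
}

# Possible usage, assuming an output directory called sampleA:
# fasta_summary sampleA/scaffolds.fasta
```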
Installation (Sysadmin notes)
Initially, version 3.7.0 was installed using the specially compiled gcc/4.9.3 compiler (available as a module). However, the -b version of the module now uses Red Hat's devtoolset-2, so this compiler is no longer necessary.
Boost, however, is necessary. The cluster has the latest version, 1.60, possibly compiled (well, the bits that can be compiled) with g++ 4.4.7. In any case, the location of Boost is a problem: although the boost module on the cluster does create some useful environment variables, the given stacks_compile script does not recognise them.
In any case, the configure system is cmake, so a "build" subdirectory should be created. Inside that, a short compile script containing something like the following should be created:
module load boost
cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=.. \
    -DBoost_NO_BOOST_CMAKE=TRUE -DBoost_NO_SYSTEM_PATHS=TRUE \
    -DBOOST_ROOT:PATHNAME=${BOOST_ROOT} \
    -DBoost_INCLUDE_DIRS:FILEPATH=${BOOST_INCLUDEDIR} \
    -DBoost_LIBRARY_DIRS:FILEPATH=${BOOST_LIBRARYDIR} \
    ../src
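The cmake call above passes the Boost locations through environment variables, so if the boost module did not set them the values silently expand to nothing. As a sketch only, a guard like the following could be placed before the cmake line; the function name is hypothetical, and the variable names are those used in the compile script above.

```shell
#!/bin/bash
# Hypothetical helper: fail early if any named environment variable is unset or empty.
require_vars() {
    local name missing=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable called $name
        if [ -z "${!name}" ]; then
            echo "error: \$$name is not set; was 'module load boost' run?" >&2
            missing=1
        fi
    done
    return $missing
}

# Possible usage, just before the cmake line:
# require_vars BOOST_ROOT BOOST_INCLUDEDIR BOOST_LIBRARYDIR || exit 1
```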
There is neither a make test nor a make check target to run before installation. Post-installation, however, there is a test script in the installation (not the source) directory, which can be invoked as follows:
<spades installation dir>/spades.py --test
or
<spades installation dir>/truspades.py --test
for the truSPAdes modality.