Difference between revisions of "SPAdes"
Revision as of 16:47, 1 July 2016
Introduction
Pavel Pevzner's de novo assembler, primarily for (but not restricted to) bacteria.
Usage
A basic SPAdes run is launched through its Python script, spades.py.
Example qsub job script
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -q unstable.q
#$ -pe multi 16
# some quick "argument accounting"
EXPECTED_ARGS=1 # change value to suit!
if [ $# -ne $EXPECTED_ARGS ]; then
echo "error, this script should be fed with one argument: a filelist of fastq(.gz) files" >&2
exit 1
fi
module load SPAdes
N=( $(cat $1) )
NSZ=${#N[@]}
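# NOTE: the loop starts at index 2, so the first read pair in the list is skipped; start at i=0 to assemble every pair
for((i=2; i<NSZ; i+=2)); do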
R1=${N[$i]}
R2=${N[$(($i+1))]}
ON=${N[$i]%%_*}
# echo "spades.py -t 6 -o $ON --pe1-1 $R1 --pe1-2 $R2"
spades.py -t $NSLOTS -o $ON --pe1-1 $R1 --pe1-2 $R2
done
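The job script expects a file list in which each R1 file is immediately followed by its R2 mate, and it names each output directory after the part of the R1 file name before the first underscore. The following is a minimal sketch for previewing that pairing before submitting a job. The helper name list_pairs and the file name filelist.txt are hypothetical and not part of SPAdes. Unlike the job script, whose loop starts at index 2, this sketch walks the list from the first entry.

```shell
#!/bin/bash
# Sketch: preview the read pairs and output directory names a file list would produce.
# list_pairs is a hypothetical helper, not part of SPAdes.
list_pairs() {
    # Split the list on whitespace, as the job script does
    # (file names must therefore not contain spaces).
    set -- $(cat "$1")
    if [ $(( $# % 2 )) -ne 0 ]; then
        echo "error: odd number of files ($#), expected R1/R2 pairs" >&2
        return 1
    fi
    while [ $# -ge 2 ]; do
        # Output name: everything before the first underscore of the R1 file name.
        echo "${1%%_*} $1 $2"
        shift 2
    done
}

# Example (hypothetical file name): list_pairs filelist.txt
```

Each printed line gives the output directory name followed by the two read files that would be passed to --pe1-1 and --pe1-2.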
Output
The output directory defined in the SPAdes command line will contain the following key elements:
- the corrected subdirectory containing fastq reads corrected by BayesHammer.
- the contigs.fasta file containing the resulting contigs.
- the scaffolds.fasta file containing the resulting scaffolds.
- the assembly_graph.fastg file containing the SPAdes assembly graph in FASTG format.
- the contigs.paths file containing the paths in the assembly graph that correspond to the contigs in contigs.fasta.
- the scaffolds.paths file: similar to contigs.paths, but containing the scaffold paths, as its name suggests.
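After a run, it can be handy to confirm that the elements listed above are actually present. The following is a minimal sketch, assuming only the file and directory names given in the list. The helper name check_spades_output is hypothetical, and depending on the SPAdes version and run options not every element may be produced, so a reported absence is a hint rather than proof of failure.

```shell
#!/bin/bash
# Sketch: report which of the key SPAdes outputs are missing from an output directory.
# check_spades_output is a hypothetical helper, not part of SPAdes.
check_spades_output() {
    outdir=$1
    missing=0
    # "corrected" is a subdirectory; the remaining elements are regular files.
    if [ ! -d "$outdir/corrected" ]; then
        echo "missing: corrected/"
        missing=1
    fi
    for f in contigs.fasta scaffolds.fasta assembly_graph.fastg contigs.paths scaffolds.paths; do
        if [ ! -f "$outdir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Return status is 0 when everything is present, 1 otherwise.
    return $missing
}

# Example: check_spades_output "$ON"   (ON being the output directory used in the job script)
```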
Installation (Sysadmin notes)
Version 3.7.0 was initially installed using the specially compiled gcc/4.9.3 compiler (available as a module). However, the -b version of the module now uses Red Hat's devtoolset-2, so this compiler is no longer necessary.
Boost, however, is necessary. The cluster has version 1.60, the latest at the time of writing, possibly compiled (at least the parts that can be compiled) with g++ 4.4.7. In any case, the location of Boost is a problem: although the boost module on the cluster does create some useful environment variables, the supplied stacks_compile script does not recognise them.
The build system is cmake, so a "build" subdirectory should be created. Inside it, create a short compile script containing something like the following:
module load boost
cmake -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=.. -DBoost_NO_BOOST_CMAKE=TRUE -DBoost_NO_SYSTEM_PATHS=TRUE -DBOOST_ROOT:PATHNAME=${BOOST_ROOT} -DBoost_INCLUDE_DIRS:FILEPATH=${BOOST_INCLUDEDIR} -DBoost_LIBRARY_DIRS:FILEPATH=${BOOST_LIBRARYDIR} ../src
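Since the cmake line depends entirely on the variables exported by the boost module, it may be worth checking that they are non-empty before configuring. The following is a minimal sketch of such a check. The helper name check_boost_env is hypothetical, and the variable names are simply those used in the cmake command above.

```shell
#!/bin/bash
# Sketch: verify that the Boost variables used by the cmake line are set.
# check_boost_env is a hypothetical helper; the variable names come from the cmake command.
check_boost_env() {
    rc=0
    for var in BOOST_ROOT BOOST_INCLUDEDIR BOOST_LIBRARYDIR; do
        # Indirect lookup of the variable named in $var (portable across sh and bash).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "unset or empty: $var" >&2
            rc=1
        fi
    done
    # Return status is 0 only when all three variables are non-empty.
    return $rc
}

# Example: module load boost && check_boost_env   (then run the cmake line)
```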
There is neither a make test nor a make check target. There is, however, a test script in the installation (not the source) directory, which can be invoked as follows:
<spades installation dir>/spades.py --test
or
<spades installation dir>/truspades.py --test
The second form tests the truSPAdes mode.