Detonate
Latest revision as of 14:57, 30 January 2017

Introduction

Detonate evaluates the quality of de-novo transcriptome assemblies and comes from the creators of RSEM (the Dewey lab). It was designed to address the shortcomings of the N50 score typically used to evaluate assemblies. It gives a much more comprehensive quality evaluation, requiring as input not only the de-novo assembly but also all of the raw reads that were used to assemble it.

Layout

Detonate has two main components, rsem-eval and ref-eval. The executables associated with each are listed below.

  • rsem-eval
    1. rsem-eval-calculate-score
    2. rsem-eval-estimate-transcript-length-distribution
    3. rsem-plot-model
    4. rsem-build-read-index
    5. rsem-eval-run-em
    6. rsem-extract-reference-transcripts
    7. rsem-parse-alignments
    8. rsem-preref
    9. rsem-sam-validator
    10. rsem-scan-for-paired-end-reads
    11. rsem-simulate-reads
    12. rsem-synthesis-reference-transcripts
  • ref-eval
    1. ref-eval
    2. ref-eval-estimate-true-assembly

Usage

The detonate module must be loaded first:

module load detonate
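To confirm that loading the module worked, you can check that the executables are on the PATH. The following is a minimal sketch. `check_tools` is a hypothetical helper name and is not part of Detonate or the module system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Detonate): report any named command
# that cannot be found on the PATH and return non-zero if any is missing.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools rsem-eval-estimate-transcript-length-distribution rsem-eval-calculate-score` prints any name it cannot find and returns a non-zero status.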

After de-novo assembly of your transcriptome, the first step is to estimate the transcript length distribution:

rsem-eval-estimate-transcript-length-distribution <contigs.fasta> <outputld.txt>

Explanation:

  • rsem-eval-estimate-transcript-length-distribution, a Perl script
  • <contigs.fasta>, your de-novo assembly
  • <outputld.txt>, your chosen name for the output text file, which will hold the mean and standard deviation (SD) of the contig length distribution.
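As a rough sanity check, you can compute the mean and standard deviation of the sequence lengths in a FASTA file directly. This is a hedged sketch with a hypothetical helper name (`fasta_length_stats`). It is not the Detonate script, whose estimation method and output file format may differ.

```shell
#!/usr/bin/env bash
# Hypothetical cross-check, not the Detonate script: print the number of
# sequences, mean length and sample standard deviation of lengths in a FASTA file.
fasta_length_stats() {
    awk '
        /^>/ {
            # a header closes the previous record (if any) and starts a new one
            if (inrec) { n++; s += len; ss += len * len }
            inrec = 1; len = 0; next
        }
        { gsub(/[ \t\r]/, ""); len += length($0) }   # sequence lines may wrap
        END {
            if (inrec) { n++; s += len; ss += len * len }
            if (n == 0) { print "no sequences found" > "/dev/stderr"; exit 1 }
            mean = s / n
            var = (n > 1) ? (ss - n * mean * mean) / (n - 1) : 0
            if (var < 0) var = 0
            printf "n=%d mean=%.2f sd=%.2f\n", n, mean, sqrt(var)
        }' "$1"
}
```

For example, `fasta_length_stats contigs.fasta` prints a single line of the form `n=... mean=... sd=...`. The sketch divides by n - 1, giving the sample standard deviation.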

Next, the RSEM-EVAL score can be calculated. A single executable does this, and it takes various options. Running

rsem-eval-calculate-score --help

lists them.

The standard usage example is:

rsem-eval-calculate-score -p 8 --transcript-length-parameters human.txt /data/reads.fq assembly1.fa assembly1_rsem_eval 76

Explanation: two of the arguments are named options, -p and --transcript-length-parameters. The rest are positional, so their order matters:

  • -p 8, the number of threads the program will use (8 in this case).
  • --transcript-length-parameters, the name of the file containing the output of the previous rsem-eval-estimate-transcript-length-distribution command.
  • The first positional argument is a comma-separated list of the raw read files used to assemble the de-novo transcriptome in the first place. It is probably best to list these files in a file beforehand (here fq.lst, one file per line) and pass them via command substitution with
    $(cat fq.lst |tr '\n' ',')
  • The next argument is the de-novo assembled FASTA file.
  • The penultimate argument is the output prefix, a name of the user's choice that is prepended to the names of the output files.
  • The final argument is the average length of the raw reads (76 in this example).
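Two of the positional arguments are easy to get wrong: the read list and the read length. `tr '\n' ','` turns the final newline of fq.lst into a comma, so the list ends with a trailing comma. Whether rsem-eval-calculate-score tolerates that is not confirmed here. The sketch below uses hypothetical helper names. It joins the list with `paste` instead, and it estimates the average read length from an uncompressed FASTQ file with four-line records.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Detonate).

# Join the lines of a file list into one comma-separated string.
# Unlike tr '\n' ',', paste does not leave a trailing comma.
join_read_list() {
    paste -s -d ',' "$1"
}

# Average read length over the first N reads (default 100000) of an
# uncompressed FASTQ file with 4-line records, rounded to the nearest integer.
mean_read_length() {
    awk -v max="${2:-100000}" '
        NR % 4 == 2 { n++; s += length($0); if (n >= max) exit }
        END { if (n == 0) exit 1; printf "%d\n", s / n + 0.5 }' "$1"
}
```

For example (file names are placeholders): `rsem-eval-calculate-score -p 8 --transcript-length-parameters outputld.txt "$(join_read_list fq.lst)" assembly1.fa assembly1_rsem_eval "$(mean_read_length reads_1.fq)"`. Sampling only the first reads keeps the length estimate quick on large files. Pass a second argument to change how many are read.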

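It is best to run this command through a job submission script, for example (the #$ lines are Grid Engine directives; the queue highmemory.q and the parallel environment multi are specific to this cluster):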

#!/bin/bash
#$ -V
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -q highmemory.q
#$ -pe multi 8
module load detonate
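# Placeholder values: substitute your own assembly file, output prefix and average read length
ASSEMBLY=denovoassemblyname.fa
PREFIX=output_prefix
READ_LENGTH=76
rsem-eval-calculate-score -p $NSLOTS --transcript-length-parameters outputld.txt $(cat fq.lst |tr '\n' ',') "$ASSEMBLY" "$PREFIX" "$READ_LENGTH"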
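A job may wait in the queue and then fail on a mistyped path, so it can be worth checking fq.lst before submission. This is a minimal sketch that assumes one path per line. `check_read_files` and the script name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-submission check: confirm every read file named in a
# list (one path per line) exists and is readable before queuing the job.
check_read_files() {
    local list=$1 f status=0
    while IFS= read -r f || [ -n "$f" ]; do
        [ -z "$f" ] && continue          # skip blank lines
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done < "$list"
    return "$status"
}
```

For example: `check_read_files fq.lst && qsub run_rsem_eval.sh`, where run_rsem_eval.sh stands for the job script above.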

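Links

  • The published paper describing Detonate: http://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0553-5
  • Source code webpage, with some usage instructions: https://github.com/deweylab/detonate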