Quality evaluation of a de novo transcriptome assembly, from the creators of RSEM (the Dewey lab). It was designed to address the shortcomings of the N50 score typically used to evaluate assemblies. It is a much more comprehensive quality evaluation and requires as input not only the de novo assembly but also all the raw reads that were used to assemble it.
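The program has two main components: RSEM-EVAL, a reference-free evaluation, and REF-EVAL, which evaluates an assembly against a reference. Also included here are the executables associated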
The detonate module must be loaded beforehand:
module load detonate
After de novo assembly of your transcriptome, the first step is to estimate the transcript length distribution:
rsem-eval-estimate-transcript-length-distribution <contigs.fasta> <outputld.txt>
- rsem-eval-estimate-transcript-length-distribution, a Perl script
- <contigs.fasta>, your de novo assembly
- <outputld.txt>, your chosen name for the output text file, which will hold the mean and SD of the contig length distribution.
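As a rough cross-check on that output, the mean and standard deviation of the contig lengths can also be computed directly from the assembly. The sketch below uses a hypothetical helper, contig_length_stats, that is not part of DETONATE; it reports the population SD and ignores empty records, and it is an assumption that the Perl script's estimate will be close to, but not necessarily identical to, these plain summary statistics.

```shell
#!/bin/bash
# Hypothetical helper (not part of DETONATE): print the number of contigs and
# the mean and population standard deviation of their lengths for a FASTA file.
# Multi-line sequences are handled; records with no sequence are ignored.
contig_length_stats() {
  awk '
    /^>/ {
      # new header: close off the previous record, if it had any sequence
      if (len > 0) { n++; s += len; ss += len * len }
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")      # drop stray whitespace and CR characters
      len += length($0)
    }
    END {
      if (len > 0) { n++; s += len; ss += len * len }
      if (n == 0) { print "no sequences"; exit 1 }
      mean = s / n
      var = ss / n - mean * mean
      if (var < 0) var = 0     # guard against rounding error
      printf "contigs=%d mean=%.2f sd=%.2f\n", n, mean, sqrt(var)
    }' "$1"
}
```

For example, contig_length_stats contigs.fasta can be run alongside the estimation step and its mean compared with the one written to outputld.txt.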
Next, the RSEM-EVAL score can be calculated. There is a single executable for this, rsem-eval-calculate-score, and it has various options. Running
rsem-eval-calculate-score --help
will list them.
The standard usage example is:
rsem-eval-calculate-score -p 8 --transcript-length-parameters human.txt /data/reads.fq assembly1.fa assembly1_rsem_eval 76
Two of these arguments are named options with a prefix, -p and --transcript-length-parameters. The rest are all positional:
- -p 8, the number of threads the program will use, in this case 8.
- --transcript-length-parameters, a file containing the output of the previous rsem-eval-estimate-transcript-length-distribution command (human.txt in this example).
- The next argument is a comma-separated list of the raw read files used to assemble the de novo transcriptome in the first place. It is easiest to list these files beforehand in a file (fq.lst, one path per line) and insert them with command substitution. Unlike cat fq.lst | tr '\n' ',', the following does not leave a trailing comma:
$(paste -sd, fq.lst)
- The next argument is the de novo assembled FASTA file.
- The penultimate argument is the output prefix, a name of the user's choice that is used to prefix the output files.
- The final argument is the average length of the raw reads (76 in this example). For paired-end data, RSEM-EVAL expects the average fragment length here instead.
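The two inputs that are easiest to get wrong are the comma-separated read list and the read length. The sketch below shows two hypothetical helpers, not part of DETONATE: join_read_list builds the comma-separated list from fq.lst while skipping blank lines, and mean_read_length averages the sequence lengths in uncompressed FASTQ files. It assumes standard four-line FASTQ records (no wrapped sequence lines) and only applies to single-end data, since paired-end runs need the fragment length instead.

```shell
#!/bin/bash
# Hypothetical helper: join the non-blank lines of a read list into one
# comma-separated string, with no trailing comma.
join_read_list() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Hypothetical helper: average read length, rounded to the nearest integer,
# over one or more uncompressed FASTQ files. Assumes 4-line records, so the
# sequence is line 2 of every record within each file (FNR restarts per file).
mean_read_length() {
  awk 'FNR % 4 == 2 { n++; s += length($0) }
       END { if (n == 0) exit 1; printf "%d\n", s / n + 0.5 }' "$@"
}
```

With these defined, the standard example could be written as rsem-eval-calculate-score -p 8 --transcript-length-parameters human.txt "$(join_read_list fq.lst)" assembly1.fa assembly1_rsem_eval "$(mean_read_length /data/reads.fq)". For large files, running mean_read_length on a subset of reads is usually enough.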
It's best to wrap this command in a job submission script as follows:
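#!/bin/bash
# SGE directives: export the environment, run in the current directory,
# merge stdout and stderr, use bash, use highmemory.q and request 8 slots.
#$ -V
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -q highmemory.q
#$ -pe multi 8
module load detonate
# $NSLOTS is set by SGE to the number of slots granted (8 here)
rsem-eval-calculate-score -p $NSLOTS --transcript-length-parameters outputld.txt "$(paste -sd, fq.lst)" <denovoassemblyname.fa> <output_prefix> <length_of_short_reads>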
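Because the job may sit in highmemory.q for a while before it starts, it can be worth checking the read list before submitting. The sketch below is a hypothetical pre-submission check, check_read_list, which is not part of DETONATE or SGE; it fails if fq.lst is missing or empty, or if any file it names is missing or empty.

```shell
#!/bin/bash
# Hypothetical pre-submission check: return non-zero if the read list is
# missing/empty or names a read file that is missing or empty.
check_read_list() {
  local list=$1 n=0 path
  [ -s "$list" ] || { echo "read list '$list' is missing or empty" >&2; return 1; }
  # read every line, including a final line without a trailing newline
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue          # ignore blank lines
    n=$((n + 1))
    if [ ! -s "$path" ]; then
      echo "missing or empty read file: $path" >&2
      return 1
    fi
  done < "$list"
  [ "$n" -gt 0 ] || { echo "no read files listed in '$list'" >&2; return 1; }
  echo "$n read file(s) OK"
}
```

A typical use would be check_read_list fq.lst && qsub rsem_eval.sh, where rsem_eval.sh is a placeholder name for the job script above.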