Difference between revisions of "BUSCO"


Revision as of 12:41, 27 July 2016

Introduction

BUSCO, like CEGMA, is a specialised tool for "completeness assessment". This applies to genome assemblies, particularly those generated de novo: by concentrating on a core set of genes, one can estimate how complete an assembly is from the number of these core genes that it has managed to recover.

BUSCO stands for "Benchmarking Universal Single-Copy Orthologs" and presents itself as a quality measure for assemblies. "Busco" also means "I search" in Spanish, Galician and Portuguese, which the authors find fitting, as the broad goal of the tool is a quest for quality.

Aspects

BUSCO can work closely with Augustus, even going as far as retraining it for a species. However, take note:

"Write access to the Augustus installation directory is necessary for retraining the gene finder", so retraining is probably best carried out by the sysadmin under "root" user.

BUSCO is primarily a Python application and, though it can apparently work with Python 2, Python 3 is recommended.

The broad BUSCO process is as follows:

  1. Identification of candidate regions in the genome to be assessed, using tBLASTn searches with BUSCO consensus sequences.
  2. Gene structure prediction using Augustus with BUSCO block profiles.
  3. Assessment of these predicted genes, or of all genes from an annotated gene set or transcriptome, using HMMER and lineage-specific BUSCO profiles to classify matches as "complete", "duplicated", or "fragmented", or, when there are no matches, as "missing".
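The four categories can be turned into a rough completeness estimate by counting how many core genes fall into each one. The following is only a minimal sketch of that arithmetic, not BUSCO's own reporting code: the function name `summarise`, the input labels, and the choice to treat the four categories as mutually exclusive are all assumptions made for illustration.

```python
from collections import Counter

# Category names as used in the process description above.
CATEGORIES = ("complete", "duplicated", "fragmented", "missing")


def summarise(classifications):
    """Return {category: (count, percent)} for a list of per-gene labels.

    Assumption: every core gene carries exactly one of the four labels.
    """
    if not classifications:
        raise ValueError("no classifications given")
    counts = Counter(classifications)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError("unknown categories: {}".format(sorted(unknown)))
    total = len(classifications)
    return {c: (counts[c], 100.0 * counts[c] / total) for c in CATEGORIES}


# Hypothetical labels for ten core genes (not real BUSCO output).
labels = ["complete"] * 6 + ["duplicated"] + ["fragmented"] * 2 + ["missing"]
summary = summarise(labels)
for category in CATEGORIES:
    count, percent = summary[category]
    print("{:<10} {:>3} {:5.1f}%".format(category, count, percent))
```

The fewer genes that end up as "fragmented" or "missing", the more complete the assembly is taken to be.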

After the first tBLASTn search, Augustus is invoked as follows:

augustus --proteinprofile=example/prfl/BUSCO_7.prfl --predictionStart=163394 --predictionEnd=174110 --species=fly "sampleasroo2_.temp" > ./run_asroo2//augustus/BUSCO_7.out.1 2>/dev/null
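To make the parts of that command easier to see, here is a minimal Python sketch that assembles the same argument list and shows how the two redirections could be reproduced with `subprocess`. It uses only the options that appear in the command above; the helper names `build_augustus_command` and `run_augustus` are hypothetical and this is not how BUSCO itself is necessarily written.

```python
import subprocess


def build_augustus_command(profile, start, end, species, sequence_file):
    """Assemble the augustus argument list for one candidate region."""
    if start > end:
        raise ValueError("predictionStart must not exceed predictionEnd")
    return [
        "augustus",
        "--proteinprofile={}".format(profile),   # BUSCO block profile
        "--predictionStart={}".format(start),    # start of candidate region
        "--predictionEnd={}".format(end),        # end of candidate region
        "--species={}".format(species),          # Augustus species parameters
        sequence_file,                           # sequence to predict on
    ]


def run_augustus(command, output_path):
    """Run the command, writing stdout to a file and discarding stderr.

    Mirrors the "> file 2>/dev/null" redirections of the shell command.
    """
    with open(output_path, "w") as out:
        return subprocess.call(command, stdout=out, stderr=subprocess.DEVNULL)


# Values copied from the command shown above.
command = build_augustus_command(
    "example/prfl/BUSCO_7.prfl", 163394, 174110, "fly", "sampleasroo2_.temp"
)
print(" ".join(command))
```

Calling `run_augustus(command, "BUSCO_7.out.1")` would then execute the prediction, provided `augustus` is on the PATH.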

Using

Loading the module

module load BUSCO

is enough, as all of BUSCO's dependencies (python/3.4, augustus/3.2.2, hmmer/3.1b2, EMBOSS/6.6.0) will be loaded at the same time.
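If in doubt, a quick way to confirm that the dependencies really are available after loading the module is to look for their executables on the PATH. The sketch below does this with Python's standard `shutil.which`. The executable names are assumptions (for instance `hmmsearch` for HMMER and `transeq` for EMBOSS) and may need adjusting to whatever the module actually provides.

```python
import shutil

# Assumed executable names for the dependencies listed above; adjust them
# to match what the module actually puts on the PATH.
ASSUMED_EXECUTABLES = ["python3", "augustus", "hmmsearch", "transeq"]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(ASSUMED_EXECUTABLES)
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("All assumed dependencies were found on PATH.")
```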

The main BUSCO executable is a Python script called

BUSCO_v1.22.py

However, there is a symlink to this called BUSCO, so the program can equally well be launched with a simple

BUSCO

Modes

BUSCO has the following three modes:

  1. Genome assembly assessment
  2. Transcriptome assembly assessment
  3. Gene set assessment

Error appearance

One particular error can be safely ignored, because it only concerns the retraining operation (unless, of course, you actually want to retrain):

Error: Cannot write to Augustus directory, please make sure you have write permissions to /usr/local/Modules/modulefiles/tools/augustus/3.2.2/config
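Whether this error matters for a given user can be checked in advance by testing for write access to the Augustus config directory. The following is a minimal sketch using the standard `os.access` call. The function name `can_retrain` is hypothetical, and the path is simply the one quoted in the error message above.

```python
import os


def can_retrain(augustus_config_dir):
    """Return True if the directory exists and is writable by this user."""
    return os.path.isdir(augustus_config_dir) and os.access(
        augustus_config_dir, os.W_OK
    )


# Path taken from the error message above.
config_dir = "/usr/local/Modules/modulefiles/tools/augustus/3.2.2/config"
if can_retrain(config_dir):
    print("Write access available: retraining should be possible.")
else:
    print("No write access: the error can be ignored unless retraining.")
```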