RepeatModeler

Revision as of 11:38, 20 October 2017 by Rf (talk | contribs)

Introduction

Usage

Here are the contents of the help manual:

$ RepeatModeler --help
No database indicated

NAME
    RepeatModeler - Model repetitive DNA

SYNOPSIS
      RepeatModeler [-options] -database <XDF Database>

DESCRIPTION
    The options are:

    -h(elp)
        Detailed help

    -database
        The prefix name of a XDF formatted sequence database containing the
        genomic sequence to use when building repeat models. The database
        may be created with the WUBlast "xdformat" utility or with the
        RepeatModeler wrapper script "BuildXDFDatabase".

    -engine <abblast|wublast|ncbi>
        The name of the search engine we are using. I.e abblast/wublast or
        ncbi (rmblast version).

    -pa #
        Specify the number of shared-memory processors available to this
        program. RepeatModeler will use the processors to run BLAST searches
        in parallel. i.e on a machine with 10 cores one might use 1 core for
        the script and 9 cores for the BLAST searches by running with "-pa
        9".

    -recoverDir <Previous Output Directory>
        If a run fails in the middle of processing, it may be possible
        recover some results and continue where the previous run left off.
        Simply supply the output directory where the results of the failed
        run were saved and the program will attempt to recover and continue
        the run.

    -srand #
        Optionally set the seed of the random number generator to a known
        value before the batches are randomly selected ( using Fisher Yates
        Shuffling ). This is only useful if you need to reproduce the sample
        choice between runs. This should be an integer number.

SEE ALSO
        RepeatMasker, WUBlast

COPYRIGHT
     Copyright 2005-2017 Institute for Systems Biology

AUTHOR
     Robert Hubley <rhubley@systemsbiology.org>
     Arian Smit <asmit@systemsbiology.org>
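
As a rough illustration of how the options above combine, the following shell sketch assembles a RepeatModeler command line using only flags that appear in the help text. The function names, the database prefix "mygenome" and the rule of leaving one core for the script itself are assumptions made for illustration; the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: prints a RepeatModeler command line instead of running it.

# blast_cores TOTAL: leave one core for the script itself,
# but never go below one core for the BLAST searches.
blast_cores() {
    total=$1
    if [ "$total" -gt 1 ]; then
        echo $((total - 1))
    else
        echo 1
    fi
}

# repeatmodeler_cmd DB_PREFIX ENGINE TOTAL_CORES [SEED]
# DB_PREFIX is the prefix of an already built sequence database.
repeatmodeler_cmd() {
    db=$1
    engine=$2
    pa=$(blast_cores "$3")
    cmd="RepeatModeler -engine $engine -pa $pa -database $db"
    # -srand is optional; only add it when a seed was given.
    if [ -n "${4:-}" ]; then
        cmd="$cmd -srand $4"
    fi
    echo "$cmd"
}

# Example: 10-core machine, rmblast engine, hypothetical database prefix "mygenome".
repeatmodeler_cmd mygenome ncbi 10
# prints: RepeatModeler -engine ncbi -pa 9 -database mygenome
```

Passing a fourth argument, e.g. a seed of 42, appends "-srand 42" so that the sample choice can be reproduced between runs.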

Installation notes

RepeatModeler has a number of dependencies. How each of them was installed is detailed below:


nseg

nseg and nmerge (the latter is included in the nseg distribution) are rather old C programs from NCBI. Both were installed in /usr/local/bin on all machines.


RECON

The same is done with RECON, which consists of the following executables:

  • imagespread
  • eledef
  • eleredef
  • edgeredef
  • famdef

All of these are now available in /usr/local/bin on the nodes as well.

RepeatScout

RepeatScout likewise contains only two compiled C executables, and their names are distinctive enough that installing them locally on all the nodes is straightforward:

  • RepeatScout
  • build_lmer_table

The following Perl scripts (which use the unusual *.prl extension) must also be installed:

  • filter-stage-1.prl
  • filter-stage-2.prl
  • merge-lmer-tables.prl
  • compare-out-to-gff.prl
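
To confirm on a given node that the dependencies listed so far are in place, a small check along the following lines may help. It is only a sketch: the function name missing_tools and the DEPS variable are made up for illustration, and it assumes the RepeatScout executables and scripts were put in the same directory as nseg and RECON, which these notes do not state explicitly.

```shell
#!/bin/sh
# Sketch only: report which dependency executables are missing from a directory.

# missing_tools DIR NAME...: print each NAME that is not an executable file in DIR.
missing_tools() {
    dir=$1
    shift
    for name in "$@"; do
        if [ ! -f "$dir/$name" ] || [ ! -x "$dir/$name" ]; then
            echo "$name"
        fi
    done
}

# Executables and scripts named in the notes above.
DEPS="nseg nmerge imagespread eledef eleredef edgeredef famdef \
RepeatScout build_lmer_table filter-stage-1.prl filter-stage-2.prl \
merge-lmer-tables.prl compare-out-to-gff.prl"

# Usage (run on each node); no output means everything was found:
#   missing_tools /usr/local/bin $DEPS
```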

RepeatModeler itself

Installation is quite similar to that of RepeatMasker, although there is also the option of manually editing RepModelConfig.pm.tmp, which is more accurate and convenient than running the automatic configure script. Predictably, the edited file only takes effect once the .tmp extension is removed, i.e. once it is named RepModelConfig.pm.
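
The activation step can be sketched in shell as follows. The function name activate_config and the install directory in the usage comment are hypothetical; the sketch copies rather than renames, so that the .tmp file survives as a backup, and it refuses to overwrite an existing RepModelConfig.pm.

```shell
#!/bin/sh
# Sketch only: activate a hand-edited configuration by dropping the .tmp suffix.

# activate_config DIR: copy DIR/RepModelConfig.pm.tmp to DIR/RepModelConfig.pm.
activate_config() {
    dir=$1
    src="$dir/RepModelConfig.pm.tmp"
    dst="${src%.tmp}"   # strip the trailing .tmp
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "already present: $dst" >&2
        return 1
    fi
    cp "$src" "$dst"
}

# Usage, with a hypothetical install directory:
#   activate_config /path/to/RepeatModeler
```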