Repeatmodeler
Contents
Introduction
This is a sister program to RepeatMasker. It also has no publication behind it.
Usage
NB always use -engine ncbi when running RepastModeler. NB2 There is a warning like so in the output:
Use of uninitialized value $1 in string ne at /shelf/modulefiles/tools/RepeatModeler/1.0.10/SequenceSimilarityMatrix.pm line 273, <MATRIX> line 7
Because it is only a warning, it should not affetc the proper running of the program. In any case, it will be repaired shortly.
Here are the contents of the help manual:
$ RepeatModeler --help
No database indicated
NAME
RepeatModeler - Model repetitive DNA
SYNOPSIS
RepeatModeler [-options] -database <XDF Database>
DESCRIPTION
The options are:
-h(elp)
Detailed help
-database
The prefix name of a XDF formatted sequence database containing the
genomic sequence to use when building repeat models. The database
may be created with the WUBlast "xdformat" utility or with the
RepeatModeler wrapper script "BuildXDFDatabase".
-engine <abblast|wublast|ncbi>
The name of the search engine we are using. I.e abblast/wublast or
ncbi (rmblast version).
-pa #
Specify the number of shared-memory processors available to this
program. RepeatModeler will use the processors to run BLAST searches
in parallel. i.e on a machine with 10 cores one might use 1 core for
the script and 9 cores for the BLAST searches by running with "-pa
9".
-recoverDir <Previous Output Directory>
If a run fails in the middle of processing, it may be possible
recover some results and continue where the previous run left off.
Simply supply the output directory where the results of the failed
run were saved and the program will attempt to recover and continue
the run.
-srand #
Optionally set the seed of the random number generator to a known
value before the batches are randomly selected ( using Fisher Yates
Shuffling ). This is only useful if you need to reproduce the sample
choice between runs. This should be an integer number.
SEE ALSO
RepeatMasker, WUBlast
COPYRIGHT
Copyright 2005-2017 Institute for Systems Biology
AUTHOR
Robert Hubley <rhubley@systemsbiology.org>
Arian Smit <asmit@systemsbiology.org>
Installation notes
RepeatModeler has a number of dependencies. How these are installed are detailed below:
nseg
nseg and nmerge (included in nseg) are quite old programs coded in C by NCBI. These were installed on all machines in /usr/local/bin
RECON
The same is done with RECON, which consists of the following executables:
- imagespread
- eledef
- eleredef
- edgeredef
- famdef
So all these are now available in /usr/local/bin in the nodes as well.
RepeatScout
As this also only includes two C-coded executables with quite good names, it is easy enough to install it locally on all the nodes:
- RepeatScout
- build_lmer_table
and also the following perl scripts (using the unusual *.prl extension) must be also be installed:
- filter-stage-1.prl
- filter-stage-2.prl
- merge-lmer-tables.prl
- compare-out-to-gff.prl
RepeatModeler itself
This is quite similar to RepeatMasker, although there is an opportunity to manually edit RepModelConfig.pm.tmp which is more accurate and convenient than the automatic configure script. Predictably, it will only be active when the tmp extension is hived off.