Two Eel Scaffolds
Revision as of 11:56, 13 May 2016
Introduction
Two DNA scaffolds are presented:
- eelScaffold32. 679 422 bp and 42.25% GC.
- eelScaffold320. 246 433 bp and 43.47% GC.
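Length and GC figures like those above can be checked with a short awk helper. This is a minimal sketch, not necessarily how the values on this page were produced: the function name `fasta_stats` is hypothetical, and it assumes every letter (including any N) counts towards the length.

```shell
# fasta_stats FILE: print total sequence length and GC percentage of a FASTA file.
# Hypothetical helper; assumes all letters (including N) count towards the length.
fasta_stats() {
    awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)        # drop whitespace and carriage returns
            len += length(seq)
            gc  += gsub(/[GC]/, "", seq)   # gsub returns the number of substitutions
        }
        END {
            if (len > 0) printf "%d bp, %.2f%% GC\n", len, 100 * gc / len
        }
    ' "$1"
}

# Usage (file name taken from the Augustus example on this page):
# fasta_stats eelScaffold320.fa
```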
We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.
Two genes are expected to lie in or near the regions covered by these scaffolds:
- eelScaffold32 is expected to contain at least part of PDCD10b (Programmed cell death 10b).
- eelScaffold320 is expected to contain at least part of nrd1a (Nardilysin, N-arginine dibasic convertase).
ORF Analysis
ORF scans of sequences over 100 kbp often throw up too much data, but they can be a useful first step for gauging the complexity of the sequence.
Gene Predictor
One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses HMM profiles based on a related organism. For eel, Augustus makes two candidate organisms available: zebrafish (zb) and lamprey (lp). Tilapia is not available, but given time it is possible to train an HMM profile for this organism.
An example Augustus command line is as follows:
augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf
Augustus writes its output in the GTF format. For browsing, we need to convert it to the related GFF format:
gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff
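The two steps above can be batched over both scaffolds and both species profiles. The following is a sketch only: the function name `augustus_cmds` is hypothetical, the FASTA file name `eelScaffold32.fa` is assumed by analogy with `eelScaffold320.fa`, and `zebrafish` is assumed to be the Augustus species identifier behind the zb tag. The function only prints the commands, so they can be reviewed before anything is run.

```shell
# Print the Augustus and gtf2gff.pl commands for every scaffold/species pair.
# Assumptions: eelScaffold32.fa exists alongside eelScaffold320.fa, and
# "zebrafish" is the Augustus species identifier for the zb tag.
augustus_cmds() {
    for scaffold in 32 320; do
        for pair in zebrafish:zb lamprey:lp; do
            species=${pair%%:*}   # Augustus species identifier
            tag=${pair##*:}       # short tag used in the output file names
            base="aug_s${scaffold}_${tag}"
            echo "augustus --species=${species} eelScaffold${scaffold}.fa > ${base}.gtf"
            echo "gtf2gff.pl --printExon --gff3 --out=${base}.gff < ${base}.gtf"
        done
    done
}

augustus_cmds          # review the generated commands first
# augustus_cmds | sh   # then run them (needs augustus and gtf2gff.pl on PATH)
```

Printing the commands rather than executing them directly keeps the sketch safe to try on a machine without Augustus installed.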