Difference between revisions of "Two Eel Scaffolds"
Revision as of 12:31, 13 May 2016
Introduction
Two DNA scaffolds are presented:
- eelScaffold32. 679 422 bp and 42.25% GC.
- eelScaffold320. 246 433 bp and 43.47% GC.
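The length and GC figures above can be re-derived from the FASTA files themselves. The following is a minimal sketch, not necessarily how the quoted numbers were produced: the function name fasta_stats is hypothetical, and it counts every letter (including N) towards the length, which is an assumption.

```shell
# fasta_stats FILE: print "<name> <length> bp <GC%>% GC" for each FASTA record.
# Assumption: all letters, including N, count towards the length.
fasta_stats() {
  awk '
    function report() {
      if (seen) printf "%s %d bp %.2f%% GC\n", name, len, (len ? 100 * gc / len : 0)
    }
    /^>/ {
      report()                      # flush the previous record, if any
      seen = 1
      name = substr($1, 2)          # header word without the leading ">"
      len = 0; gc = 0
      next
    }
    {
      seq = toupper($0)
      gsub(/[^A-Z]/, "", seq)       # drop whitespace and stray characters
      len += length(seq)
      gc  += gsub(/[GC]/, "", seq)  # gsub returns the number of G/C removed
    }
    END { report() }
  ' "$1"
}
```

Running fasta_stats eelScaffold32.fa should then print one line for the scaffold. Small differences from the figures above could come from how N bases are treated.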
We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.
Two genes are expected to lie in or near the regions covered by these scaffolds:
- eelScaffold32 may contain all or part of PDCD10b (Programmed cell death 10b).
- eelScaffold320 may contain all or part of nrd1a (Nardilysin, N-arginine dibasic convertase).
ORF Analysis
ORF scans of sequences longer than 100 kbp often produce too much data to interpret directly, but they can be a useful first step for gauging the complexity of the sequence.
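To illustrate what such a scan involves, here is a rough sketch of a six-frame ORF scan written with standard Unix tools. It is not the tool used on this page: the function name orf_scan and the default minimum length of 300 nt are hypothetical choices, the file is assumed to hold a single sequence, and an ORF is taken to run from the first ATG in a frame to the next in-frame stop codon.

```shell
# orf_scan FILE [MIN_NT]: list ORFs (first ATG to next in-frame stop) of at
# least MIN_NT nucleotides. Output columns: strand frame start end length.
# Assumptions: FILE holds one sequence; minus-strand coordinates are given
# on the reverse-complemented sequence, not on the original.
orf_scan() {
  fasta=$1
  min=${2:-300}
  # Join all sequence lines into one uppercase string.
  seq=$(grep -v '^>' "$fasta" | tr -d ' \t\r\n' | tr 'a-z' 'A-Z')
  for strand in + -; do
    if [ "$strand" = "-" ]; then
      s=$(printf '%s' "$seq" | rev | tr 'ACGT' 'TGCA')   # reverse complement
    else
      s=$seq
    fi
    printf '%s\n' "$s" | awk -v min="$min" -v strand="$strand" '
      {
        n = length($0)
        for (frame = 0; frame < 3; frame++) {
          start = 0
          for (i = frame + 1; i + 2 <= n; i += 3) {
            codon = substr($0, i, 3)
            if (start == 0 && codon == "ATG") {
              start = i                                  # open an ORF
            } else if (start && (codon == "TAA" || codon == "TAG" || codon == "TGA")) {
              len = i + 2 - start + 1                    # includes the stop codon
              if (len >= min) print strand, frame + 1, start, i + 2, len
              start = 0                                  # close the ORF
            }
          }
        }
      }'
  done
}
```

Piping the output through wc -l for a few values of MIN_NT gives a quick feel for how fast the number of candidate ORFs grows on a scaffold of this size.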
Gene Predictor
One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses HMM profiles trained on a related organism. For eel, Augustus makes two candidate organisms available: zebrafish (zb) and lamprey (lp). Tilapia is not available, but given time it is possible to train an HMM profile for this organism.
An example Augustus command line is as follows:
augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf
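To compare both available profiles on both scaffolds, the same command can be generated for each combination. The sketch below only prints the command lines, following the file naming of the example above. The function name augustus_jobs is hypothetical, and the species identifier zebrafish is an assumption that should be checked against the list printed by augustus --species=help.

```shell
# augustus_jobs: print one Augustus command per scaffold/profile combination.
# Assumption: "zebrafish" and "lamprey" are the installed species identifiers.
augustus_jobs() {
  for scaffold in eelScaffold32 eelScaffold320; do
    num=${scaffold#eelScaffold}            # 32 or 320
    for pair in zebrafish:zb lamprey:lp; do
      species=${pair%%:*}                  # Augustus species identifier
      tag=${pair##*:}                      # short tag used in the file name
      printf 'augustus --species=%s %s.fa >aug_s%s_%s.gtf\n' \
        "$species" "$scaffold" "$num" "$tag"
    done
  done
}
```

Once the printed commands look right, they can be run with augustus_jobs | sh.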
Augustus writes its output in GTF format. For visual browsing in JBrowse, we need to convert it to the related GFF format:
gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff
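Before loading the converted file into JBrowse, it can help to check which feature types it actually contains. This is an optional sanity check rather than part of the Augustus workflow; the function name gff_feature_counts is hypothetical, and it assumes a tab-separated file with nine columns per feature line.

```shell
# gff_feature_counts FILE: count features per type (column 3) in a GFF/GTF file.
# Comment lines starting with "#" and lines with fewer than 9 columns are ignored.
gff_feature_counts() {
  grep -v '^#' "$1" |
    awk -F '\t' 'NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' |
    LC_ALL=C sort
}
```

For example, gff_feature_counts aug_s32_lp.gff should list types such as gene, CDS and exon with their counts. An empty result would suggest the conversion did not produce any feature lines.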
We are probably also interested in the CDS and protein sequences of the predicted genes, which the following Augustus-supplied script extracts:
getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf