Difference between revisions of "Two Eel Scaffolds"

Revision as of 12:31, 13 May 2016

Introduction

Two DNA scaffolds are presented:

  1. eelScaffold32. 679 422 bp and 42.25% GC.
  2. eelScaffold320. 246 433 bp and 43.47% GC.
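
The length and GC figures above can be recomputed directly from the FASTA files. The following is a minimal sketch, not necessarily how the figures were originally obtained: the function names are hypothetical, and it assumes that each file holds a single scaffold and that GC% is taken over unambiguous bases (A, C, G, T) only.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a FASTA file, skipping '>' header lines."""
    return "".join(
        line.strip() for line in lines if not line.startswith(">")
    ).upper()


def length_and_gc(seq):
    """Return (length in bp, GC percentage over A/C/G/T bases only)."""
    counted = sum(seq.count(base) for base in "ACGT")  # assumption: N is ignored
    gc = seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / counted if counted else 0.0
    return len(seq), gc_percent


# Usage sketch (file name taken from the commands on this page):
#   with open("eelScaffold32.fa") as handle:
#       print(length_and_gc(read_fasta_sequence(handle)))
```

If the computed percentage differs slightly from the one quoted, the treatment of N bases is the first thing to check.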

We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.

Two genes are expected to lie in or near the regions covered by these scaffolds:

  • eelScaffold32 is expected to contain at least part of PDCD10b (programmed cell death 10b).
  • eelScaffold320 is expected to contain at least part of nrd1a (nardilysin, N-arginine dibasic convertase).

ORF Analysis

ORF scans of sequences over 100 kbp often throw up too much data, but they can be a useful first step for gauging the complexity of the sequence.
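
To make the idea concrete, here is a minimal six-frame ORF scan. It is only a sketch under stated assumptions: an ORF is taken to run from the first ATG in a frame to the next in-frame stop codon, the standard stop codons are used, and the function names and the `min_codons` threshold are hypothetical choices rather than the settings of any particular ORF tool. Raising the threshold is the simplest way to cut down the volume of hits on a long scaffold.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Yield (strand, frame, start, end) for each ATG..stop ORF.

    Coordinates are 0-based, end-exclusive, and refer to the strand
    being scanned (the reverse complement for strand '-').
    min_codons counts codons from ATG up to, but excluding, the stop.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ATG in this frame
```

On a scaffold of several hundred kbp, `sum(1 for _ in find_orfs(seq, min_codons=100))` gives a quick count of candidate ORFs before any closer inspection.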

Gene Predictor

One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses HMM profiles trained on a related organism. For eel, two of the organisms that Augustus makes available are relevant: zebrafish (zb) and lamprey (lp). Tilapia is not available, but it is possible, given time, to train and establish an HMM profile for this organism.

An example Augustus command line is as follows:

augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf

Augustus writes its output in GTF format. For visual browsing in JBrowse, we need to convert it to the related GFF format:

gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff
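
Before loading the converted file into JBrowse, it can help to check what the prediction file actually contains. The sketch below tallies feature types in a GTF/GFF file. It assumes the usual nine tab-separated columns with the feature type in the third column and `#` marking comment lines; the function name is hypothetical.

```python
from collections import Counter


def count_features(lines):
    """Count feature types (column 3) in GTF/GFF lines, skipping comments."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # ignore anything that is not a feature row
            counts[fields[2]] += 1
    return counts


# Usage sketch (file name taken from the commands on this page):
#   with open("aug_s32_lp.gtf") as handle:
#       print(count_features(handle))
```

A file with no `gene` rows at all usually means the prediction step produced nothing for that scaffold, which is worth knowing before converting it.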

We are probably also interested in the CDS and protein sequences of the predicted genes. For these we can use the following Augustus-supplied script:

getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf