Two Eel Scaffolds

From wiki
Revision as of 15:22, 13 May 2016 by Rf (talk | contribs)
Jump to: navigation, search

Introduction

Two DNA scaffolds are presented:

  1. eelScaffold32. 679 422 bp and 42.25% GC.
  2. eelScaffold320. 246 433 bp and 43.47% GC.

We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.

There are two genes expected to be around about the regions covered by these scaffolds:

  • eelScaffold32 contains any part of PDCD10b (Programmed cell death 10b).
  • eelScaffold320 contains any part of nrd1a (Nardilysin, N-arginine dibasic convertase)

ORF Analysis

ORF scans for sequences over 100 kbp often throw up too much data, but it can be useful first step to see the complexity of the sequence.

Gene Predictor

One of the most up-to-date (2016) gene predictors is Augustus. It uses HMM profiles based on a related organism. In terms of eel, there are two given organisms: Zebra fish (zb) and Lamprey (lp) which Augustus makes available. Though tilapia is not available, it is possible - given time - to train and establish HMM profile for this organism.

An example Augustus command line is as follows:

augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf

Augustus outputs in the GTF format, for visual browsing on JBrowse, we need to convert to the related format, GFF:

gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff

We are also probably interested in the CDS and proteins sequences of the predicted genes, we can use the follown Augustus-supplied script:

getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf

Detecting presence of pdcd10b and nrd1a

We take the exons of these genes and apply Waterstone alignment with the scaffolds to them. Here is a table for