Two Eel Scaffolds

ORF Analysis

ORF scans of sequences over 100 kbp often produce too much output, but they can be a useful first step for gauging the complexity of the sequence.
 
  
 

Revision as of 15:41, 13 May 2016

Introduction

Two DNA scaffolds are presented:

  1. eelScaffold32. 679 422 bp and 42.25% GC.
  2. eelScaffold320. 246 433 bp and 43.47% GC.
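Figures like these can be recomputed directly from the FASTA files. The following is a minimal sketch, assuming each scaffold is stored as a single-record FASTA file; the function name fasta_gc is hypothetical, and ambiguous bases such as N are counted in the length, which may differ from how the figures above were obtained.

```shell
# Print sequence length (bp) and GC percentage for a FASTA file.
# Assumption: every non-header line is sequence; N and other ambiguity
# codes count towards the length but not towards GC.
fasta_gc() {
  awk '!/^>/ {
         seq = toupper($0)
         len += length(seq)
         gc  += gsub(/[GC]/, "", seq)   # gsub returns the number of replacements
       }
       END { if (len) printf "%d bp, %.2f%% GC\n", len, 100 * gc / len }' "$1"
}
```

It would be called as fasta_gc eelScaffold32.fa, using the file name that appears in the Augustus commands.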

We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.

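Two genes are expected to lie in or near the regions covered by these scaffolds:

  • eelScaffold32 is expected to contain at least part of PDCD10b (Programmed cell death 10b; http://www.ensembl.org/Oreochromis_niloticus/Gene/Summary?g=ENSONIG00000001157&db=core).
  • eelScaffold320 is expected to contain at least part of nrd1a (Nardilysin, N-arginine dibasic convertase; http://www.ensembl.org/Oreochromis_niloticus/Gene/Summary?g=ENSONIG00000001167&db=core).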

Gene Predictor

One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses HMM profiles trained on a related organism. For eel, Augustus makes two relevant organisms available: zebrafish (zb) and lamprey (lp). Tilapia is not available, although given time it is possible to train and establish an HMM profile for this organism.

An example Augustus command line is as follows:

augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf

Augustus outputs in the GTF format. For visual browsing on JBrowse, we need to convert this to the related GFF format:

gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff

We are probably also interested in the CDS and protein sequences of the predicted genes. For these we can use the following Augustus-supplied script:

getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf

In order to visually navigate the results of these annotations, they can be viewed on a browser here.
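The three commands above are shown for one scaffold and one species at a time. As a sketch of how they might be repeated for each combination, the hypothetical helper aug_cmds below only prints the commands rather than running them. The output file naming follows the examples above; using zebrafish as the Augustus species identifier is an assumption.

```shell
# Print (do not run) the three commands for one scaffold number and species.
# Abbreviations follow the text: lamprey -> lp, zebrafish -> zb.
aug_cmds() {
  num=$1; species=$2
  case $species in
    lamprey)   abbr=lp ;;
    zebrafish) abbr=zb ;;   # assumed Augustus species identifier
    *) echo "unknown species: $species" >&2; return 1 ;;
  esac
  base="aug_s${num}_${abbr}"
  echo "augustus --species=$species eelScaffold${num}.fa >${base}.gtf"
  echo "gtf2gff.pl --printExon --gff3 < ${base}.gtf --out=${base}.gff"
  echo "getAnnoFasta.pl --seqfile=eelScaffold${num}.fa ${base}.gtf"
}
```

Once the printed commands have been checked, they can be executed by piping them to a shell, for example aug_cmds 320 lamprey | sh.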

Detecting presence of pdcd10b and nrd1a

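We take the exons of these genes and align them against each scaffold with the Smith-Waterman algorithm (via the EMBOSS program). Rows are ordered by start position on the scaffold.

The column headings are abbreviations. Judging from the values, they appear to stand for: SFI, exon (query sequence) index; ALEN, alignment length; SCORE, alignment score; IDEN and IPT, identities as a count and as a percentage of ALEN; SIM, similar positions; GAPS and GPT, gaps as a count and as a percentage of ALEN; TSC and TEC, start and end coordinates on the target (scaffold); PET, percentage of the target covered; QSC and QEC, start and end coordinates on the query (exon); QLN, query length; PEQ, percentage of the query covered.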
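For readers unfamiliar with Smith-Waterman, the local-alignment recurrence can be sketched in a few lines of awk. This is an illustration only and not what EMBOSS runs: it uses a single linear gap penalty, whereas EMBOSS uses separate gap-open and gap-extend penalties, and the function name sw_score and the scoring values (match 5, mismatch -4, gap -10) are assumptions chosen for the example.

```shell
# Best local alignment score between two sequences (Smith-Waterman).
# Simplification: linear gap penalty; scoring values are illustrative.
sw_score() {
  awk -v a="$1" -v b="$2" -v ma=5 -v mi=-4 -v gap=-10 'BEGIN {
    n = length(a); m = length(b); best = 0
    for (j = 0; j <= m; j++) prev[j] = 0
    for (i = 1; i <= n; i++) {
      cur[0] = 0
      for (j = 1; j <= m; j++) {
        s = (substr(a, i, 1) == substr(b, j, 1)) ? ma : mi
        v = prev[j-1] + s                               # match or mismatch
        if (prev[j] + gap > v)  v = prev[j] + gap       # gap in b
        if (cur[j-1] + gap > v) v = cur[j-1] + gap      # gap in a
        if (v < 0) v = 0                                # local: never below zero
        cur[j] = v
        if (v > best) best = v
      }
      for (j = 0; j <= m; j++) prev[j] = cur[j]
    }
    print best
  }'
}
```

Only two rows of the scoring matrix are kept at a time, so memory grows with the length of the second sequence only. The sketch reports a score but no coordinates, so it cannot reproduce the TSC and TEC columns.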

pdcd10b's 7 exons against eelScaffold32

SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
5       89      125.5   56      62.9    56      15      16.9    92493   92577   0.0     1       78      79      98.7
7       80      112.0   52      65.0    52      16      20.0    153839  153911  0.0     1       71      82      86.6
6       97      113.0   59      60.8    59      20      20.6    222386  222481  0.0     4       81      83      94.0
2       54      105.0   37      68.5    37      7       13.0    270271  270324  0.0     2       48      54      87.0
1       91      134.0   58      63.7    58      9       9.9     305491  305572  0.0     6       96      96      94.8
4       149     161.5   91      61.1    91      29      19.5    337277  337419  0.0     2       127     127     99.2
3       112     129.5   69      61.6    69      16      14.3    607211  607313  0.0     7       111     118     89.0
Score for 7 query sequences (total 639 bp) against target (679422 bp) = 880.50

pdcd10b's 7 exons against eelScaffold320

SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
5       51      105.0   36      70.6    36      5       9.8     13517   13567   0.0     27      72      79      58.2
1       110     130.0   70      63.6    70      25      22.7    52225   52323   0.0     1       96      96      100.0
6       89      116.5   55      61.8    55      15      16.9    60604   60688   0.0     6       83      83      94.0
2       50      97.0    36      72.0    36      7       14.0    64406   64450   0.0     1       48      54      88.9
7       84      112.5   55      65.5    55      14      16.7    162512  162589  0.0     7       82      82      92.7
4       124     138.5   79      63.7    79      19      15.3    222772  222891  0.0     18      126     127     85.8
3       79      131.0   54      68.4    54      9       11.4    238985  239059  0.0     24      97      118     62.7
Score for 7 query sequences (total 639 bp) against target (246433 bp) = 830.5

pdcd10b discussion

This is a small gene, and these alignments cannot be called conclusive. On total alignment score, however, eelScaffold32 (880.50) is more likely than eelScaffold320 (830.5) to harbour pdcd10b.
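The summary line under each table can be checked by summing the SCORE and QLN columns. A minimal sketch, assuming the table rows are supplied on standard input in the 15-column layout shown above; the function name table_totals is hypothetical.

```shell
# Sum SCORE (column 3) and QLN (column 14) over the data rows of one table.
# Data rows are recognised by a numeric first field and 15 fields in total,
# which skips the header line and the "Score for ..." summary line.
table_totals() {
  awk '$1 ~ /^[0-9]+$/ && NF == 15 { score += $3; qlen += $14; n++ }
       END { printf "%d exons, %d bp, score %.2f\n", n, qlen, score }'
}
```

With the eelScaffold32 table saved to a file (the name pdcd10b_s32.txt is only an example), table_totals < pdcd10b_s32.txt should agree with the summary line beneath that table.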

nrd1a's 37 exons against eelScaffold32

SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
2       270     217.5   160     59.3    160     60      22.2    4926    5175    0.0     34      263     324     71.0
3       235     257.0   143     60.9    143     35      14.9    9693    9920    0.0     5       211     241     85.9
15      88      117.5   54      61.4    54      14      15.9    66759   66843   0.0     2       78      78      98.7
12      34      104.0   28      82.4    28      2       5.9     67614   67647   0.0     24      55      58      55.2
36      73      114.5   47      64.4    47      14      19.2    140070  140139  0.0     3       64      70      88.6
4       37      104.0   28      75.7    28      0       0.0     159420  159456  0.0     2       38      38      97.4
23      18      72.0    16      88.9    16      0       0.0     190407  190424  0.0     5       22      22      81.8
33      108     115.5   66      61.1    66      24      22.2    214642  214747  0.0     5       90      91      94.5
19      108     133.5   69      63.9    69      13      12.0    222360  222458  0.0     1       104     105     99.0
13      84      109.5   52      61.9    52      15      17.9    237021  237101  0.0     3       74      74      97.3
17      60      97.5    41      68.3    41      14      23.3    258142  258195  0.0     2       53      59      88.1
35      98      124.0   60      61.2    60      19      19.4    259372  259461  0.0     1       87      90      96.7
29      126     162.0   79      62.7    79      20      15.9    268052  268173  0.0     6       115     139     79.1
18      46      105.5   34      73.9    34      4       8.7     281107  281152  0.0     1       42      55      76.4
10      107     125.5   69      64.5    69      20      18.7    342133  342236  0.0     6       95      96      93.8
21      94      131.0   60      63.8    60      18      19.1    365037  365120  0.0     1       86      87      98.9
20      136     131.0   82      60.3    82      28      20.6    380436  380559  0.0     2       121     124     96.8
34      127     131.0   78      61.4    78      28      22.0    387971  388090  0.0     1       106     117     90.6
25      140     154.0   86      61.4    86      20      14.3    409140  409267  0.0     9       140     151     87.4
37      125     149.5   78      62.4    78      30      24.0    411212  411315  0.0     15      130     131     88.5
9       77      113.5   50      64.9    50      11      14.3    478120  478193  0.0     4       72      74      93.2
1       609     297.5   343     56.3    343     143     23.5    491670  492216  0.1     4       531     541     97.6
16      120     147.0   72      60.0    72      11      9.2     519285  519397  0.0     2       117     121     95.9
28      35      76.0    25      71.4    25      4       11.4    522407  522440  0.0     1       32      32      100.0
26      124     147.5   76      61.3    76      28      22.6    531064  531176  0.0     21      127     128     83.6
31      55      108.5   40      72.7    40      9       16.4    540137  540190  0.0     3       49      53      88.7
14      73      107.0   50      68.5    50      16      21.9    557391  557458  0.0     3       64      70      88.6
6       23      67.0    18      78.3    18      3       13.0    571200  571222  0.0     3       22      23      87.0
11      144     144.0   86      59.7    86      34      23.6    586719  586853  0.0     3       121     123     96.7
30      50      95.5    35      70.0    35      7       14.0    599465  599511  0.0     3       48      48      95.8
7       54      105.0   38      70.4    38      6       11.1    599975  600025  0.0     30      80      82      62.2
5       21      72.0    18      85.7    18      1       4.8     604115  604135  0.0     5       24      26      76.9
27      173     176.5   105     60.7    105     39      22.5    621983  622127  0.0     2       163     163     99.4
24      46      98.0    34      73.9    34      4       8.7     655376  655420  0.0     6       48      52      82.7
22      94      119.0   60      63.8    60      20      21.3    668713  668800  0.0     1       80      101     79.2
8       141     145.5   88      62.4    88      30      21.3    669021  669148  0.0     28      151     154     80.5
32      69      120.0   47      68.1    47      7       10.1    670437  670500  0.0     9       75      84      79.8
Score for 37 query sequences (total 4025 bp) against target (679422 bp) = 4795.50

nrd1a's 37 exons against eelScaffold320

SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
32      52      99.5    36      69.2    36      4       7.7     11214   11261   0.0     10      61      84      61.9
18      46      105.5   34      73.9    34      4       8.7     21678   21720   0.0     1       45      55      81.8
17      70      95.0    45      64.3    45      20      28.6    22625   22693   0.0     3       53      59      86.4
24      40      96.5    29      72.5    29      2       5.0     22849   22886   0.0     8       47      52      76.9
23      21      69.0    17      81.0    17      0       0.0     26374   26394   0.0     1       21      22      95.5
6       18      72.0    16      88.9    16      0       0.0     28762   28779   0.0     5       22      23      78.3
14      64      105.5   42      65.6    42      9       14.1    42560   42621   0.0     12      68      70      81.4
2       315     228.0   183     58.1    183     64      20.3    46282   46552   0.1     7       301     324     91.0
21      90      112.5   58      64.4    58      17      18.9    80763   80842   0.0     3       85      87      95.4
25      153     138.0   88      57.5    88      27      17.6    81195   81324   0.1     1       149     151     98.7
11      81      148.5   55      67.9    55      10      12.3    81550   81625   0.0     47      122     123     61.8
1       525     261.0   294     56.0    294     105     20.0    83281   83764   0.2     2       462     541     85.2
15      61      107.0   40      65.6    40      4       6.6     89643   89699   0.0     8       68      78      78.2
28      39      78.0    28      71.8    28      8       20.5    92398   92435   0.0     1       32      32      100.0
8       168     147.0   104     61.9    104     32      19.0    97584   97741   0.1     8       153     154     94.8
3       254     229.0   150     59.1    150     55      21.7    106270  106501  0.1     6       226     241     91.7
31      54      88.5    37      68.5    37      6       11.1    106293  106342  0.0     2       53      53      98.1
29      129     142.5   79      61.2    79      30      23.3    107213  107332  0.0     23      130     139     77.7
13      70      119.0   46      65.7    46      5       7.1     108538  108605  0.0     1       67      74      90.5
4       37      84.5    27      73.0    27      3       8.1     129499  129534  0.0     3       37      38      92.1
37      131     128.5   79      60.3    79      26      19.8    133637  133753  0.0     13      131     131     90.8
35      96      114.5   60      62.5    60      21      21.9    135814  135897  0.0     1       87      90      96.7
9       78      106.5   50      64.1    50      14      17.9    136915  136989  0.0     5       71      74      90.5
30      39      87.0    27      69.2    27      0       0.0     137258  137296  0.0     1       39      48      81.2
22      114     120.0   69      60.5    69      20      17.5    143789  143896  0.0     2       101     101     99.0
5       30      64.5    22      73.3    22      6       20.0    158927  158955  0.0     1       25      26      96.2
27      176     145.0   102     58.0    102     39      22.2    161163  161325  0.1     13      162     163     92.0
33      89      109.0   56      62.9    56      19      21.3    164489  164570  0.0     5       81      91      84.6
10      82      119.0   52      63.4    52      11      13.4    168702  168776  0.0     1       78      96      81.2
7       81      106.5   50      61.7    50      7       8.6     178661  178740  0.0     4       78      82      91.5
20      121     141.5   76      62.8    76      21      17.4    178727  178838  0.0     14      122     124     87.9
12      43      98.0    31      72.1    31      4       9.3     202832  202872  0.0     5       45      58      70.7
36      65      118.0   44      67.7    44      8       12.3    208219  208279  0.0     3       63      70      87.1
19      120     157.5   76      63.3    76      29      24.2    221343  221456  0.0     2       98      105     92.4
34      86      124.0   58      67.4    58      14      16.3    221411  221484  0.0     19      102     117     71.8
26      123     127.5   74      60.2    74      14      11.4    225229  225348  0.0     4       115     128     87.5
16      129     126.0   77      59.7    77      21      16.3    226946  227071  0.1     11      121     121     91.7
Score for 37 query sequences (total 4025 bp) against target (246433 bp) = 4519.50

nrd1a discussion

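This analysis is also inconclusive. eelScaffold32 again has the higher total score (4795.50 against 4519.50) and so is more likely to harbour this second gene. Being nearly three times longer (679 422 bp against 246 433 bp), however, it also has a higher prior likelihood of doing so.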
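To put the length argument in numbers, the ratios of scaffold lengths and of total scores can be computed from the figures reported above. This is plain arithmetic on those figures, not an additional analysis; the helper name ratio is hypothetical.

```shell
# Ratio of two numbers, to two decimal places.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

ratio 679422 246433     # scaffold lengths, eelScaffold32 / eelScaffold320
ratio 880.50 830.50     # pdcd10b total scores
ratio 4795.50 4519.50   # nrd1a total scores
```

The length ratio comes out far larger than either score ratio, which is consistent with treating both comparisons as inconclusive.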