Two Eel Scaffolds

Revision as of 16:13, 15 May 2016 by Rf (talk | contribs) (pdcd10b's 7 exons against eelScaffold32)

Introduction

Two DNA scaffolds are presented:

  1. eelScaffold32. 679 422 bp and 42.25% GC.
  2. eelScaffold320. 246 433 bp and 43.47% GC.
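Length and GC figures like those above can be recomputed from a scaffold's FASTA file. The following is a minimal sketch rather than the method actually used for this page: the function name is hypothetical, and it assumes that every base (including N) counts towards the length, which the text does not state.

```python
def fasta_length_and_gc(lines):
    """Return (length_bp, gc_percent) for the sequence lines of one FASTA record."""
    length = 0
    gc = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip the header line and blank lines
        seq = line.upper()
        length += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / length if length else 0.0
    return length, gc_percent


# Toy record for illustration only, not real eel sequence.
toy = [">toy", "ACGTACGT", "GGCC"]
length, gc_percent = fasta_length_and_gc(toy)
print(length, round(gc_percent, 2))
```

For a real scaffold, an open file handle can be passed directly, for example `fasta_length_and_gc(open("eelScaffold32.fa"))`.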

We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.

Two genes are expected to lie in the regions covered by these scaffolds:

  • eelScaffold32 is expected to contain at least part of pdcd10b (programmed cell death 10b).
  • eelScaffold320 is expected to contain at least part of nrd1a (nardilysin, N-arginine dibasic convertase).

Gene Predictor

One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses species-specific HMM parameter sets trained on a related organism. For eel, two of the organisms that Augustus makes available are relevant: zebrafish (zb) and lamprey (lp). Tilapia is not available, but given time it would be possible to train an HMM parameter set for this organism.

An example Augustus command line is as follows:

augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf

Augustus writes its output in GTF format. For visual browsing in JBrowse, we need to convert this to the related GFF3 format:

gtf2gff.pl --printExon --gff3 --out=aug_s32_lp.gff < aug_s32_lp.gtf

We are probably also interested in the CDS and protein sequences of the predicted genes. For these we can use the following Augustus-supplied script:

getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf
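The three commands above can be chained for any scaffold and species. The sketch below only assembles the command lines already shown on this page; the function names and the `prefix` naming scheme are hypothetical, and the species identifier for zebrafish is not given in the text, so only `lamprey` is used.

```python
import subprocess


def pipeline_steps(scaffold_fa, species, prefix):
    """Return the three steps as (argv, stdin_path, stdout_path) tuples."""
    gtf = prefix + ".gtf"
    gff = prefix + ".gff"
    return [
        # 1. Gene prediction; Augustus writes its annotation to stdout.
        (["augustus", "--species=" + species, scaffold_fa], None, gtf),
        # 2. GTF to GFF3 for JBrowse; gtf2gff.pl reads the GTF on stdin.
        (["gtf2gff.pl", "--printExon", "--gff3", "--out=" + gff], gtf, None),
        # 3. CDS and protein sequences of the predicted genes.
        (["getAnnoFasta.pl", "--seqfile=" + scaffold_fa, gtf], None, None),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdin_path, stdout_path in steps:
        stdin = open(stdin_path) if stdin_path else None
        stdout = open(stdout_path, "w") if stdout_path else None
        try:
            subprocess.run(argv, stdin=stdin, stdout=stdout, check=True)
        finally:
            for handle in (stdin, stdout):
                if handle:
                    handle.close()


# Print the commands without running them.
steps = pipeline_steps("eelScaffold32.fa", "lamprey", "aug_s32_lp")
for argv, _, _ in steps:
    print(" ".join(argv))
```

Calling `run_steps(steps)` would execute the commands, provided Augustus and its scripts are on the PATH.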

To navigate the resulting annotations visually, view them in a browser here.

Detecting presence of pdcd10b and nrd1a

We obtain these genes from tilapia, extract their exons, and align each exon against the scaffolds using Smith-Waterman local alignment (via the EMBOSS program, wrapped in this script). The hits are ordered by start coordinate on the scaffold (for the reverse strand, by end coordinate).
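To illustrate the idea of the alignment step, here is a minimal pure-Python Smith-Waterman sketch. It is not the EMBOSS implementation and not the wrapper script: it uses a simple linear gap penalty, the scoring values are illustrative assumptions, and the function names are hypothetical. The reverse strand is handled by aligning against the reverse complement of the target, so the coordinates returned in that case refer to the reverse-complemented sequence. It is far too slow for routine use on whole scaffolds.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def smith_waterman(query, target, match=5, mismatch=-4, gap=-10):
    """Best local alignment of query against target (both in the same case).

    Returns (score, target_start, target_end) with 1-based inclusive
    target coordinates, or (0, 0, 0) if nothing scores above zero.
    """
    cols = len(target) + 1
    prev_score = [0] * cols
    prev_start = [0] * cols  # target start of the alignment ending in each cell
    best = (0, 0, 0)
    for q_base in query:
        cur_score = [0] * cols
        cur_start = [0] * cols
        for j in range(1, cols):
            diag = prev_score[j - 1] + (match if q_base == target[j - 1] else mismatch)
            up = prev_score[j] + gap        # gap in the target
            left = cur_score[j - 1] + gap   # gap in the query
            score = max(0, diag, up, left)
            if score == 0:
                start = 0
            elif score == diag:
                # A fresh alignment starts here if the diagonal cell was zero.
                start = prev_start[j - 1] if prev_score[j - 1] > 0 else j
            elif score == up:
                start = prev_start[j]
            else:
                start = cur_start[j - 1]
            cur_score[j] = score
            cur_start[j] = start
            if score > best[0]:
                best = (score, start, j)
        prev_score, prev_start = cur_score, cur_start
    return best


# Toy sequences for illustration only.
target = "GGACGTTGG"
print(smith_waterman("ACGTT", target))                      # forward sense
print(smith_waterman("AACGT", reverse_complement(target)))  # reverse sense
```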

pdcd10b's 7 exons against eelScaffold32

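Output of script (columns, left to right: SFI, ALEN, SCORE, IDEN, IPT, SIM, GAPS, GPT, TSC, TEC, PET, QSC, QEC, QLN, PEQ; the abbreviations are explained in the key within the output):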

5       51      105.0   36      70.6    36      5       9.8     13517   13567   0.0     27      72      79      58.2
1       110     130.0   70      63.6    70      25      22.7    52225   52323   0.0     1       96      96      100.0
6       89      116.5   55      61.8    55      15      16.9    60604   60688   0.0     6       83      83      94.0
2       50      97.0    36      72.0    36      7       14.0    64406   64450   0.0     1       48      54      88.9
7       84      112.5   55      65.5    55      14      16.7    162512  162589  0.0     7       82      82      92.7
4       124     138.5   79      63.7    79      19      15.3    222772  222891  0.0     18      126     127     85.8
3       79      131.0   54      68.4    54      9       11.4    238985  239059  0.0     24      97      118     62.7
Score for 7 query sequences (total 639 bp) against forward-sense target (246433 bp) = 830.50
Exon separation string:
<< e00:13517-13567 >> 38658 << e01:52225-52323 >> 8281 << e02:60604-60688 >> 3718 << e03:64406-64450 >> 98062 << e04:162512-162589 >> 60183 << e05:222772-222891 >> 16094 << e06:238985-239059 >>
2       50      86.5    34      68.0    34      7       14.0    47771   47813   0.0     5       54      54      92.6
1       98      121.0   60      61.2    60      17      17.3    89576   89663   0.0     4       94      96      94.8
6       82      122.0   52      63.4    52      18      22.0    107021  107102  0.0     9       72      83      77.1
4       99      147.0   66      66.7    66      16      16.2    136008  136092  0.0     31      127     127     76.4
3       105     129.0   65      61.9    65      16      15.2    152393  152490  0.0     14      109     118     81.4
5       71      121.0   46      64.8    46      9       12.7    182775  182843  0.0     16      79      79      81.0
7       75      109.5   49      65.3    49      14      18.7    217912  217977  0.0     3       72      82      85.4
Score for 7 query sequences (total 639 bp) against reverse-sense target (246433 bp) = 836.00
Key: SFI src file idx, ALEN aln length, SCORE aln score, IDEN identical bases, IPT percent iden, SIM similar bases, GAPS num gaps, GPT gap percent
        TSC target start query, TEC target end coord, PET percent of target, QSC Query start coord, QEC query end coord, QLN query aln length, PEQ percent of query
Exon separation string:
<< e00:47771-47813 >> 41763 << e01:89576-89663 >> 17358 << e02:107021-107102 >> 28906 << e03:136008-136092 >> 16301 << e04:152393-152490 >> 30285 << e05:182775-182843 >> 35069 << e06:217912-217977 >>

The reverse strand gives the higher total score (836.00, against 830.50 for the forward strand), and so we take pdcd10b to be present in eelScaffold32.
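The total score and the exon separation string in the output can be reproduced from the table rows. In the numbers shown, the total is the sum of the SCORE column, and each separation is the start of a hit minus the end of the previous hit once the hits are sorted by target start. The sketch below is inferred from those numbers and is not the script itself; the function name is hypothetical and the rows are copied from the forward-sense table.

```python
# (exon index, score, target start, target end) from the forward-sense table.
rows = [
    (5, 105.0, 13517, 13567),
    (1, 130.0, 52225, 52323),
    (6, 116.5, 60604, 60688),
    (2, 97.0, 64406, 64450),
    (7, 112.5, 162512, 162589),
    (4, 138.5, 222772, 222891),
    (3, 131.0, 238985, 239059),
]


def separation_string(rows):
    """Build the exon separation string from hits sorted by target start."""
    hits = sorted(rows, key=lambda r: r[2])
    parts = []
    for k, (_, _, start, end) in enumerate(hits):
        if k > 0:
            # Distance from the end of the previous hit to this start.
            parts.append(str(start - hits[k - 1][3]))
        parts.append("<< e%02d:%d-%d >>" % (k, start, end))
    return " ".join(parts)


total_score = sum(score for _, score, _, _ in rows)
print("%.2f" % total_score)
print(separation_string(rows))
```

Note that the labels e00 to e06 number the hits by position on the scaffold, not by exon index.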

nrd1a's 37 exons against eelScaffold320

SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
32      52      99.5    36      69.2    36      4       7.7     11214   11261   0.0     10      61      84      61.9
18      46      105.5   34      73.9    34      4       8.7     21678   21720   0.0     1       45      55      81.8
17      70      95.0    45      64.3    45      20      28.6    22625   22693   0.0     3       53      59      86.4
24      40      96.5    29      72.5    29      2       5.0     22849   22886   0.0     8       47      52      76.9
23      21      69.0    17      81.0    17      0       0.0     26374   26394   0.0     1       21      22      95.5
6       18      72.0    16      88.9    16      0       0.0     28762   28779   0.0     5       22      23      78.3
14      64      105.5   42      65.6    42      9       14.1    42560   42621   0.0     12      68      70      81.4
2       315     228.0   183     58.1    183     64      20.3    46282   46552   0.1     7       301     324     91.0
21      90      112.5   58      64.4    58      17      18.9    80763   80842   0.0     3       85      87      95.4
25      153     138.0   88      57.5    88      27      17.6    81195   81324   0.1     1       149     151     98.7
11      81      148.5   55      67.9    55      10      12.3    81550   81625   0.0     47      122     123     61.8
1       525     261.0   294     56.0    294     105     20.0    83281   83764   0.2     2       462     541     85.2
15      61      107.0   40      65.6    40      4       6.6     89643   89699   0.0     8       68      78      78.2
28      39      78.0    28      71.8    28      8       20.5    92398   92435   0.0     1       32      32      100.0
8       168     147.0   104     61.9    104     32      19.0    97584   97741   0.1     8       153     154     94.8
3       254     229.0   150     59.1    150     55      21.7    106270  106501  0.1     6       226     241     91.7
31      54      88.5    37      68.5    37      6       11.1    106293  106342  0.0     2       53      53      98.1
29      129     142.5   79      61.2    79      30      23.3    107213  107332  0.0     23      130     139     77.7
13      70      119.0   46      65.7    46      5       7.1     108538  108605  0.0     1       67      74      90.5
4       37      84.5    27      73.0    27      3       8.1     129499  129534  0.0     3       37      38      92.1
37      131     128.5   79      60.3    79      26      19.8    133637  133753  0.0     13      131     131     90.8
35      96      114.5   60      62.5    60      21      21.9    135814  135897  0.0     1       87      90      96.7
9       78      106.5   50      64.1    50      14      17.9    136915  136989  0.0     5       71      74      90.5
30      39      87.0    27      69.2    27      0       0.0     137258  137296  0.0     1       39      48      81.2
22      114     120.0   69      60.5    69      20      17.5    143789  143896  0.0     2       101     101     99.0
5       30      64.5    22      73.3    22      6       20.0    158927  158955  0.0     1       25      26      96.2
27      176     145.0   102     58.0    102     39      22.2    161163  161325  0.1     13      162     163     92.0
33      89      109.0   56      62.9    56      19      21.3    164489  164570  0.0     5       81      91      84.6
10      82      119.0   52      63.4    52      11      13.4    168702  168776  0.0     1       78      96      81.2
7       81      106.5   50      61.7    50      7       8.6     178661  178740  0.0     4       78      82      91.5
20      121     141.5   76      62.8    76      21      17.4    178727  178838  0.0     14      122     124     87.9
12      43      98.0    31      72.1    31      4       9.3     202832  202872  0.0     5       45      58      70.7
36      65      118.0   44      67.7    44      8       12.3    208219  208279  0.0     3       63      70      87.1
19      120     157.5   76      63.3    76      29      24.2    221343  221456  0.0     2       98      105     92.4
34      86      124.0   58      67.4    58      14      16.3    221411  221484  0.0     19      102     117     71.8
26      123     127.5   74      60.2    74      14      11.4    225229  225348  0.0     4       115     128     87.5
16      129     126.0   77      59.7    77      21      16.3    226946  227071  0.1     11      121     121     91.7
Score for 37 query sequences (total 4025 bp) against forward-sense target (246433 bp) = 4519.50
SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
36      79      135.5   53      67.1    53      13      16.5    31868   31943   0.0     2       70      70      98.6
1       531     279.0   295     55.6    295     122     23.0    44814   45314   0.2     52      490     541     81.1
37      130     317.0   93      71.5    93      0       0.0     88877   89006   0.1     1       130     131     99.2
35      91      213.5   66      72.5    66      4       4.4     89201   89289   0.0     1       89      90      98.9
34      117     360.0   92      78.6    92      0       0.0     89454   89570   0.0     1       117     117     100.0
33      93      249.0   70      75.3    70      4       4.3     89571   89661   0.0     1       91      91      100.0
32      82      275.0   67      81.7    67      0       0.0     90168   90249   0.0     2       83      84      97.6
31      53      211.0   47      88.7    47      0       0.0     90408   90460   0.0     1       53      53      100.0
30      48      123.0   35      72.9    35      0       0.0     91324   91371   0.0     1       48      48      100.0
29      140     422.5   111     79.3    111     4       2.9     91967   92104   0.1     2       139     139     99.3
28      32      97.0    25      78.1    25      0       0.0     92238   92269   0.0     1       32      32      100.0
26      118     365.0   93      78.8    93      0       0.0     93551   93668   0.0     11      128     128     92.2
25      150     408.0   113     75.3    113     4       2.7     93879   94026   0.1     3       150     151     98.0
24      52      107.0   35      67.3    35      0       0.0     94256   94307   0.0     1       52      52      100.0
22      101     253.0   73      72.3    73      0       0.0     94486   94586   0.0     1       101     101     100.0
21      89      185.5   62      69.7    62      4       4.5     94724   94810   0.0     1       87      87      100.0
20      124     359.0   95      76.6    95      0       0.0     94902   95025   0.1     1       124     124     100.0
19      101     289.0   77      76.2    77      0       0.0     95201   95301   0.0     4       104     105     96.2
18      53      202.0   46      86.8    46      0       0.0     96018   96070   0.0     2       54      55      96.4
17      59      214.0   50      84.7    50      0       0.0     96349   96407   0.0     1       59      59      100.0
16      123     426.0   103     83.7    103     4       3.3     96864   96984   0.0     1       121     121     100.0
15      77      286.0   66      85.7    66      0       0.0     97195   97271   0.0     1       77      78      98.7
14      67      200.0   52      77.6    52      0       0.0     97632   97698   0.0     4       70      70      95.7
13      74      181.0   53      71.6    53      0       0.0     98153   98226   0.0     1       74      74      100.0
12      58      227.0   51      87.9    51      0       0.0     98324   98381   0.0     1       58      58      100.0
11      124     284.0   88      71.0    88      2       1.6     98645   98767   0.0     1       123     123     100.0
10      96      336.0   80      83.3    80      0       0.0     99073   99168   0.0     1       96      96      100.0
9       73      239.0   59      80.8    59      0       0.0     99304   99376   0.0     2       74      74      98.6
8       155     583.0   135     87.1    135     2       1.3     99622   99775   0.1     1       154     154     100.0
7       83      205.0   61      73.5    61      2       2.4     99952   100033  0.0     1       82      82      100.0
3       255     397.5   169     66.3    169     31      12.2    100706  100943  0.1     1       241     241     100.0
2       331     330.5   196     59.2    196     57      17.2    101708  101996  0.1     9       324     324     97.5
6       21      63.0    17      81.0    17      1       4.8     113247  113266  0.0     2       22      23      91.3
5       25      71.0    21      84.0    21      3       12.0    141659  141681  0.0     2       25      26      92.3
27      171     165.0   104     60.8    104     37      21.6    142741  142906  0.1     18      156     163     85.3
23      22      78.5    19      86.4    19      2       9.1     213159  213180  0.0     2       21      22      90.9
4       46      92.0    31      67.4    31      8       17.4    228468  228513  0.0     1       38      38      100.0
Score for 37 query sequences (total 4025 bp) against reverse-sense target (246433 bp) = 9229.50
Key: SFI src file idx, ALEN aln length, SCORE aln score, IDEN identical bases, IPT percent iden, SIM similar bases, GAPS num gaps, GPT gap percent
       TSC target start query, TEC target end coord, PET percent of target, QSC Query start coord, QEC query end coord, QLN query aln length, PEQ percent of query

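This is a more complicated gene, so the alignment is less clean, but identity is clearly good on the reverse strand: its total score (9229.50) is roughly double that of the forward strand (4519.50), and most exons align in descending order between coordinates 88877 and 101996. We can therefore reasonably suspect that the reverse strand harbours this second gene.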
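The percentage columns in the tables can be recomputed from the raw columns, which is a useful check when reading the output. The formulas below are inferred from the rows shown rather than taken from the script; in particular, QLN behaves as the full query (exon) length in these rows. The function names are hypothetical, and the target length of 246433 bp is taken from the score lines.

```python
COLUMNS = ["SFI", "ALEN", "SCORE", "IDEN", "IPT", "SIM", "GAPS", "GPT",
           "TSC", "TEC", "PET", "QSC", "QEC", "QLN", "PEQ"]


def parse_row(line):
    """Split one whitespace-separated table row into a dict of floats."""
    return {name: float(value) for name, value in zip(COLUMNS, line.split())}


def derived_percentages(row, target_length):
    """Recompute IPT, GPT, PET and PEQ from the raw columns."""
    return {
        "IPT": 100.0 * row["IDEN"] / row["ALEN"],
        "GPT": 100.0 * row["GAPS"] / row["ALEN"],
        "PET": 100.0 * (row["TEC"] - row["TSC"] + 1) / target_length,
        "PEQ": 100.0 * (row["QEC"] - row["QSC"] + 1) / row["QLN"],
    }


# Exon 8 from the reverse-sense nrd1a table.
line = "8 155 583.0 135 87.1 135 2 1.3 99622 99775 0.1 1 154 154 100.0"
row = parse_row(line)
for name, value in derived_percentages(row, 246433).items():
    print(name, "reported", row[name], "recomputed", round(value, 1))
```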