Difference between revisions of "Two Eel Scaffolds"
Revision as of 15:02, 13 May 2016
Introduction
Two DNA scaffolds are presented:
- eelScaffold32. 679 422 bp and 42.25% GC.
 - eelScaffold320. 246 433 bp and 43.47% GC.
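The length and GC figures above can be recomputed directly from the FASTA files. The following is a minimal sketch, not the method actually used for this page; the function name `fasta_stats` and the toy record are illustrative assumptions. It counts every sequence character (including any N) towards the length.

```python
from io import StringIO


def fasta_stats(handle):
    """Return (length, gc_percent) over all sequence lines of a FASTA handle."""
    length = 0
    gc = 0
    for line in handle:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        seq = line.upper()
        length += len(seq)
        gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / length if length else 0.0
    return length, gc_percent


# Toy record (made up for illustration, not eel sequence).
demo = StringIO(">toy\nACGTGGCC\nAATT\n")
length, gc_percent = fasta_stats(demo)
print(length, round(gc_percent, 2))
```

For a real scaffold, one would pass an open file instead, for example `fasta_stats(open("eelScaffold32.fa"))`.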
 
We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.
Two genes are expected to lie in or near the regions covered by these scaffolds. The questions are whether:
- eelScaffold32 contains any part of pdcd10b (Programmed cell death 10b).
 - eelScaffold320 contains any part of nrd1a (Nardilysin, N-arginine dibasic convertase).
 
ORF Analysis
ORF scans of sequences over 100 kbp often throw up too much data, but they can be a useful first step for gauging the complexity of the sequence.
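To make the idea of an ORF scan concrete, here is a minimal sketch that reports ATG-to-stop open reading frames in all six frames. It is not the tool used for this page; the function names and the `min_len` threshold are assumptions chosen for illustration, and raising `min_len` is one simple way to cut down the volume of output on a long scaffold.

```python
STOPS = {"TAA", "TAG", "TGA"}


def revcomp(seq):
    """Reverse complement of an upper-case DNA string (non-ACGT is left as is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_len=90):
    """Yield (strand, frame, start, end) for ATG..stop ORFs of at least min_len nt.

    Coordinates are 0-based, end-exclusive, and refer to the strand scanned
    (for "-" they are positions on the reverse complement).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ORF in this frame


# Toy sequence (made up): one short ORF, ATG AAA TTT TAG, starting at offset 2.
toy_orfs = list(find_orfs("CCATGAAATTTTAGCC", min_len=12))
print(toy_orfs)
```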
Gene Predictor
One of the most up-to-date (2016) gene predictors is Augustus. It uses HMM profiles trained on a related organism. For eel, two of the organisms that Augustus makes available are relevant: zebrafish (zb) and lamprey (lp). Tilapia is not available, although given time it would be possible to train an HMM profile for this organism.
An example Augustus command line is as follows:
augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf
Augustus outputs the GTF format. For visual browsing in JBrowse, we need to convert this to the related GFF format:
gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff
We are probably also interested in the CDS and protein sequences of the predicted genes, for which we can use the following Augustus-supplied script:
getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf
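After a prediction run, a quick sanity check is to count how many features of each type the GTF file holds. The sketch below assumes only the general GTF layout (tab-separated, feature type in the third column, comment lines starting with `#`); the function name `count_features` and the demo lines are made up for illustration and are not real Augustus output for these scaffolds.

```python
from collections import Counter


def count_features(lines):
    """Count GTF records by feature type (third tab-separated column)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a complete GTF record has nine columns
            counts[fields[2]] += 1
    return counts


# Hypothetical records, for illustration only.
demo_gtf = [
    "# comment line\n",
    "toy_scaffold\tAUGUSTUS\tgene\t100\t900\t0.50\t+\t.\tg1\n",
    "toy_scaffold\tAUGUSTUS\tCDS\t100\t300\t0.90\t+\t0\tgene_id \"g1\";\n",
    "toy_scaffold\tAUGUSTUS\tCDS\t500\t900\t0.80\t+\t0\tgene_id \"g1\";\n",
]
feature_counts = count_features(demo_gtf)
print(dict(feature_counts))
```

On a real file this would be called as `count_features(open("aug_s32_lp.gtf"))`.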
Detecting presence of pdcd10b and nrd1a
We take the exons of these genes and align each one against the scaffolds using Smith-Waterman local alignment (via the EMBOSS suite). The results are ordered by start position on the scaffold.
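For readers unfamiliar with local alignment, the following is a minimal Smith-Waterman sketch that returns only the best score and where the alignment ends. EMBOSS provides this algorithm as its `water` program; the scoring values used here (match 2, mismatch -1, linear gap -2) are assumptions for illustration and are not the settings behind the tables on this page. Pure Python would also be slow on a whole scaffold, so this is only meant to show the recurrence.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Return (best_score, query_end, target_end) of the best local alignment.

    End positions are 1-based. Only two rows of the score matrix are kept,
    so memory use grows with len(target), not with len(query) * len(target).
    """
    cols = len(target) + 1
    prev = [0] * cols
    best = (0, 0, 0)
    for i in range(1, len(query) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            cur[j] = max(
                0,                # start a new local alignment here
                prev[j - 1] + sub,  # extend with a match or mismatch
                prev[j] + gap,    # gap in the target
                cur[j - 1] + gap,  # gap in the query
            )
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


# Toy example (made-up sequences): the query occurs exactly inside the target.
toy_hit = smith_waterman("ACGT", "TTACGTTT")
print(toy_hit)
```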
pdcd10b's 7 exons against eelScaffold32
 SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
 5       89      125.5   56      62.9    56      15      16.9    92493   92577   0.0     1       78      79      98.7
 6       97      113.0   59      60.8    59      20      20.6    222386  222481  0.0     4       81      83      94.0
 7       80      112.0   52      65.0    52      16      20.0    153839  153911  0.0     1       71      82      86.6
 2       54      105.0   37      68.5    37      7       13.0    270271  270324  0.0     2       48      54      87.0
 1       91      134.0   58      63.7    58      9       9.9     305491  305572  0.0     6       96      96      94.8
 4       149     161.5   91      61.1    91      29      19.5    337277  337419  0.0     2       127     127     99.2
 3       112     129.5   69      61.6    69      16      14.3    607211  607313  0.0     7       111     118     89.0
 Total score for all alignments = 880.5
pdcd10b's 7 exons against eelScaffold320
 SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
 5       51      105.0   36      70.6    36      5       9.8     13517   13567   0.0     27      72      79      58.2
 1       110     130.0   70      63.6    70      25      22.7    52225   52323   0.0     1       96      96      100.0
 6       89      116.5   55      61.8    55      15      16.9    60604   60688   0.0     6       83      83      94.0
 2       50      97.0    36      72.0    36      7       14.0    64406   64450   0.0     1       48      54      88.9
 7       84      112.5   55      65.5    55      14      16.7    162512  162589  0.0     7       82      82      92.7
 4       124     138.5   79      63.7    79      19      15.3    222772  222891  0.0     18      126     127     85.8
 3       79      131.0   54      68.4    54      9       11.4    238985  239059  0.0     24      97      118     62.7
 Total score for all alignments = 830.5
pdcd10b discussion
This is a small gene, and these alignments are not conclusive. Nevertheless, eelScaffold32 scores higher overall (880.5 against 830.5), which makes it the more likely of the two to harbour pdcd10b.
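The comparison behind this conclusion can be checked with a few lines of arithmetic. The sketch below takes the SCORE values from the two pdcd10b tables, keyed by the SFI column (assumed here to be the exon index), and reports the totals and how many exons score higher on each scaffold. The variable names are illustrative.

```python
# SCORE per exon, copied from the two pdcd10b tables (key = SFI, assumed exon index).
s32 = {5: 125.5, 6: 113.0, 7: 112.0, 2: 105.0, 1: 134.0, 4: 161.5, 3: 129.5}
s320 = {5: 105.0, 1: 130.0, 6: 116.5, 2: 97.0, 7: 112.5, 4: 138.5, 3: 131.0}

total_s32 = sum(s32.values())
total_s320 = sum(s320.values())
difference = total_s32 - total_s320
relative_pct = 100.0 * difference / total_s320

# Exons whose alignment score is higher on eelScaffold32 than on eelScaffold320.
better_on_s32 = sorted(exon for exon in s32 if s32[exon] > s320[exon])

print(total_s32, total_s320, difference, round(relative_pct, 1))
print(better_on_s32)
```

The totals differ by only about 6%, and eelScaffold32 wins on four of the seven exons, which is consistent with calling the result inconclusive.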
nrd1a's 37 exons against eelScaffold32
 SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
 2       270     217.5   160     59.3    160     60      22.2    4926    5175    0.0     34      263     324     71.0
 3       235     257.0   143     60.9    143     35      14.9    9693    9920    0.0     5       211     241     85.9
 15      88      117.5   54      61.4    54      14      15.9    66759   66843   0.0     2       78      78      98.7
 12      34      104.0   28      82.4    28      2       5.9     67614   67647   0.0     24      55      58      55.2
 36      73      114.5   47      64.4    47      14      19.2    140070  140139  0.0     3       64      70      88.6
 4       37      104.0   28      75.7    28      0       0.0     159420  159456  0.0     2       38      38      97.4
 23      18      72.0    16      88.9    16      0       0.0     190407  190424  0.0     5       22      22      81.8
 33      108     115.5   66      61.1    66      24      22.2    214642  214747  0.0     5       90      91      94.5
 19      108     133.5   69      63.9    69      13      12.0    222360  222458  0.0     1       104     105     99.0
 13      84      109.5   52      61.9    52      15      17.9    237021  237101  0.0     3       74      74      97.3
 17      60      97.5    41      68.3    41      14      23.3    258142  258195  0.0     2       53      59      88.1
 35      98      124.0   60      61.2    60      19      19.4    259372  259461  0.0     1       87      90      96.7
 29      126     162.0   79      62.7    79      20      15.9    268052  268173  0.0     6       115     139     79.1
 18      46      105.5   34      73.9    34      4       8.7     281107  281152  0.0     1       42      55      76.4
 10      107     125.5   69      64.5    69      20      18.7    342133  342236  0.0     6       95      96      93.8
 21      94      131.0   60      63.8    60      18      19.1    365037  365120  0.0     1       86      87      98.9
 20      136     131.0   82      60.3    82      28      20.6    380436  380559  0.0     2       121     124     96.8
 34      127     131.0   78      61.4    78      28      22.0    387971  388090  0.0     1       106     117     90.6
 25      140     154.0   86      61.4    86      20      14.3    409140  409267  0.0     9       140     151     87.4
 37      125     149.5   78      62.4    78      30      24.0    411212  411315  0.0     15      130     131     88.5
 9       77      113.5   50      64.9    50      11      14.3    478120  478193  0.0     4       72      74      93.2
 1       609     297.5   343     56.3    343     143     23.5    491670  492216  0.1     4       531     541     97.6
 16      120     147.0   72      60.0    72      11      9.2     519285  519397  0.0     2       117     121     95.9
 28      35      76.0    25      71.4    25      4       11.4    522407  522440  0.0     1       32      32      100.0
 26      124     147.5   76      61.3    76      28      22.6    531064  531176  0.0     21      127     128     83.6
 31      55      108.5   40      72.7    40      9       16.4    540137  540190  0.0     3       49      53      88.7
 14      73      107.0   50      68.5    50      16      21.9    557391  557458  0.0     3       64      70      88.6
 6       23      67.0    18      78.3    18      3       13.0    571200  571222  0.0     3       22      23      87.0
 11      144     144.0   86      59.7    86      34      23.6    586719  586853  0.0     3       121     123     96.7
 30      50      95.5    35      70.0    35      7       14.0    599465  599511  0.0     3       48      48      95.8
 7       54      105.0   38      70.4    38      6       11.1    599975  600025  0.0     30      80      82      62.2
 5       21      72.0    18      85.7    18      1       4.8     604115  604135  0.0     5       24      26      76.9
 27      173     176.5   105     60.7    105     39      22.5    621983  622127  0.0     2       163     163     99.4
 24      46      98.0    34      73.9    34      4       8.7     655376  655420  0.0     6       48      52      82.7
 22      94      119.0   60      63.8    60      20      21.3    668713  668800  0.0     1       80      101     79.2
 8       141     145.5   88      62.4    88      30      21.3    669021  669148  0.0     28      151     154     80.5
 32      69      120.0   47      68.1    47      7       10.1    670437  670500  0.0     9       75      84      79.8
 Score for 37 query sequences (total 4025 bp) against target (679422 bp) = 4795.50
nrd1a's 37 exons against eelScaffold320
 SFI     ALEN    SCORE   IDEN    IPT     SIM     GAPS    GPT     TSC     TEC     PET     QSC     QEC     QLN     PEQ
 32      52      99.5    36      69.2    36      4       7.7     11214   11261   0.0     10      61      84      61.9
 18      46      105.5   34      73.9    34      4       8.7     21678   21720   0.0     1       45      55      81.8
 17      70      95.0    45      64.3    45      20      28.6    22625   22693   0.0     3       53      59      86.4
 24      40      96.5    29      72.5    29      2       5.0     22849   22886   0.0     8       47      52      76.9
 23      21      69.0    17      81.0    17      0       0.0     26374   26394   0.0     1       21      22      95.5
 6       18      72.0    16      88.9    16      0       0.0     28762   28779   0.0     5       22      23      78.3
 14      64      105.5   42      65.6    42      9       14.1    42560   42621   0.0     12      68      70      81.4
 2       315     228.0   183     58.1    183     64      20.3    46282   46552   0.1     7       301     324     91.0
 21      90      112.5   58      64.4    58      17      18.9    80763   80842   0.0     3       85      87      95.4
 25      153     138.0   88      57.5    88      27      17.6    81195   81324   0.1     1       149     151     98.7
 11      81      148.5   55      67.9    55      10      12.3    81550   81625   0.0     47      122     123     61.8
 1       525     261.0   294     56.0    294     105     20.0    83281   83764   0.2     2       462     541     85.2
 15      61      107.0   40      65.6    40      4       6.6     89643   89699   0.0     8       68      78      78.2
 28      39      78.0    28      71.8    28      8       20.5    92398   92435   0.0     1       32      32      100.0
 8       168     147.0   104     61.9    104     32      19.0    97584   97741   0.1     8       153     154     94.8
 3       254     229.0   150     59.1    150     55      21.7    106270  106501  0.1     6       226     241     91.7
 31      54      88.5    37      68.5    37      6       11.1    106293  106342  0.0     2       53      53      98.1
 29      129     142.5   79      61.2    79      30      23.3    107213  107332  0.0     23      130     139     77.7
 13      70      119.0   46      65.7    46      5       7.1     108538  108605  0.0     1       67      74      90.5
 4       37      84.5    27      73.0    27      3       8.1     129499  129534  0.0     3       37      38      92.1
 37      131     128.5   79      60.3    79      26      19.8    133637  133753  0.0     13      131     131     90.8
 35      96      114.5   60      62.5    60      21      21.9    135814  135897  0.0     1       87      90      96.7
 9       78      106.5   50      64.1    50      14      17.9    136915  136989  0.0     5       71      74      90.5
 30      39      87.0    27      69.2    27      0       0.0     137258  137296  0.0     1       39      48      81.2
 22      114     120.0   69      60.5    69      20      17.5    143789  143896  0.0     2       101     101     99.0
 5       30      64.5    22      73.3    22      6       20.0    158927  158955  0.0     1       25      26      96.2
 27      176     145.0   102     58.0    102     39      22.2    161163  161325  0.1     13      162     163     92.0
 33      89      109.0   56      62.9    56      19      21.3    164489  164570  0.0     5       81      91      84.6
 10      82      119.0   52      63.4    52      11      13.4    168702  168776  0.0     1       78      96      81.2
 7       81      106.5   50      61.7    50      7       8.6     178661  178740  0.0     4       78      82      91.5
 20      121     141.5   76      62.8    76      21      17.4    178727  178838  0.0     14      122     124     87.9
 12      43      98.0    31      72.1    31      4       9.3     202832  202872  0.0     5       45      58      70.7
 36      65      118.0   44      67.7    44      8       12.3    208219  208279  0.0     3       63      70      87.1
 19      120     157.5   76      63.3    76      29      24.2    221343  221456  0.0     2       98      105     92.4
 34      86      124.0   58      67.4    58      14      16.3    221411  221484  0.0     19      102     117     71.8
 26      123     127.5   74      60.2    74      14      11.4    225229  225348  0.0     4       115     128     87.5
 16      129     126.0   77      59.7    77      21      16.3    226946  227071  0.1     11      121     121     91.7
 Score for 37 query sequences (total 4025 bp) against target (246433 bp) = 4519.50
nrd1a discussion
This analysis is also inconclusive. eelScaffold32 again scores higher (4795.50 against 4519.50) and so is more likely to harbour this second gene, but being some 2.8 times longer, it also has a higher prior likelihood of doing so.
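The point about scaffold length can be made concrete by setting the length ratio beside the score ratio. This is only a back-of-the-envelope sketch using the figures quoted on this page; it is not a statistical test, and the variable names are illustrative.

```python
# Figures taken from the two nrd1a score lines.
len_s32, len_s320 = 679_422, 246_433      # target lengths in bp
score_s32, score_s320 = 4795.50, 4519.50  # summed alignment scores

length_ratio = len_s32 / len_s320
score_ratio = score_s32 / score_s320

print(round(length_ratio, 2), round(score_ratio, 2))
```

eelScaffold32 is close to 2.8 times the length of eelScaffold320 but scores only about 6% higher, so the score advantage on its own says little once the larger search space is taken into account.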