Two Eel Scaffolds
Introduction
Two DNA scaffolds are presented:
- eelScaffold32. 679 422 bp and 42.25% GC.
- eelScaffold320. 246 433 bp and 43.47% GC.
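The length and GC figures above can be recomputed directly from the scaffold FASTA files. The following is a minimal sketch, assuming one record per file and counting GC over every base in the record (any N included). The function name `fasta_length_and_gc` and that counting convention are assumptions, not taken from the original analysis.

```python
def fasta_length_and_gc(fasta_text):
    """Return (length_bp, gc_percent) for the sequence lines in fasta_text.

    Assumption: one record per file; header lines start with '>'.
    Assumption: GC% is taken over all bases, N included.
    """
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    ).upper()
    if not seq:
        return 0, 0.0
    gc_count = seq.count("G") + seq.count("C")
    return len(seq), round(100.0 * gc_count / len(seq), 2)


# Tiny inline record for illustration; for a real scaffold, read the file:
#   with open("eelScaffold32.fa") as handle:
#       print(fasta_length_and_gc(handle.read()))
example = ">toy\nGGCC\nAATT\n"
print(fasta_length_and_gc(example))
```

If GC should be reported over unambiguous bases only, the denominator would need to exclude N; which convention produced the figures above is not stated.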
We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.
Two genes are expected to lie in or near the regions covered by these scaffolds:
- eelScaffold32 is expected to contain at least part of pdcd10b (programmed cell death 10b).
- eelScaffold320 is expected to contain at least part of nrd1a (nardilysin, N-arginine dibasic convertase).
ORF Analysis
ORF scans of sequences over 100 kbp often produce too much data to interpret, but they can be a useful first step for gauging the complexity of the sequence.
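To make the idea of an ORF scan concrete, here is a minimal sketch that reports open reading frames on both strands. It assumes an ORF runs from an ATG to the first in-frame stop codon and ignores nested start codons. The function names and the `min_len` default are illustrative assumptions, not the tool used for this analysis.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Yield (strand, frame, start, end) for ORFs of at least min_len bp.

    start and end are 0-based, end-exclusive, and refer to the scanned
    strand (for '-', that is the reverse-complemented sequence).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i          # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        yield strand, frame, start, i + 3
                    start = None       # close it and look for the next ATG


# Toy sequence; a real scan would pass the scaffold sequence instead.
for orf in find_orfs("CCATGAAATAGCC", min_len=9):
    print(orf)
```

Raising `min_len` is the simplest way to keep the output manageable on scaffolds of this size.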
Gene Predictor
One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses HMM profiles trained on a related organism. For eel, Augustus makes two relevant organisms available: zebrafish (zb) and lamprey (lp). Tilapia is not available, but given time it is possible to train an HMM profile for this organism.
An example Augustus command line is as follows:
augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf
Augustus outputs GTF. For visual browsing in JBrowse, this needs to be converted to the related GFF format:
gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff
We are probably also interested in the CDS and protein sequences of the predicted genes, which can be extracted with the following Augustus-supplied script:
getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf
In order to visually navigate the results of these annotations, they can be viewed on a browser here.
Detecting presence of pdcd10b and nrd1a
We take the exons of each gene and align them against the scaffolds with the Smith-Waterman algorithm (via the EMBOSS program). The hits are ordered by their start site on the scaffold.
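For readers unfamiliar with Smith-Waterman, the following is a minimal sketch of the local-alignment scoring recurrence. It is not the EMBOSS implementation: the match, mismatch and gap values are illustrative assumptions, and the gap penalty is linear. The half-point scores in the tables suggest an affine gap penalty, which this sketch does not model. Pure Python is also far too slow for whole scaffolds, so it is meant only to show the principle.

```python
def smith_waterman_score(query, target, match=5, mismatch=-4, gap=-10):
    """Return (best_score, query_end, target_end) of the best local alignment.

    Ends are 1-based positions of the last aligned base. Scoring values
    are assumptions for illustration; gaps are penalised linearly.
    """
    prev = [0] * (len(target) + 1)
    best = (0, 0, 0)
    for i, q in enumerate(query, start=1):
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


# Toy example: the query occurs exactly at target positions 3..6.
print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Only two rows of the matrix are kept, which is enough to report the score and end coordinates but not to recover the alignment itself.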
pdcd10b's 7 exons against eelScaffold32
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
5      89  125.5    56  62.9   56    15  16.9   92493   92577  0.0    1   78   79   98.7
6      97  113.0    59  60.8   59    20  20.6  222386  222481  0.0    4   81   83   94.0
7      80  112.0    52  65.0   52    16  20.0  153839  153911  0.0    1   71   82   86.6
2      54  105.0    37  68.5   37     7  13.0  270271  270324  0.0    2   48   54   87.0
1      91  134.0    58  63.7   58     9   9.9  305491  305572  0.0    6   96   96   94.8
4     149  161.5    91  61.1   91    29  19.5  337277  337419  0.0    2  127  127   99.2
3     112  129.5    69  61.6   69    16  14.3  607211  607313  0.0    7  111  118   89.0
Score for 7 query sequences (total 639 bp) against target (679422 bp) = 880.50
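The column abbreviations are not defined in the text. The sketch below tests one reading that is consistent with the values: IPT as IDEN/ALEN in percent, GPT as GAPS/ALEN in percent, and PEQ as the covered share of the query, (QEC - QSC + 1)/QLN in percent. These interpretations are inferred assumptions, as are the helper names `parse_row` and `check_row`.

```python
COLUMNS = "SFI ALEN SCORE IDEN IPT SIM GAPS GPT TSC TEC PET QSC QEC QLN PEQ".split()


def parse_row(row):
    """Map one whitespace-separated result row onto the column names."""
    return dict(zip(COLUMNS, (float(value) for value in row.split())))


def check_row(row, tol=0.06):
    """Return, per derived column, whether the assumed formula matches.

    tol allows for the one-decimal rounding used in the tables.
    """
    r = parse_row(row)
    expected = {
        "IPT": 100.0 * r["IDEN"] / r["ALEN"],                  # assumed: % identity
        "GPT": 100.0 * r["GAPS"] / r["ALEN"],                  # assumed: % gaps
        "PEQ": 100.0 * (r["QEC"] - r["QSC"] + 1) / r["QLN"],   # assumed: % of query covered
    }
    return {name: abs(value - r[name]) <= tol for name, value in expected.items()}


# First row of the pdcd10b / eelScaffold32 table.
print(check_row("5 89 125.5 56 62.9 56 15 16.9 92493 92577 0.0 1 78 79 98.7"))
```

Running the same check over every row is a quick way to confirm that a table was transcribed without column shifts.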
pdcd10b's 7 exons against eelScaffold320
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
5      51  105.0    36  70.6   36     5   9.8   13517   13567  0.0   27   72   79   58.2
1     110  130.0    70  63.6   70    25  22.7   52225   52323  0.0    1   96   96  100.0
6      89  116.5    55  61.8   55    15  16.9   60604   60688  0.0    6   83   83   94.0
2      50   97.0    36  72.0   36     7  14.0   64406   64450  0.0    1   48   54   88.9
7      84  112.5    55  65.5   55    14  16.7  162512  162589  0.0    7   82   82   92.7
4     124  138.5    79  63.7   79    19  15.3  222772  222891  0.0   18  126  127   85.8
3      79  131.0    54  68.4   54     9  11.4  238985  239059  0.0   24   97  118   62.7
Score for 7 query sequences (total 639 bp) against target (246433 bp) = 830.5
pdcd10b discussion
This is a small gene, and these alignments cannot be called conclusive. The total score is nonetheless higher for eelScaffold32 (880.5 versus 830.5), which makes it the more likely of the two scaffolds to harbour pdcd10b.
nrd1a's 37 exons against eelScaffold32
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
2     270  217.5   160  59.3  160    60  22.2    4926    5175  0.0   34  263  324   71.0
3     235  257.0   143  60.9  143    35  14.9    9693    9920  0.0    5  211  241   85.9
15     88  117.5    54  61.4   54    14  15.9   66759   66843  0.0    2   78   78   98.7
12     34  104.0    28  82.4   28     2   5.9   67614   67647  0.0   24   55   58   55.2
36     73  114.5    47  64.4   47    14  19.2  140070  140139  0.0    3   64   70   88.6
4      37  104.0    28  75.7   28     0   0.0  159420  159456  0.0    2   38   38   97.4
23     18   72.0    16  88.9   16     0   0.0  190407  190424  0.0    5   22   22   81.8
33    108  115.5    66  61.1   66    24  22.2  214642  214747  0.0    5   90   91   94.5
19    108  133.5    69  63.9   69    13  12.0  222360  222458  0.0    1  104  105   99.0
13     84  109.5    52  61.9   52    15  17.9  237021  237101  0.0    3   74   74   97.3
17     60   97.5    41  68.3   41    14  23.3  258142  258195  0.0    2   53   59   88.1
35     98  124.0    60  61.2   60    19  19.4  259372  259461  0.0    1   87   90   96.7
29    126  162.0    79  62.7   79    20  15.9  268052  268173  0.0    6  115  139   79.1
18     46  105.5    34  73.9   34     4   8.7  281107  281152  0.0    1   42   55   76.4
10    107  125.5    69  64.5   69    20  18.7  342133  342236  0.0    6   95   96   93.8
21     94  131.0    60  63.8   60    18  19.1  365037  365120  0.0    1   86   87   98.9
20    136  131.0    82  60.3   82    28  20.6  380436  380559  0.0    2  121  124   96.8
34    127  131.0    78  61.4   78    28  22.0  387971  388090  0.0    1  106  117   90.6
25    140  154.0    86  61.4   86    20  14.3  409140  409267  0.0    9  140  151   87.4
37    125  149.5    78  62.4   78    30  24.0  411212  411315  0.0   15  130  131   88.5
9      77  113.5    50  64.9   50    11  14.3  478120  478193  0.0    4   72   74   93.2
1     609  297.5   343  56.3  343   143  23.5  491670  492216  0.1    4  531  541   97.6
16    120  147.0    72  60.0   72    11   9.2  519285  519397  0.0    2  117  121   95.9
28     35   76.0    25  71.4   25     4  11.4  522407  522440  0.0    1   32   32  100.0
26    124  147.5    76  61.3   76    28  22.6  531064  531176  0.0   21  127  128   83.6
31     55  108.5    40  72.7   40     9  16.4  540137  540190  0.0    3   49   53   88.7
14     73  107.0    50  68.5   50    16  21.9  557391  557458  0.0    3   64   70   88.6
6      23   67.0    18  78.3   18     3  13.0  571200  571222  0.0    3   22   23   87.0
11    144  144.0    86  59.7   86    34  23.6  586719  586853  0.0    3  121  123   96.7
30     50   95.5    35  70.0   35     7  14.0  599465  599511  0.0    3   48   48   95.8
7      54  105.0    38  70.4   38     6  11.1  599975  600025  0.0   30   80   82   62.2
5      21   72.0    18  85.7   18     1   4.8  604115  604135  0.0    5   24   26   76.9
27    173  176.5   105  60.7  105    39  22.5  621983  622127  0.0    2  163  163   99.4
24     46   98.0    34  73.9   34     4   8.7  655376  655420  0.0    6   48   52   82.7
22     94  119.0    60  63.8   60    20  21.3  668713  668800  0.0    1   80  101   79.2
8     141  145.5    88  62.4   88    30  21.3  669021  669148  0.0   28  151  154   80.5
32     69  120.0    47  68.1   47     7  10.1  670437  670500  0.0    9   75   84   79.8
Score for 37 query sequences (total 4025 bp) against target (679422 bp) = 4795.50
nrd1a's 37 exons against eelScaffold320
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
32     52   99.5    36  69.2   36     4   7.7   11214   11261  0.0   10   61   84   61.9
18     46  105.5    34  73.9   34     4   8.7   21678   21720  0.0    1   45   55   81.8
17     70   95.0    45  64.3   45    20  28.6   22625   22693  0.0    3   53   59   86.4
24     40   96.5    29  72.5   29     2   5.0   22849   22886  0.0    8   47   52   76.9
23     21   69.0    17  81.0   17     0   0.0   26374   26394  0.0    1   21   22   95.5
6      18   72.0    16  88.9   16     0   0.0   28762   28779  0.0    5   22   23   78.3
14     64  105.5    42  65.6   42     9  14.1   42560   42621  0.0   12   68   70   81.4
2     315  228.0   183  58.1  183    64  20.3   46282   46552  0.1    7  301  324   91.0
21     90  112.5    58  64.4   58    17  18.9   80763   80842  0.0    3   85   87   95.4
25    153  138.0    88  57.5   88    27  17.6   81195   81324  0.1    1  149  151   98.7
11     81  148.5    55  67.9   55    10  12.3   81550   81625  0.0   47  122  123   61.8
1     525  261.0   294  56.0  294   105  20.0   83281   83764  0.2    2  462  541   85.2
15     61  107.0    40  65.6   40     4   6.6   89643   89699  0.0    8   68   78   78.2
28     39   78.0    28  71.8   28     8  20.5   92398   92435  0.0    1   32   32  100.0
8     168  147.0   104  61.9  104    32  19.0   97584   97741  0.1    8  153  154   94.8
3     254  229.0   150  59.1  150    55  21.7  106270  106501  0.1    6  226  241   91.7
31     54   88.5    37  68.5   37     6  11.1  106293  106342  0.0    2   53   53   98.1
29    129  142.5    79  61.2   79    30  23.3  107213  107332  0.0   23  130  139   77.7
13     70  119.0    46  65.7   46     5   7.1  108538  108605  0.0    1   67   74   90.5
4      37   84.5    27  73.0   27     3   8.1  129499  129534  0.0    3   37   38   92.1
37    131  128.5    79  60.3   79    26  19.8  133637  133753  0.0   13  131  131   90.8
35     96  114.5    60  62.5   60    21  21.9  135814  135897  0.0    1   87   90   96.7
9      78  106.5    50  64.1   50    14  17.9  136915  136989  0.0    5   71   74   90.5
30     39   87.0    27  69.2   27     0   0.0  137258  137296  0.0    1   39   48   81.2
22    114  120.0    69  60.5   69    20  17.5  143789  143896  0.0    2  101  101   99.0
5      30   64.5    22  73.3   22     6  20.0  158927  158955  0.0    1   25   26   96.2
27    176  145.0   102  58.0  102    39  22.2  161163  161325  0.1   13  162  163   92.0
33     89  109.0    56  62.9   56    19  21.3  164489  164570  0.0    5   81   91   84.6
10     82  119.0    52  63.4   52    11  13.4  168702  168776  0.0    1   78   96   81.2
7      81  106.5    50  61.7   50     7   8.6  178661  178740  0.0    4   78   82   91.5
20    121  141.5    76  62.8   76    21  17.4  178727  178838  0.0   14  122  124   87.9
12     43   98.0    31  72.1   31     4   9.3  202832  202872  0.0    5   45   58   70.7
36     65  118.0    44  67.7   44     8  12.3  208219  208279  0.0    3   63   70   87.1
19    120  157.5    76  63.3   76    29  24.2  221343  221456  0.0    2   98  105   92.4
34     86  124.0    58  67.4   58    14  16.3  221411  221484  0.0   19  102  117   71.8
26    123  127.5    74  60.2   74    14  11.4  225229  225348  0.0    4  115  128   87.5
16    129  126.0    77  59.7   77    21  16.3  226946  227071  0.1   11  121  121   91.7
Score for 37 query sequences (total 4025 bp) against target (246433 bp) = 4519.50
nrd1a discussion
This analysis is also inconclusive. eelScaffold32 again scores higher (4795.5 versus 4519.5) and so is more likely to harbour this second gene, but being some 2.8 times longer it also has a higher prior likelihood of doing so.
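The comparison can be put on a common footing with a little arithmetic. The sketch below takes the totals from the four score lines, divides each by the summed exon length, and reports the scaffold length ratio. Score per query base is an illustrative normalisation chosen here, not a statistic used in the original analysis, and it does not correct for the length prior.

```python
# (gene, scaffold, total_score, query_bp, target_bp), copied from the score lines
RESULTS = [
    ("pdcd10b", "eelScaffold32", 880.5, 639, 679422),
    ("pdcd10b", "eelScaffold320", 830.5, 639, 246433),
    ("nrd1a", "eelScaffold32", 4795.5, 4025, 679422),
    ("nrd1a", "eelScaffold320", 4519.5, 4025, 246433),
]


def score_per_query_bp(total_score, query_bp):
    """Total alignment score divided by the summed exon length."""
    return total_score / query_bp


def length_ratio(longer_bp, shorter_bp):
    """How many times longer one scaffold is than the other."""
    return longer_bp / shorter_bp


for gene, scaffold, score, query_bp, _target_bp in RESULTS:
    per_bp = score_per_query_bp(score, query_bp)
    print(f"{gene:8s} {scaffold:15s} {per_bp:.3f} score per query bp")

print(f"length ratio: {length_ratio(679422, 246433):.2f}")
```

The per-base scores differ by well under ten percent between the two scaffolds for both genes, which is in line with calling both analyses inconclusive.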