Two Eel Scaffolds
Introduction
Two DNA scaffolds are presented:
- eelScaffold32. 679 422 bp and 42.25% GC.
- eelScaffold320. 246 433 bp and 43.47% GC.
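The length and GC figures above can be reproduced from the scaffold FASTA files with a few lines of Python. This is a minimal sketch: the function names are illustrative, and it assumes GC% is computed over unambiguous A/C/G/T bases only (the text does not say how N bases were handled).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def length_and_gc(seq):
    """Return (length in bp, GC percentage over A/C/G/T bases only)."""
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return len(seq), (100.0 * gc / acgt if acgt else 0.0)


# Toy record standing in for a scaffold file; in practice one would pass
# open("eelScaffold32.fa").read() instead.
demo = ">toy\nACGTGGCC\nNNAT\n"
for name, seq in read_fasta(demo).items():
    n, gc = length_and_gc(seq)
    print(f"{name}: {n} bp, {gc:.2f}% GC")
```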
We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.
Two genes are expected to lie in or near the regions covered by these scaffolds:
- eelScaffold32 may contain part or all of pdcd10b (Programmed cell death 10b).
- eelScaffold320 may contain part or all of nrd1a (Nardilysin, N-arginine dibasic convertase).
ORF Analysis
ORF scans of sequences over 100 kbp often produce too much output to interpret directly, but they can be a useful first step for gauging the complexity of the sequence.
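To make the idea concrete, here is a minimal sketch of a six-frame ORF scan. It is not the tool used for this analysis: the ATG-to-stop ORF definition, the `min_len` threshold and the function names are all assumptions chosen for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_len):
    """Yield (start, end) 0-based half-open ATG..stop ORFs in the 3 forward frames."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:
                    yield start, i + 3
                start = None


def find_orfs(seq, min_len=300):
    """Return (strand, start, end) ORFs on both strands, in forward-strand coordinates."""
    seq = seq.upper()
    hits = [("+", s, e) for s, e in orfs_in_strand(seq, min_len)]
    n = len(seq)
    for s, e in orfs_in_strand(revcomp(seq), min_len):
        hits.append(("-", n - e, n - s))
    return sorted(hits, key=lambda h: h[1])


print(find_orfs("CCATGAAATAGCC", min_len=9))
```

Raising `min_len` is the simplest way to keep the output manageable on a scaffold of several hundred kbp.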
Gene Predictor
One of the most up-to-date gene predictors (as of 2016) is Augustus. It uses HMM profiles trained on a related organism. For eel, Augustus makes two relevant organisms available: zebrafish (zb) and lamprey (lp). Although tilapia is not available, it is possible, given time, to train an HMM profile for that organism.
An example Augustus command line is as follows:
augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf
Augustus writes its output in GTF format. For visual browsing in JBrowse, we need to convert this to the related GFF format:
gtf2gff.pl --printExon --gff3 --out=aug_s32_lp.gff < aug_s32_lp.gtf
We are probably also interested in the CDS and protein sequences of the predicted genes. For these we can use the following Augustus-supplied script:
getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf
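The three steps can be strung together for both scaffolds and both species profiles. The sketch below only builds and prints the command lines (a dry run); it does not execute anything. The helper `pipeline_commands` is hypothetical, the output names follow the `aug_s320_lp` pattern used above, and `zebrafish` as the Augustus species identifier is an assumption (only `lamprey` appears in the commands above).

```python
# Species identifier -> short tag used in file names.
# "zebrafish" is an assumed identifier; "lamprey" is taken from the text.
SPECIES_TAGS = {"lamprey": "lp", "zebrafish": "zb"}


def pipeline_commands(fasta, species):
    """Return the shell commands for prediction, GFF conversion and sequence extraction."""
    stem = fasta.rsplit(".", 1)[0].replace("eelScaffold", "s")
    base = f"aug_{stem}_{SPECIES_TAGS[species]}"
    return [
        f"augustus --species={species} {fasta} > {base}.gtf",
        f"gtf2gff.pl --printExon --gff3 --out={base}.gff < {base}.gtf",
        f"getAnnoFasta.pl --seqfile={fasta} {base}.gtf",
    ]


for fasta in ("eelScaffold32.fa", "eelScaffold320.fa"):
    for species in SPECIES_TAGS:
        print("\n".join(pipeline_commands(fasta, species)))
```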
Detecting presence of pdcd10b and nrd1a
We take the exons of these genes and align each one against the scaffolds using Smith-Waterman local alignment (via the EMBOSS package). The results are ordered by start position on the scaffold.
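For readers unfamiliar with local alignment, the following is a minimal sketch of the scoring recurrence. It is not the EMBOSS implementation: the match, mismatch and gap values are assumptions, and it uses a single linear gap penalty rather than separate gap-open and gap-extend penalties, so its scores are not comparable with those in the tables.

```python
def smith_waterman(query, target, match=5, mismatch=-4, gap=-10):
    """Return (best local score, query end, target end); ends are 1-based inclusive."""
    best = (0, 0, 0)
    prev = [0] * (len(target) + 1)
    for i, q in enumerate(query, 1):
        curr = [0]
        for j, t in enumerate(target, 1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best[0]:
                best = (score, i, j)
        prev = curr
    return best


# Toy example: a 4-base "exon" found inside a short "scaffold".
print(smith_waterman("ACGT", "TTACGTTT"))
```

Running one such alignment per exon and sorting the hits by target end (or start) position gives a table of the kind shown in the following sections.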
pdcd10b's 7 exons against eelScaffold32
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
  5    89  125.5    56  62.9   56    15  16.9   92493   92577  0.0    1   78   79   98.7
  6    97  113.0    59  60.8   59    20  20.6  222386  222481  0.0    4   81   83   94.0
  7    80  112.0    52  65.0   52    16  20.0  153839  153911  0.0    1   71   82   86.6
  2    54  105.0    37  68.5   37     7  13.0  270271  270324  0.0    2   48   54   87.0
  1    91  134.0    58  63.7   58     9   9.9  305491  305572  0.0    6   96   96   94.8
  4   149  161.5    91  61.1   91    29  19.5  337277  337419  0.0    2  127  127   99.2
  3   112  129.5    69  61.6   69    16  14.3  607211  607313  0.0    7  111  118   89.0
Total score for all alignments = 880.5
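The column abbreviations are not defined in the text. The sketch below tests one plausible reading against the first row of the table: IPT as identities over alignment length, GPT as gaps over alignment length, PET as the fraction of the target (scaffold) covered, and PEQ as the fraction of the query (exon) covered. These meanings are inferred from the numbers and should be treated as assumptions.

```python
COLUMNS = "SFI ALEN SCORE IDEN IPT SIM GAPS GPT TSC TEC PET QSC QEC QLN PEQ".split()


def parse_row(line):
    """Turn one whitespace-separated table row into a {column: value} dict."""
    return dict(zip(COLUMNS, (float(x) for x in line.split())))


def derived(row, target_len):
    """Recompute the percentage columns under the assumed definitions."""
    return {
        "IPT": round(100 * row["IDEN"] / row["ALEN"], 1),
        "GPT": round(100 * row["GAPS"] / row["ALEN"], 1),
        "PET": round(100 * (row["TEC"] - row["TSC"] + 1) / target_len, 1),
        "PEQ": round(100 * (row["QEC"] - row["QSC"] + 1) / row["QLN"], 1),
    }


# First row of the eelScaffold32 table; 679422 bp is the scaffold length.
row = parse_row("5 89 125.5 56 62.9 56 15 16.9 92493 92577 0.0 1 78 79 98.7")
check = derived(row, target_len=679422)
for name, value in check.items():
    print(name, value, "matches" if value == row[name] else "differs")
```

Under this reading, SFI would be the exon index and TSC/TEC, QSC/QEC the start and end coordinates on the target and query respectively.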
pdcd10b's 7 exons against eelScaffold320
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
  5    51  105.0    36  70.6   36     5   9.8   13517   13567  0.0   27   72   79   58.2
  1   110  130.0    70  63.6   70    25  22.7   52225   52323  0.0    1   96   96  100.0
  6    89  116.5    55  61.8   55    15  16.9   60604   60688  0.0    6   83   83   94.0
  2    50   97.0    36  72.0   36     7  14.0   64406   64450  0.0    1   48   54   88.9
  7    84  112.5    55  65.5   55    14  16.7  162512  162589  0.0    7   82   82   92.7
  4   124  138.5    79  63.7   79    19  15.3  222772  222891  0.0   18  126  127   85.8
  3    79  131.0    54  68.4   54     9  11.4  238985  239059  0.0   24   97  118   62.7
pdcd10b discussion
This is a small gene and these alignments are not conclusive. Even so, eelScaffold32 appears the more likely of the two to harbour pdcd10b: the seven alignment scores sum to 880.5 for eelScaffold32 and to 830.5 for eelScaffold320.
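The comparison between the two scaffolds can be summarised numerically. The sketch below takes the SCORE and PEQ columns from the two pdcd10b tables and reports the total score and the mean PEQ per scaffold. Reading PEQ as the percentage of each exon covered by its alignment is an assumption, and the `summarise` helper is illustrative only.

```python
# (SCORE, PEQ) pairs copied from the two pdcd10b tables, in table order.
S32 = [(125.5, 98.7), (113.0, 94.0), (112.0, 86.6), (105.0, 87.0),
       (134.0, 94.8), (161.5, 99.2), (129.5, 89.0)]
S320 = [(105.0, 58.2), (130.0, 100.0), (116.5, 94.0), (97.0, 88.9),
        (112.5, 92.7), (138.5, 85.8), (131.0, 62.7)]


def summarise(rows):
    """Return (total alignment score, mean PEQ) for a list of (SCORE, PEQ) pairs."""
    total = sum(score for score, _ in rows)
    mean_peq = sum(peq for _, peq in rows) / len(rows)
    return total, mean_peq


for name, rows in (("eelScaffold32", S32), ("eelScaffold320", S320)):
    total, mean_peq = summarise(rows)
    print(f"{name}: total score {total:.1f}, mean PEQ {mean_peq:.1f}")
```

Both measures point the same way (totals of 880.5 against 830.5, mean PEQ of about 92.8 against 83.2), though with only seven short exons the margin is small.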
nrd1a's exons against eelScaffold32
SFI  ALEN  SCORE  IDEN   IPT  SIM  GAPS   GPT     TSC     TEC  PET  QSC  QEC  QLN    PEQ
  2   270  217.5   160  59.3  160    60  22.2    4926    5175  0.0   34  263  324   71.0
  3   235  257.0   143  60.9  143    35  14.9    9693    9920  0.0    5  211  241   85.9
 15    88  117.5    54  61.4   54    14  15.9   66759   66843  0.0    2   78   78   98.7
 12    34  104.0    28  82.4   28     2   5.9   67614   67647  0.0   24   55   58   55.2
 36    73  114.5    47  64.4   47    14  19.2  140070  140139  0.0    3   64   70   88.6
  4    37  104.0    28  75.7   28     0   0.0  159420  159456  0.0    2   38   38   97.4
 23    18   72.0    16  88.9   16     0   0.0  190407  190424  0.0    5   22   22   81.8
 33   108  115.5    66  61.1   66    24  22.2  214642  214747  0.0    5   90   91   94.5
 19   108  133.5    69  63.9   69    13  12.0  222360  222458  0.0    1  104  105   99.0
 13    84  109.5    52  61.9   52    15  17.9  237021  237101  0.0    3   74   74   97.3
 17    60   97.5    41  68.3   41    14  23.3  258142  258195  0.0    2   53   59   88.1
 35    98  124.0    60  61.2   60    19  19.4  259372  259461  0.0    1   87   90   96.7
 29   126  162.0    79  62.7   79    20  15.9  268052  268173  0.0    6  115  139   79.1
 18    46  105.5    34  73.9   34     4   8.7  281107  281152  0.0    1   42   55   76.4
 10   107  125.5    69  64.5   69    20  18.7  342133  342236  0.0    6   95   96   93.8
 21    94  131.0    60  63.8   60    18  19.1  365037  365120  0.0    1   86   87   98.9
 20   136  131.0    82  60.3   82    28  20.6  380436  380559  0.0    2  121  124   96.8
 34   127  131.0    78  61.4   78    28  22.0  387971  388090  0.0    1  106  117   90.6
 25   140  154.0    86  61.4   86    20  14.3  409140  409267  0.0    9  140  151   87.4
 37   125  149.5    78  62.4   78    30  24.0  411212  411315  0.0   15  130  131   88.5
  9    77  113.5    50  64.9   50    11  14.3  478120  478193  0.0    4   72   74   93.2
  1   609  297.5   343  56.3  343   143  23.5  491670  492216  0.1    4  531  541   97.6
 16   120  147.0    72  60.0   72    11   9.2  519285  519397  0.0    2  117  121   95.9
 28    35   76.0    25  71.4   25     4  11.4  522407  522440  0.0    1   32   32  100.0
 26   124  147.5    76  61.3   76    28  22.6  531064  531176  0.0   21  127  128   83.6
 31    55  108.5    40  72.7   40     9  16.4  540137  540190  0.0    3   49   53   88.7
 14    73  107.0    50  68.5   50    16  21.9  557391  557458  0.0    3   64   70   88.6
  6    23   67.0    18  78.3   18     3  13.0  571200  571222  0.0    3   22   23   87.0
 11   144  144.0    86  59.7   86    34  23.6  586719  586853  0.0    3  121  123   96.7
 30    50   95.5    35  70.0   35     7  14.0  599465  599511  0.0    3   48   48   95.8
  7    54  105.0    38  70.4   38     6  11.1  599975  600025  0.0   30   80   82   62.2
  5    21   72.0    18  85.7   18     1   4.8  604115  604135  0.0    5   24   26   76.9
 27   173  176.5   105  60.7  105    39  22.5  621983  622127  0.0    2  163  163   99.4
 24    46   98.0    34  73.9   34     4   8.7  655376  655420  0.0    6   48   52   82.7
 22    94  119.0    60  63.8   60    20  21.3  668713  668800  0.0    1   80  101   79.2
  8   141  145.5    88  62.4   88    30  21.3  669021  669148  0.0   28  151  154   80.5
 32    69  120.0    47  68.1   47     7  10.1  670437  670500  0.0    9   75   84    79.