Two Eel Scaffolds
Revision as of 15:41, 13 May 2016
Introduction
Two DNA scaffolds are presented:
- eelScaffold32: 679,422 bp, 42.25% GC.
- eelScaffold320: 246,433 bp, 43.47% GC.
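The lengths and GC contents above can be recomputed from the FASTA files. The following is a minimal sketch. It assumes that GC% is taken over unambiguous A/C/G/T bases only. The page does not say whether ambiguous bases such as N were counted for the figures above.

```python
def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (name = first word of header)."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def gc_percent(seq):
    """GC content as a percentage of unambiguous bases (A, C, G, T); N etc. ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0
```

For example, `read_fasta(open("eelScaffold32.fa").read())` returns a dict keyed by sequence name. Calling `len(seq)` and `gc_percent(seq)` on each entry then gives the two figures listed above.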
We take tilapia (Oreochromis niloticus, Ensembl abbreviation ONI) to be the reference.
Two genes are expected to lie in or near the regions covered by these scaffolds:
- eelScaffold32: does it contain any part of PDCD10b (Programmed cell death 10b; Ensembl ENSONIG00000001157)?
- eelScaffold320: does it contain any part of nrd1a (Nardilysin, N-arginine dibasic convertase; Ensembl ENSONIG00000001167)?
Gene Predictor
Augustus is one of the most up-to-date gene predictors (as of 2016). It uses a hidden Markov model (HMM) whose parameters are trained on a related organism. For eel, Augustus provides parameter sets for two relevant organisms: zebrafish (zb) and lamprey (lp). Tilapia is not among them. Given time, however, it would be possible to train Augustus parameters for it.
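Augustus itself is a much richer generalized HMM, with separate states for exons, introns, splice sites and so on. The core idea can still be shown with a toy two-state model: decode the most probable sequence of hidden labels under parameters estimated elsewhere. All probabilities below are invented for illustration. They are not Augustus parameters.

```python
import math

# Toy model: "N" = non-coding, "C" = coding. All numbers are hypothetical;
# a real predictor estimates them from a related organism.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.95, "C": 0.05}, "C": {"N": 0.05, "C": 0.95}}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2},  # AT-rich (invented)
    "C": {"A": 0.2, "T": 0.2, "C": 0.3, "G": 0.3},  # GC-rich (invented)
}


def viterbi(seq):
    """Most probable state path for seq, computed in log space."""
    log = math.log
    # score[s] = best log-probability of any path ending in state s
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[k][s] = best predecessor of state s at position k + 1
    for base in seq[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            p = max(STATES, key=lambda r: score[r] + log(TRANS[r][s]))
            pointers[s] = p
            new_score[s] = score[p] + log(TRANS[p][s]) + log(EMIT[s][base])
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

Here `viterbi("A" * 20 + "G" * 20)` labels the AT-rich half non-coding and the GC-rich half coding. This follows only from the invented emission probabilities.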
An example Augustus command line is as follows:
augustus --species=lamprey eelScaffold320.fa >aug_s320_lp.gtf
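A small driver can wrap the command above so that it runs on both scaffolds with both parameter sets. This sketch assumes `augustus` is on the PATH. It also assumes the zebrafish parameter set is installed under the species name `zebrafish`; the available names can be checked in the `species` directory of the Augustus configuration. Output files follow the `aug_<scaffold>_<species>.gtf` pattern used on this page.

```python
import shutil
import subprocess

# Scaffold FASTA files, tagged as in the output names used on this page.
SCAFFOLDS = {"s32": "eelScaffold32.fa", "s320": "eelScaffold320.fa"}
# Augustus species names; "zebrafish" is an assumed installed name.
SPECIES = {"zb": "zebrafish", "lp": "lamprey"}


def augustus_jobs():
    """List (command, output_file) for every scaffold/species combination."""
    jobs = []
    for s_tag, fasta in SCAFFOLDS.items():
        for sp_tag, species in SPECIES.items():
            cmd = ["augustus", f"--species={species}", fasta]
            jobs.append((cmd, f"aug_{s_tag}_{sp_tag}.gtf"))
    return jobs


def run_all():
    """Run each job, sending stdout to its .gtf file (like `>` in the shell)."""
    if shutil.which("augustus") is None:
        raise RuntimeError("augustus not found on PATH")
    for cmd, out_path in augustus_jobs():
        with open(out_path, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)
```

Calling `run_all()` then produces four GTF files, one per scaffold and parameter set.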
Augustus writes its output in GTF format. For visual browsing in JBrowse, we convert it to the related GFF3 format with the Augustus-supplied gtf2gff.pl script:
gtf2gff.pl --printExon --gff3 < aug_s32_lp.gtf --out=aug_s32_lp.gff
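GTF is a tab-separated format with nine columns: sequence name, source, feature type, start, end, score, strand, frame and attributes. Lines starting with `#` are comments. The sketch below summarises an Augustus GTF file, for example to compare how many genes and CDS features the zebrafish and lamprey runs predict. The sample lines in the test are made up in the shape of Augustus output.

```python
from collections import Counter

GTF_FIELDS = ("seqname", "source", "feature", "start", "end",
              "score", "strand", "frame", "attributes")


def parse_gtf(lines):
    """Yield one dict per GTF feature line; comments and malformed lines are skipped."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a nine-column feature line
        record = dict(zip(GTF_FIELDS, fields))
        record["start"], record["end"] = int(record["start"]), int(record["end"])
        yield record


def count_features(records):
    """Count features by type (gene, transcript, CDS, ...)."""
    return Counter(r["feature"] for r in records)
```

For example, `count_features(parse_gtf(open("aug_s32_lp.gtf")))` tallies the feature types in one prediction file.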
We are probably also interested in the CDS and protein sequences of the predicted genes. These can be extracted with the following Augustus-supplied script:
getAnnoFasta.pl --seqfile=eelScaffold32.fa aug_s32_lp.gtf
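getAnnoFasta.pl does this for us, but the underlying step is simple. The CDS pieces of a transcript are cut out of the scaffold and joined in coordinate order. The joined sequence is then reverse-complemented when the gene is on the minus strand. The sketch below shows that step only; it is not the script's actual implementation. It uses 1-based inclusive coordinates as in GTF and leaves out translation to protein.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_cds(scaffold, cds_parts):
    """Join CDS parts, given as (start, end, strand) with 1-based inclusive
    coordinates, into one coding sequence in transcript orientation."""
    strands = {strand for _, _, strand in cds_parts}
    if len(strands) != 1:
        raise ValueError("all CDS parts must be on the same strand")
    # Concatenate in genomic order first...
    seq = "".join(scaffold[start - 1:end] for start, end, _ in sorted(cds_parts))
    # ...then flip the whole thing for minus-strand genes.
    return revcomp(seq) if strands.pop() == "-" else seq
```

Given the CDS coordinates from a GTF file, this yields the same kind of nucleotide sequence that getAnnoFasta.pl writes out.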
The resulting annotations can be navigated visually in a genome browser here.
Detecting the presence of pdcd10b and nrd1a
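We take the exons of each gene and align every exon against each scaffold using the Smith-Waterman local-alignment algorithm (via the EMBOSS package). Each table lists one hit per exon, ordered by start position on the scaffold. The columns are:
- SFI: index of the query exon.
- ALEN: alignment length.
- SCORE: alignment score.
- IDEN: identical positions.
- IPT: IDEN as a percentage of ALEN.
- SIM: similar positions (equal to IDEN in these nucleotide alignments).
- GAPS: gap positions.
- GPT: GAPS as a percentage of ALEN.
- TSC and TEC: start and end on the target scaffold.
- PET: percentage of the target covered.
- QSC and QEC: start and end on the query exon.
- QLN: exon length.
- PEQ: percentage of the exon covered.
The final line of each table gives the summed exon length and the summed SCORE.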
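EMBOSS does the actual work here. The sketch below shows what a Smith-Waterman local alignment with affine gaps computes: the best local score of one exon against a scaffold, and where that hit ends on the scaffold. The scoring used (match +5, mismatch -4, gap opening 10, gap extension 0.5) is an assumption meant to approximate EMBOSS's nucleotide defaults. It may not reproduce the SCORE column exactly. The table has roughly len(exon) * len(scaffold) cells, so a pure-Python version is far too slow for a 679 kb scaffold; it is for illustration only.

```python
NEG_INF = float("-inf")


def smith_waterman(query, target, match=5.0, mismatch=-4.0,
                   gap_open=10.0, gap_extend=0.5):
    """Best local alignment score and its 1-based end position in target.

    Affine gaps (Gotoh): a gap of length L costs gap_open + (L - 1) * gap_extend.
    Returns (0.0, 0) when no positive-scoring alignment exists.
    """
    m = len(target)
    prev_h = [0.0] * (m + 1)       # H row for query position i - 1
    prev_f = [NEG_INF] * (m + 1)   # gaps in target (vertical moves)
    best, best_j = 0.0, 0
    for i in range(1, len(query) + 1):
        h = [0.0] * (m + 1)
        f = [NEG_INF] * (m + 1)
        e = NEG_INF                # gap in query (horizontal moves) along this row
        for j in range(1, m + 1):
            e = max(h[j - 1] - gap_open, e - gap_extend)
            f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            s = match if query[i - 1] == target[j - 1] else mismatch
            h[j] = max(0.0, prev_h[j - 1] + s, e, f[j])
            if h[j] > best:
                best, best_j = h[j], j
        prev_h, prev_f = h, f
    return best, best_j
```

Run once per exon, this yields one best hit per exon, which matches how each table below reports one row per exon.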
pdcd10b's 7 exons against eelScaffold32
SFI ALEN SCORE IDEN  IPT  SIM GAPS  GPT    TSC    TEC PET QSC QEC QLN   PEQ
5     89 125.5   56 62.9   56   15 16.9  92493  92577 0.0   1  78  79  98.7
7     80 112.0   52 65.0   52   16 20.0 153839 153911 0.0   1  71  82  86.6
6     97 113.0   59 60.8   59   20 20.6 222386 222481 0.0   4  81  83  94.0
2     54 105.0   37 68.5   37    7 13.0 270271 270324 0.0   2  48  54  87.0
1     91 134.0   58 63.7   58    9  9.9 305491 305572 0.0   6  96  96  94.8
4    149 161.5   91 61.1   91   29 19.5 337277 337419 0.0   2 127 127  99.2
3    112 129.5   69 61.6   69   16 14.3 607211 607313 0.0   7 111 118  89.0
Score for 7 query sequences (total 639 bp) against target (679422 bp) = 880.50
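The derived columns can be checked against the raw ones. IPT is IDEN/ALEN and PEQ is the covered part of the exon, (QEC - QSC + 1)/QLN; both are percentages rounded to one decimal. The summary line is the sum of QLN and of SCORE over all exons. The sketch below uses the seven rows of the pdcd10b/eelScaffold32 table. The values are copied from the table; the column meanings are inferred from these checks, not taken from EMBOSS documentation.

```python
COLUMNS = ("SFI", "ALEN", "SCORE", "IDEN", "IPT", "SIM", "GAPS", "GPT",
           "TSC", "TEC", "PET", "QSC", "QEC", "QLN", "PEQ")

# pdcd10b exons against eelScaffold32, copied from the table above.
PDCD10B_VS_S32 = """
5 89 125.5 56 62.9 56 15 16.9 92493 92577 0.0 1 78 79 98.7
7 80 112.0 52 65.0 52 16 20.0 153839 153911 0.0 1 71 82 86.6
6 97 113.0 59 60.8 59 20 20.6 222386 222481 0.0 4 81 83 94.0
2 54 105.0 37 68.5 37 7 13.0 270271 270324 0.0 2 48 54 87.0
1 91 134.0 58 63.7 58 9 9.9 305491 305572 0.0 6 96 96 94.8
4 149 161.5 91 61.1 91 29 19.5 337277 337419 0.0 2 127 127 99.2
3 112 129.5 69 61.6 69 16 14.3 607211 607313 0.0 7 111 118 89.0
"""


def parse_rows(text):
    """Turn whitespace-separated table rows into dicts of floats."""
    return [dict(zip(COLUMNS, map(float, line.split())))
            for line in text.strip().splitlines()]


def row_is_consistent(row):
    """Recompute IPT and PEQ from the raw columns and compare with the table."""
    ipt = round(100 * row["IDEN"] / row["ALEN"], 1)
    peq = round(100 * (row["QEC"] - row["QSC"] + 1) / row["QLN"], 1)
    return ipt == row["IPT"] and peq == row["PEQ"]


def summary(rows):
    """(number of exons, total exon length in bp, total score)."""
    return len(rows), int(sum(r["QLN"] for r in rows)), sum(r["SCORE"] for r in rows)
```

For this table, `summary(parse_rows(PDCD10B_VS_S32))` reproduces the 7 exons, 639 bp and 880.5 of the summary line.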
pdcd10b's 7 exons against eelScaffold320
SFI ALEN SCORE IDEN  IPT  SIM GAPS  GPT    TSC    TEC PET QSC QEC QLN   PEQ
5     51 105.0   36 70.6   36    5  9.8  13517  13567 0.0  27  72  79  58.2
1    110 130.0   70 63.6   70   25 22.7  52225  52323 0.0   1  96  96 100.0
6     89 116.5   55 61.8   55   15 16.9  60604  60688 0.0   6  83  83  94.0
2     50  97.0   36 72.0   36    7 14.0  64406  64450 0.0   1  48  54  88.9
7     84 112.5   55 65.5   55   14 16.7 162512 162589 0.0   7  82  82  92.7
4    124 138.5   79 63.7   79   19 15.3 222772 222891 0.0  18 126 127  85.8
3     79 131.0   54 68.4   54    9 11.4 238985 239059 0.0  24  97 118  62.7
Score for 7 query sequences (total 639 bp) against target (246433 bp) = 830.5
pdcd10b discussion
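pdcd10b is a small gene (7 exons, 639 bp in total), and these alignments are not conclusive. Every exon finds a hit on both scaffolds, at only about 61% to 72% identity. The hits are also spread across most of each scaffold rather than clustered in exon order. eelScaffold32 does score higher overall (880.5 against 830.5), so it is the more likely of the two to harbour pdcd10b.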
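One way to make "not conclusive" more concrete is to check exon order. If a scaffold really carried the gene, its exon hits should appear in gene order along the scaffold: 1, 2, 3, ... on the plus strand, or the reverse on the minus strand. The hypothetical helper below finds the largest subset of hits in a consistent order, given (exon, TSC) pairs taken from the two pdcd10b tables. The tables do not report the strand of each hit, so both orders are tried. This is an illustrative check, not part of the original analysis.

```python
# (exon, TSC) pairs from the pdcd10b tables above.
S32_HITS = [(5, 92493), (6, 222386), (7, 153839), (2, 270271),
            (1, 305491), (4, 337277), (3, 607211)]
S320_HITS = [(5, 13517), (1, 52225), (6, 60604), (2, 64406),
             (7, 162512), (4, 222772), (3, 238985)]


def longest_increasing(seq):
    """Length of the longest strictly increasing subsequence (O(n^2); n is small)."""
    best = [1] * len(seq)
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                best[i] = max(best[i], best[j] + 1)
    return max(best, default=0)


def longest_ordered_run(hits):
    """Largest number of exon hits lying in gene order along the scaffold,
    on either strand. hits: list of (exon_index, target_start) pairs."""
    order = [exon for exon, _ in sorted(hits, key=lambda h: h[1])]
    forward = longest_increasing(order)               # 1, 2, 3, ... (plus strand)
    reverse = longest_increasing([-e for e in order])  # ..., 3, 2, 1 (minus strand)
    return max(forward, reverse)
```

For these tables it gives 4 of 7 exons for eelScaffold32 and 3 of 7 for eelScaffold320. Neither scaffold therefore shows the full exon order expected from an intact copy of the gene.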
nrd1a's 37 exons against eelScaffold32
SFI ALEN SCORE IDEN  IPT  SIM GAPS  GPT    TSC    TEC PET QSC QEC QLN   PEQ
2    270 217.5  160 59.3  160   60 22.2   4926   5175 0.0  34 263 324  71.0
3    235 257.0  143 60.9  143   35 14.9   9693   9920 0.0   5 211 241  85.9
15    88 117.5   54 61.4   54   14 15.9  66759  66843 0.0   2  78  78  98.7
12    34 104.0   28 82.4   28    2  5.9  67614  67647 0.0  24  55  58  55.2
36    73 114.5   47 64.4   47   14 19.2 140070 140139 0.0   3  64  70  88.6
4     37 104.0   28 75.7   28    0  0.0 159420 159456 0.0   2  38  38  97.4
23    18  72.0   16 88.9   16    0  0.0 190407 190424 0.0   5  22  22  81.8
33   108 115.5   66 61.1   66   24 22.2 214642 214747 0.0   5  90  91  94.5
19   108 133.5   69 63.9   69   13 12.0 222360 222458 0.0   1 104 105  99.0
13    84 109.5   52 61.9   52   15 17.9 237021 237101 0.0   3  74  74  97.3
17    60  97.5   41 68.3   41   14 23.3 258142 258195 0.0   2  53  59  88.1
35    98 124.0   60 61.2   60   19 19.4 259372 259461 0.0   1  87  90  96.7
29   126 162.0   79 62.7   79   20 15.9 268052 268173 0.0   6 115 139  79.1
18    46 105.5   34 73.9   34    4  8.7 281107 281152 0.0   1  42  55  76.4
10   107 125.5   69 64.5   69   20 18.7 342133 342236 0.0   6  95  96  93.8
21    94 131.0   60 63.8   60   18 19.1 365037 365120 0.0   1  86  87  98.9
20   136 131.0   82 60.3   82   28 20.6 380436 380559 0.0   2 121 124  96.8
34   127 131.0   78 61.4   78   28 22.0 387971 388090 0.0   1 106 117  90.6
25   140 154.0   86 61.4   86   20 14.3 409140 409267 0.0   9 140 151  87.4
37   125 149.5   78 62.4   78   30 24.0 411212 411315 0.0  15 130 131  88.5
9     77 113.5   50 64.9   50   11 14.3 478120 478193 0.0   4  72  74  93.2
1    609 297.5  343 56.3  343  143 23.5 491670 492216 0.1   4 531 541  97.6
16   120 147.0   72 60.0   72   11  9.2 519285 519397 0.0   2 117 121  95.9
28    35  76.0   25 71.4   25    4 11.4 522407 522440 0.0   1  32  32 100.0
26   124 147.5   76 61.3   76   28 22.6 531064 531176 0.0  21 127 128  83.6
31    55 108.5   40 72.7   40    9 16.4 540137 540190 0.0   3  49  53  88.7
14    73 107.0   50 68.5   50   16 21.9 557391 557458 0.0   3  64  70  88.6
6     23  67.0   18 78.3   18    3 13.0 571200 571222 0.0   3  22  23  87.0
11   144 144.0   86 59.7   86   34 23.6 586719 586853 0.0   3 121 123  96.7
30    50  95.5   35 70.0   35    7 14.0 599465 599511 0.0   3  48  48  95.8
7     54 105.0   38 70.4   38    6 11.1 599975 600025 0.0  30  80  82  62.2
5     21  72.0   18 85.7   18    1  4.8 604115 604135 0.0   5  24  26  76.9
27   173 176.5  105 60.7  105   39 22.5 621983 622127 0.0   2 163 163  99.4
24    46  98.0   34 73.9   34    4  8.7 655376 655420 0.0   6  48  52  82.7
22    94 119.0   60 63.8   60   20 21.3 668713 668800 0.0   1  80 101  79.2
8    141 145.5   88 62.4   88   30 21.3 669021 669148 0.0  28 151 154  80.5
32    69 120.0   47 68.1   47    7 10.1 670437 670500 0.0   9  75  84  79.8
Score for 37 query sequences (total 4025 bp) against target (679422 bp) = 4795.50
nrd1a's 37 exons against eelScaffold320
SFI ALEN SCORE IDEN  IPT  SIM GAPS  GPT    TSC    TEC PET QSC QEC QLN   PEQ
32    52  99.5   36 69.2   36    4  7.7  11214  11261 0.0  10  61  84  61.9
18    46 105.5   34 73.9   34    4  8.7  21678  21720 0.0   1  45  55  81.8
17    70  95.0   45 64.3   45   20 28.6  22625  22693 0.0   3  53  59  86.4
24    40  96.5   29 72.5   29    2  5.0  22849  22886 0.0   8  47  52  76.9
23    21  69.0   17 81.0   17    0  0.0  26374  26394 0.0   1  21  22  95.5
6     18  72.0   16 88.9   16    0  0.0  28762  28779 0.0   5  22  23  78.3
14    64 105.5   42 65.6   42    9 14.1  42560  42621 0.0  12  68  70  81.4
2    315 228.0  183 58.1  183   64 20.3  46282  46552 0.1   7 301 324  91.0
21    90 112.5   58 64.4   58   17 18.9  80763  80842 0.0   3  85  87  95.4
25   153 138.0   88 57.5   88   27 17.6  81195  81324 0.1   1 149 151  98.7
11    81 148.5   55 67.9   55   10 12.3  81550  81625 0.0  47 122 123  61.8
1    525 261.0  294 56.0  294  105 20.0  83281  83764 0.2   2 462 541  85.2
15    61 107.0   40 65.6   40    4  6.6  89643  89699 0.0   8  68  78  78.2
28    39  78.0   28 71.8   28    8 20.5  92398  92435 0.0   1  32  32 100.0
8    168 147.0  104 61.9  104   32 19.0  97584  97741 0.1   8 153 154  94.8
3    254 229.0  150 59.1  150   55 21.7 106270 106501 0.1   6 226 241  91.7
31    54  88.5   37 68.5   37    6 11.1 106293 106342 0.0   2  53  53  98.1
29   129 142.5   79 61.2   79   30 23.3 107213 107332 0.0  23 130 139  77.7
13    70 119.0   46 65.7   46    5  7.1 108538 108605 0.0   1  67  74  90.5
4     37  84.5   27 73.0   27    3  8.1 129499 129534 0.0   3  37  38  92.1
37   131 128.5   79 60.3   79   26 19.8 133637 133753 0.0  13 131 131  90.8
35    96 114.5   60 62.5   60   21 21.9 135814 135897 0.0   1  87  90  96.7
9     78 106.5   50 64.1   50   14 17.9 136915 136989 0.0   5  71  74  90.5
30    39  87.0   27 69.2   27    0  0.0 137258 137296 0.0   1  39  48  81.2
22   114 120.0   69 60.5   69   20 17.5 143789 143896 0.0   2 101 101  99.0
5     30  64.5   22 73.3   22    6 20.0 158927 158955 0.0   1  25  26  96.2
27   176 145.0  102 58.0  102   39 22.2 161163 161325 0.1  13 162 163  92.0
33    89 109.0   56 62.9   56   19 21.3 164489 164570 0.0   5  81  91  84.6
10    82 119.0   52 63.4   52   11 13.4 168702 168776 0.0   1  78  96  81.2
7     81 106.5   50 61.7   50    7  8.6 178661 178740 0.0   4  78  82  91.5
20   121 141.5   76 62.8   76   21 17.4 178727 178838 0.0  14 122 124  87.9
12    43  98.0   31 72.1   31    4  9.3 202832 202872 0.0   5  45  58  70.7
36    65 118.0   44 67.7   44    8 12.3 208219 208279 0.0   3  63  70  87.1
19   120 157.5   76 63.3   76   29 24.2 221343 221456 0.0   2  98 105  92.4
34    86 124.0   58 67.4   58   14 16.3 221411 221484 0.0  19 102 117  71.8
26   123 127.5   74 60.2   74   14 11.4 225229 225348 0.0   4 115 128  87.5
16   129 126.0   77 59.7   77   21 16.3 226946 227071 0.1  11 121 121  91.7
Score for 37 query sequences (total 4025 bp) against target (246433 bp) = 4519.50
nrd1a discussion
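This analysis is also inconclusive. eelScaffold32 again scores higher (4795.5 against 4519.5) and is the more likely of the two to harbour this second gene. Being nearly three times as long (679,422 bp against 246,433 bp), it also has a higher prior likelihood of containing it.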
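The "prior likelihood" argument can be made explicit. Assume, purely for illustration, that the gene lies on exactly one of the two scaffolds and is equally likely to start at any base of the roughly 926 kb they cover. The prior for each scaffold is then proportional to its length. The sketch below compares that with the ratio of the nrd1a total scores from the tables above.

```python
# Scaffold lengths (bp) and nrd1a total alignment scores, from this page.
LENGTHS = {"eelScaffold32": 679_422, "eelScaffold320": 246_433}
NRD1A_SCORES = {"eelScaffold32": 4795.5, "eelScaffold320": 4519.5}


def length_prior(lengths):
    """Prior P(gene on scaffold) under the uniform-position assumption above."""
    total = sum(lengths.values())
    return {name: bp / total for name, bp in lengths.items()}


prior = length_prior(LENGTHS)
length_ratio = LENGTHS["eelScaffold32"] / LENGTHS["eelScaffold320"]
score_ratio = NRD1A_SCORES["eelScaffold32"] / NRD1A_SCORES["eelScaffold320"]
```

This gives a prior of about 0.73 for eelScaffold32. The length ratio is about 2.76, while the total-score ratio is only about 1.06.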