Estimating Gene Count Talk
Latest revision as of 13:34, 9 May 2017
Contents
- 1 Estimating Gene Count
- 2 Multi mapping reads
- 3 One transcript, one set of reads
- 4 Two transcripts, another set of reads
- 5 Aggregation to Gene-level 1
- 6 Third transcript, another set of reads
- 7 Aggregation to Gene-level 2
- 8 HTSeq-count
- 9 HTSeq-count modes
- 10 Probabilistic approach 1
- 11 Probabilistic approach 2
Estimating Gene Count
How many reads overlap each genomic feature? Or, put differently: can we confidently assign each read to a feature/transcript/gene? Not so simple.
Complicating factors:
- Multi mapping reads
- Overlapping genes/transcripts
Two approaches:
- Focus on what is known with certainty (e.g. HTSeq-count)
- Probabilistic assignment (e.g. Cufflinks/Cuffdiff)
Multi mapping reads
- Unsolved problem:
- - they can make up 10-30% of reads
- Ignore them, at the cost of reduced sensitivity
- Weighted assignment: split each read across the loci it maps to
Longer reads reduce the problem, since a longer read is less likely to align equally well to several locations.
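A minimal sketch of the two options above, assuming a hypothetical input that maps each read ID to the list of genes at its alignment loci (one entry per locus). Ignoring multi-mappers drops those reads entirely; the weighted version here splits each read evenly (1/n) across its n loci, which is only one possible weighting scheme.

```python
from collections import defaultdict


def count_unique_only(read_loci):
    """Count only reads that align to exactly one locus."""
    counts = defaultdict(int)
    for loci in read_loci.values():
        if len(loci) == 1:
            counts[loci[0]] += 1
    return dict(counts)


def count_weighted(read_loci):
    """Split each read evenly across all loci it aligns to.

    Assumes every read has at least one locus (hypothetical input format).
    """
    counts = defaultdict(float)
    for loci in read_loci.values():
        weight = 1.0 / len(loci)
        for gene in loci:
            counts[gene] += weight
    return dict(counts)


# Hypothetical example: r3 aligns equally well to geneA and geneB.
reads = {
    "r1": ["geneA"],
    "r2": ["geneA"],
    "r3": ["geneA", "geneB"],
    "r4": ["geneB"],
}
print(count_unique_only(reads))  # {'geneA': 2, 'geneB': 1}
print(count_weighted(reads))     # {'geneA': 2.5, 'geneB': 1.5}
```

The unique-only count keeps 3 of the 4 reads (r3 is lost), which is the sensitivity cost; the weighted count keeps the total at 4.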
One transcript, one set of reads
Two transcripts, another set of reads
[Figure: t1t2.png]
Aggregation to Gene-level 1
[Figure: t1t2aggreg.png]
Third transcript, another set of reads
Aggregation to Gene-level 2
[Figure: t1t2t3aggreg.png]
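The figures for these slides are not reproduced here. A minimal sketch of the general idea, assuming hypothetical inputs: a read compatible with several transcripts is ambiguous at the transcript level, but becomes unambiguous at the gene level when all of those transcripts belong to the same gene. The transcript-to-gene map below (t3 belonging to a second gene, G2) is an illustrative assumption, not taken from the slides.

```python
from collections import Counter


def aggregate_to_genes(read_compat, tx_to_gene):
    """Count reads per gene; reads compatible with several genes stay ambiguous."""
    gene_counts = Counter()
    ambiguous = 0
    for txs in read_compat.values():
        genes = {tx_to_gene[t] for t in txs}
        if len(genes) == 1:
            gene_counts[genes.pop()] += 1
        else:
            ambiguous += 1
    return dict(gene_counts), ambiguous


# Hypothetical annotation: t1 and t2 are isoforms of G1, t3 belongs to G2.
tx_to_gene = {"t1": "G1", "t2": "G1", "t3": "G2"}
read_compat = {
    "r1": {"t1"},
    "r2": {"t1", "t2"},
    "r3": {"t2"},
    "r4": {"t2", "t3"},
    "r5": {"t3"},
}

# Transcript level: r2 and r4 are compatible with more than one transcript.
tx_ambiguous = sum(len(txs) > 1 for txs in read_compat.values())
counts, gene_ambiguous = aggregate_to_genes(read_compat, tx_to_gene)
print(tx_ambiguous)    # 2
print(counts)          # {'G1': 3, 'G2': 1}
print(gene_ambiguous)  # 1
```

Aggregating resolves r2 (both candidates are isoforms of G1), while r4 remains ambiguous because its candidates belong to different genes.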
HTSeq-count
- Designed for RNA-Seq counting
- Works at the gene level
- Discards multi-mapped reads
- Offers several modes to resolve the remaining ambiguity (reads overlapping more than one feature)
[Figure: htseq.png]
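A simplified sketch of the counting logic described above, assuming each read has already been reduced to a hypothetical record `(nh, genes)`: `nh` is the number of reported alignments (the SAM `NH` tag, which htseq-count uses to detect multi-mappers) and `genes` is the set of genes the read overlaps. The special counter names mirror those in htseq-count's output; quality filtering, paired-end handling and the overlap modes are left out.

```python
from collections import Counter


def htseq_like_count(records):
    """Simplified gene-level counting in the spirit of htseq-count."""
    counts = Counter()
    for nh, genes in records:
        if nh > 1:
            # Multi-mapped read: set aside, not assigned to any gene
            counts["__alignment_not_unique"] += 1
        elif not genes:
            counts["__no_feature"] += 1
        elif len(genes) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[next(iter(genes))] += 1
    return counts


# Hypothetical records: (NH value, genes overlapped by the read)
records = [
    (1, {"G1"}),
    (1, {"G1"}),
    (3, {"G1"}),        # aligns to 3 loci
    (1, set()),         # overlaps no gene
    (1, {"G1", "G2"}),  # overlaps two genes
    (1, {"G2"}),
]
print(dict(htseq_like_count(records)))
```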
HTSeq-count modes
[Figure: htcats.png]
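The three overlap modes of htseq-count (`union`, `intersection-strict`, `intersection-nonempty`, selected with `--mode`) can be sketched as set operations. Following the HTSeq documentation, for each position i of the read let S(i) be the set of features overlapping that position; the mode combines the S(i), and the read is assigned only if exactly one feature remains. This is a minimal illustration with hypothetical per-position inputs, not HTSeq's implementation.

```python
from functools import reduce


def resolve(position_sets, mode):
    """Combine per-position feature sets S(i) according to an htseq-count mode."""
    sets = [frozenset(s) for s in position_sets]
    if mode == "union":
        result = frozenset().union(*sets)
    elif mode == "intersection-strict":
        # Any position with no feature empties the intersection
        result = reduce(frozenset.intersection, sets) if sets else frozenset()
    elif mode == "intersection-nonempty":
        # Positions overlapping no feature are skipped
        nonempty = [s for s in sets if s]
        result = reduce(frozenset.intersection, nonempty) if nonempty else frozenset()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not result:
        return "__no_feature"
    if len(result) > 1:
        return "__ambiguous"
    return next(iter(result))


# Hypothetical reads, given as S(i) for each position i
partly_in_a = [{"A"}, {"A"}, set()]  # read runs off the end of feature A
a_then_ab = [{"A"}, {"A", "B"}]      # read enters a region where A and B overlap
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, resolve(partly_in_a, mode), resolve(a_then_ab, mode))
```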
Probabilistic approach 1
- Cufflinks
- - Reconstructs the transcripts from the read alignments and the annotation
[Figure: minpath.png]
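Cufflinks looks for the smallest set of transcripts that explains all compatible fragments, which it casts as a minimum path cover problem (via Dilworth's theorem, solved with a bipartite matching). A minimal sketch of that core step, assuming a hypothetical pre-built DAG in which an edge u → v means fragment u is compatible with, and lies upstream of, fragment v. Building the graph from alignments and turning the chains into full transcripts are not shown.

```python
def transitive_closure(n, edges):
    """reach[u] = all nodes reachable from u (Warshall's algorithm on sets)."""
    reach = [set() for _ in range(n)]
    for u, v in edges:
        reach[u].add(v)
    for k in range(n):
        for i in range(n):
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def min_chain_cover(n, edges):
    """Minimum set of chains covering all DAG nodes (Dilworth via matching)."""
    reach = transitive_closure(n, edges)
    match_right = [-1] * n  # match_right[v] = predecessor u matched to v

    def try_augment(u, seen):
        # Kuhn's augmenting-path search for a maximum bipartite matching
        for v in sorted(reach[u]):
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Each matched pair (u, v) means v follows u in the same chain
    succ = [-1] * n
    for v, u in enumerate(match_right):
        if u != -1:
            succ[u] = v
    chains = []
    for start in range(n):
        if match_right[start] == -1:  # no matched predecessor: a chain starts here
            chain = [start]
            while succ[chain[-1]] != -1:
                chain.append(succ[chain[-1]])
            chains.append(chain)
    return chains


# Hypothetical fragments of two isoforms sharing first and last fragments:
# 0 -> 1 -> 3 and 0 -> 2 -> 3
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
print(min_chain_cover(4, edges))  # [[0, 1, 3], [2]]
```

The number of chains (2) is the minimum number of transcripts needed; the chain `[2]` marks the fragment that cannot share a transcript with fragment 1.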
Probabilistic approach 2
- Cuffdiff
- - Assigns each read/fragment to its compatible transcripts probabilistically, estimating transcript abundances by maximum likelihood
[Figure: isolik.png]
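A minimal expectation-maximization (EM) sketch of the maximum-likelihood idea, under a deliberately simplified model that is an assumption, not Cufflinks/Cuffdiff's actual model (the real one is richer, e.g. it accounts for fragment lengths): a read from transcript t starts uniformly along its length l_t, so a read compatible with the transcript set T has likelihood Σ_{t∈T} ρ_t / l_t, where ρ_t is the fraction of reads coming from t. Each E-step splits every read across its compatible transcripts in proportion to ρ_t / l_t; each M-step re-estimates ρ from those fractional assignments.

```python
def em_abundance(read_compat, lengths, n_iter=200):
    """Estimate the fraction of reads from each transcript by EM."""
    n_tx = len(lengths)
    n_reads = len(read_compat)
    rho = [1.0 / n_tx] * n_tx  # uniform starting point
    for _ in range(n_iter):
        expected = [0.0] * n_tx
        for txs in read_compat:
            # E-step: P(read came from t) is proportional to rho_t / l_t
            weights = [rho[t] / lengths[t] for t in txs]
            total = sum(weights)
            for t, w in zip(txs, weights):
                expected[t] += w / total
        # M-step: new read fractions from the expected assignments
        rho = [e / n_reads for e in expected]
    return rho


# Hypothetical data: 6 reads unique to t0, 2 unique to t1, 4 shared
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
rho = em_abundance(reads, lengths=[1000, 1000])
print([round(r, 3) for r in rho])  # [0.75, 0.25]
```

The shared reads end up split 3:1, in proportion to the evidence from the unique reads. To turn ρ into relative transcript abundances, divide each ρ_t by l_t and renormalize.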