Difference between revisions of "Estimating Gene Count Talk"


Latest revision as of 14:34, 9 May 2017

Estimating Gene Count

How many reads overlap each genomic feature? Put differently: can we confidently assign each read to a feature, transcript, or gene? It is not that simple.

We also have:

  • Multi-mapping reads
  • Overlapping genes/transcripts

Two approaches:

  • Focus on what’s known with certainty
  • Probabilistic

Multi-mapping reads

  • Unsolved problem: multi-mapping reads can account for 10-30% of reads

Unsolved.png

  • Ignore them, at the cost of reduced sensitivity
  • Weighted assignment (split each read across the features it maps to)

Longer reads would largely resolve this problem, since a longer read is more likely to map to a single location.
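
The two options above can be contrasted with a minimal sketch. It assumes uniform weighting (each read contributes 1/n to each of the n distinct genes it maps to), which is only one possible weighting scheme; the function names and the toy `read_hits` data are hypothetical and not taken from the talk.

```python
from collections import defaultdict

def unique_only_counts(read_hits):
    """Count only reads that map to exactly one gene (multi-mappers are ignored)."""
    counts = defaultdict(int)
    for genes in read_hits.values():
        distinct = set(genes)
        if len(distinct) == 1:
            counts[next(iter(distinct))] += 1
    return dict(counts)

def weighted_counts(read_hits):
    """Split each read evenly across the distinct genes it maps to (assumed scheme)."""
    counts = defaultdict(float)
    for genes in read_hits.values():
        distinct = set(genes)
        if not distinct:
            continue  # read with no hit contributes nothing
        weight = 1.0 / len(distinct)
        for gene in distinct:
            counts[gene] += weight
    return dict(counts)

# Hypothetical toy data: read ID -> genes the read aligns to
read_hits = {
    "r1": ["geneA"],
    "r2": ["geneA", "geneB"],
    "r3": ["geneB"],
    "r4": ["geneA", "geneB"],
}

print(unique_only_counts(read_hits))
print(weighted_counts(read_hits))
```

In this toy case, ignoring multi-mappers keeps only two of the four reads, while the weighted scheme keeps the total at four reads but spreads the ambiguous ones evenly.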

One transcript, one set of reads

T1.png

Two transcripts, another set of reads

T1t2.png

Aggregation to Gene-level 1

T1t2aggreg.png

Third transcript, another set of reads

T1t2t3.png

Aggregation to Gene-level 2

T1t2t3aggreg.png
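
One way to read the aggregation slides: a read that is ambiguous between transcripts of the same gene is still unambiguous at the gene level. The sketch below illustrates that idea under stated assumptions; the function name, the transcript-to-gene mapping, and the reads are hypothetical and do not reproduce the figures.

```python
def gene_level_counts(read_to_transcripts, transcript_to_gene):
    """Count a read once for a gene if all its compatible transcripts belong to that gene."""
    counts = {}
    for transcripts in read_to_transcripts.values():
        genes = {transcript_to_gene[t] for t in transcripts}
        if len(genes) == 1:  # unambiguous at the gene level
            gene = next(iter(genes))
            counts[gene] = counts.get(gene, 0) + 1
        # reads spanning several genes (or none) are left uncounted here
    return counts

# Hypothetical annotation: three transcripts of one gene, one of another
transcript_to_gene = {"t1": "gene1", "t2": "gene1", "t3": "gene1", "t4": "gene2"}

# Hypothetical reads: read ID -> transcripts the read is compatible with
read_to_transcripts = {
    "r1": ["t1"],
    "r2": ["t1", "t2"],   # ambiguous between transcripts, clear at gene level
    "r3": ["t2", "t3"],
    "r4": ["t3"],
    "r5": ["t3", "t4"],   # ambiguous between genes
}

print(gene_level_counts(read_to_transcripts, transcript_to_gene))
```

Counting each read once per gene avoids the double counting that would occur if per-transcript overlap counts were simply summed.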

HTSeq-count

  • Designed for RNA-Seq counting
  • Works at the gene level
  • Removes multi-mapped reads
  • Offers several modes to resolve the remaining uncertainty

Htseq.png

HTSeq-count modes

Htcats.png
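
The logic of the modes can be sketched in a few lines. This assumes the three overlap-resolution modes described in the htseq-count documentation (union, intersection-strict, intersection-nonempty) and is not HTSeq's own code; the `resolve` function and its input representation (one set of gene IDs per aligned base) are hypothetical simplifications.

```python
def resolve(position_sets, mode="union"):
    """Decide which gene a read is counted for.

    position_sets: list with one set of gene IDs per aligned position of the read.
    Returns a gene ID, "__no_feature", or "__ambiguous".
    """
    if mode == "union":
        genes = set().union(*position_sets)
    elif mode == "intersection-strict":
        genes = set.intersection(*position_sets) if position_sets else set()
    elif mode == "intersection-nonempty":
        non_empty = [s for s in position_sets if s]
        genes = set.intersection(*non_empty) if non_empty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not genes:
        return "__no_feature"
    if len(genes) > 1:
        return "__ambiguous"
    return next(iter(genes))

# Read hanging off the end of geneA: last position overlaps no feature
overhang = [{"geneA"}, {"geneA"}, set()]
# Read fully inside geneA, partly overlapping geneB
partial_overlap = [{"geneA"}, {"geneA", "geneB"}]

for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, resolve(overhang, mode), resolve(partial_overlap, mode))
```

The two toy reads show where the modes disagree: the overhanging read is dropped only by intersection-strict, and the partially overlapping read is ambiguous only under union.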

Probabilistic approach 1

  • Cufflinks:
- Reconstructs the transcripts from the data and the annotation

Minpath.png

Probabilistic approach 2

  • Cuffdiff:
- Assigns each read/fragment to a transcript probabilistically, using maximum likelihood.

Isolik.png
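
The idea of probabilistic assignment by maximum likelihood can be illustrated with a small expectation-maximization (EM) loop. This is a deliberately simplified sketch and not the Cufflinks/Cuffdiff model: it ignores transcript length, fragment length distribution, and sequence bias, and the function name and toy data are hypothetical.

```python
def em_abundances(compat, transcripts, n_iter=200):
    """Estimate relative transcript abundances by EM.

    compat: list of sets; each set holds the transcripts one read is compatible with.
    Returns a dict transcript -> estimated proportion of reads.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for ts in compat:
            total = sum(theta[t] for t in ts)
            if total == 0:
                continue  # read with no usable transcript
            for t in ts:
                expected[t] += theta[t] / total
        # M-step: renormalize expected read counts into proportions.
        n = sum(expected.values())
        theta = {t: expected[t] / n for t in transcripts}
    return theta

# Hypothetical toy data: 3 reads unique to t1, 1 unique to t2, 4 shared
compat = [{"t1"}] * 3 + [{"t2"}] + [{"t1", "t2"}] * 4
theta = em_abundances(compat, ["t1", "t2"])
print(theta)
```

Here the reads unique to each transcript drive the estimate, and the shared reads are then apportioned accordingly, so most of the ambiguous reads end up attributed to t1.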