Latest revision as of 13:34, 9 May 2017

1 Estimating Gene Count
2 Multi mapping reads
3 One transcript, one set of reads
4 Two transcripts, another set of reads
5 Aggregation to Gene-level 1
6 Third transcript, another set of reads
7 Aggregation to Gene-level 2
8 HTSeq-count
9 HTSeq-count modes
10 Probabilistic approach 1
11 Probabilistic approach 2

Estimating Gene Count

How many reads overlap each genomic feature? Or, put another way: can we confidently assign each read to a feature, transcript, or gene? This is not so simple.

Two complications:

Multi-mapping reads
Overlapping genes/transcripts

Two approaches:

Count only what is known with certainty
Assign reads probabilistically

Multi mapping reads

Unsolved problem:

- multi-mapping reads can account for 10-30% of all reads

Possible strategies:

Ignore them, at the cost of reduced sensitivity
Assign them with weights across their candidate locations

Longer reads would largely solve this problem, since they map uniquely more often.
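The two strategies for multi-mapping reads can be contrasted in a short sketch. It assumes each read has already been reduced to the list of genes its alignments hit (the `reads` dictionary and the gene names are hypothetical), and it uses the simplest possible weighting, an even split across candidate genes; real tools may instead weight by estimated abundance.

```python
from collections import defaultdict

# Hypothetical input: read id -> genes hit by that read's alignments.
# A read hitting more than one distinct gene is a multi-mapping read.
reads = {
    "r1": ["geneA"],
    "r2": ["geneA", "geneB"],
    "r3": ["geneB"],
    "r4": ["geneA", "geneB", "geneC"],
}

def count_unique_only(reads):
    """Strategy 1: ignore multi-mapping reads (loses sensitivity)."""
    counts = defaultdict(float)
    discarded = 0
    for hits in reads.values():
        targets = set(hits)
        if len(targets) == 1:
            counts[next(iter(targets))] += 1.0
        else:
            discarded += 1
    return dict(counts), discarded

def count_weighted(reads):
    """Strategy 2: split each read evenly across its candidate genes."""
    counts = defaultdict(float)
    for hits in reads.values():
        targets = set(hits)
        for gene in targets:
            counts[gene] += 1.0 / len(targets)
    return dict(counts)

unique_counts, n_discarded = count_unique_only(reads)
weighted_counts = count_weighted(reads)
print(unique_counts, n_discarded)
print(weighted_counts)
```

Discarding keeps only `r1` and `r3`, while the even split preserves the total of four reads at the cost of fractional counts.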

One transcript, one set of reads

Two transcripts, another set of reads

[Figure: t1t2.png]

Aggregation to Gene-level 1

[Figure: t1t2aggreg.png]

Third transcript, another set of reads

Aggregation to Gene-level 2

[Figure: t1t2t3aggreg.png]
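A minimal sketch of gene-level aggregation, under stated assumptions: each read is annotated with the set of transcripts it is compatible with, and a hypothetical `tx_to_gene` map gives each transcript's gene (here `t3` is assumed to belong to a different gene than `t1` and `t2`). A read that is ambiguous between isoforms of one gene becomes countable once counts are aggregated to the gene level.

```python
from collections import Counter

# Hypothetical annotation: transcript id -> gene id.
tx_to_gene = {"t1": "G1", "t2": "G1", "t3": "G2"}

# Hypothetical reads: the transcripts each read is compatible with.
reads = [
    {"t1"},        # unique to t1
    {"t1", "t2"},  # ambiguous between two isoforms of G1
    {"t2"},        # unique to t2
    {"t2", "t3"},  # ambiguous between transcripts of two genes
    {"t3"},        # unique to t3
]

def transcript_level(reads):
    """Count only reads compatible with exactly one transcript."""
    counts = Counter()
    for txs in reads:
        if len(txs) == 1:
            counts[next(iter(txs))] += 1
    return counts

def gene_level(reads, tx_to_gene):
    """Count reads whose compatible transcripts all belong to one gene."""
    counts = Counter()
    ambiguous = 0
    for txs in reads:
        genes = {tx_to_gene[t] for t in txs}
        if len(genes) == 1:
            counts[genes.pop()] += 1
        else:
            ambiguous += 1
    return counts, ambiguous

tx_counts = transcript_level(reads)
gene_counts, n_ambiguous = gene_level(reads, tx_to_gene)
print(dict(tx_counts))                 # {'t1': 1, 't2': 1, 't3': 1}
print(dict(gene_counts), n_ambiguous)  # {'G1': 3, 'G2': 1} 1
```

With these five reads, transcript-level counting can use only three, while gene-level counting uses four; the read compatible with `t2` and `t3` stays ambiguous because those transcripts belong to different genes.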

HTSeq-count

Designed for RNA-Seq read counting
Works at the gene level
Discards multi-mapped reads
Offers several modes to resolve the remaining ambiguity

[Figure: htseq.png]

HTSeq-count modes

[Figure: htcats.png]
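HTSeq-count's documentation describes its modes in terms of S(i), the set of features overlapping position i of the read: `union` takes the union of all S(i), `intersection-strict` their intersection, and `intersection-nonempty` the intersection of the non-empty S(i). With default settings, a read is counted if the result holds exactly one feature, and goes to `__no_feature` if the result is empty or `__ambiguous` if it holds several. The sketch below re-implements only that decision rule in plain Python; it is not HTSeq's code, and the read and feature names are hypothetical.

```python
from functools import reduce

MODES = ("union", "intersection-strict", "intersection-nonempty")

def resolve(position_sets, mode):
    """Combine the per-position feature sets S(i) of one read into one set."""
    sets = [frozenset(s) for s in position_sets]
    if mode == "union":
        return frozenset().union(*sets)
    if mode == "intersection-strict":
        # Any position overlapping no feature empties the intersection.
        return reduce(frozenset.intersection, sets) if sets else frozenset()
    if mode == "intersection-nonempty":
        # Positions overlapping no feature are ignored.
        nonempty = [s for s in sets if s]
        return reduce(frozenset.intersection, nonempty) if nonempty else frozenset()
    raise ValueError(f"unknown mode: {mode}")

def assign(position_sets, mode):
    """Map one read to a feature name or to a special counter."""
    features = resolve(position_sets, mode)
    if not features:
        return "__no_feature"
    if len(features) > 1:
        return "__ambiguous"
    return next(iter(features))

# Hypothetical read covering four positions: the first lies outside any
# feature, the third falls where gene_A and gene_B overlap.
read = [set(), {"gene_A"}, {"gene_A", "gene_B"}, {"gene_A"}]
for mode in MODES:
    print(mode, assign(read, mode))
# union __ambiguous
# intersection-strict __no_feature
# intersection-nonempty gene_A
```

The same read lands in a different bucket under each mode, which is why the choice of mode matters for reads near feature boundaries or in overlapping genes.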

Probabilistic approach 1

Cufflinks

- Reconstruct the transcripts from the data and the annotation

[Figure: minpath.png]
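Cufflinks' assembler aims for a parsimonious set of transcripts that explains all fragments, which can be framed as a minimum path cover. A simplified sketch of that idea: nodes are fragments, an edge u -> v means fragment v can follow fragment u within the same transcript, and the standard reduction (number of paths = nodes minus the size of a maximum bipartite matching) finds the fewest vertex-disjoint paths. The graph is a hypothetical toy example, and this is not Cufflinks' implementation.

```python
def min_path_cover(n, edges):
    """Fewest vertex-disjoint paths covering all n nodes of a DAG.

    Each node gets a left and a right copy; every edge u -> v links
    left(u) to right(v). A maximum matching (Kuhn's algorithm) then
    gives the cover, whose size is n minus the matching size.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    match_right = [-1] * n  # match_right[v] = u when edge u -> v is matched

    def try_augment(u, seen):
        # Search for an augmenting path starting at left node u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # A matched edge u -> v means v directly follows u on one path.
    next_node = [-1] * n
    has_pred = [False] * n
    for v, u in enumerate(match_right):
        if u != -1:
            next_node[u] = v
            has_pred[v] = True

    # Every node without a predecessor starts a path.
    paths = []
    for start in range(n):
        if not has_pred[start]:
            path = [start]
            while next_node[path[-1]] != -1:
                path.append(next_node[path[-1]])
            paths.append(path)
    return paths

# Toy graph: fragments 1 and 2 are alternative middles between 0 and 3.
print(min_path_cover(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # [[0, 1, 3], [2]]
```

Fragments 1 and 2 are incompatible (no path connects them), so at least two transcripts are needed; in this simplified version each fragment is assigned to exactly one path.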

Probabilistic approach 2

Cuffdiff:

- Assign each read/fragment to a transcript with a probability, estimated by maximum likelihood.

[Figure: isolik.png]
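Expectation-maximisation (EM) is one standard way to compute such a maximum-likelihood assignment; the sketch below is not Cuffdiff's implementation. It assumes each read is reduced to the set of transcripts it is compatible with, that reads start uniformly along a transcript (so a transcript generates a given compatible read with probability proportional to 1 / its effective length), and that the effective lengths are known. The E-step splits each read across its compatible transcripts in proportion to abundance / length; the M-step re-estimates each abundance as that transcript's share of all reads.

```python
def em_abundances(reads, transcripts, lengths, n_iter=200):
    """Estimate read fractions per transcript by expectation-maximisation.

    reads       : list of non-empty sets of compatible transcript ids
    transcripts : list of transcript ids
    lengths     : dict transcript id -> effective length (assumed known)
    Returns (theta, posteriors): theta[t] is the estimated fraction of reads
    from t; posteriors[i][t] is P(read i came from t) under the final theta.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}

    def responsibilities(compat):
        # P(read came from t | read) is proportional to theta[t] / lengths[t].
        weights = {t: theta[t] / lengths[t] for t in compat}
        total = sum(weights.values())
        return {t: w / total for t, w in weights.items()}

    for _ in range(n_iter):
        # E-step: expected number of reads contributed by each transcript.
        expected = {t: 0.0 for t in transcripts}
        for compat in reads:
            for t, p in responsibilities(compat).items():
                expected[t] += p
        # M-step: new abundance = share of all reads.
        theta = {t: expected[t] / len(reads) for t in transcripts}

    return theta, [responsibilities(compat) for compat in reads]

# Hypothetical data: 3 reads unique to t1, 1 unique to t2, 4 shared.
reads = [{"t1"}] * 3 + [{"t2"}] + [{"t1", "t2"}] * 4
theta, posteriors = em_abundances(reads, ["t1", "t2"], {"t1": 1000.0, "t2": 1000.0})
print(theta)          # t1 close to 0.75, t2 close to 0.25
print(posteriors[-1]) # shared read: t1 close to 0.75, t2 close to 0.25
```

With equal lengths the estimate converges to 0.75 and 0.25, so each shared read is split 3:1 rather than 1:1, which is the practical difference from the even split used for multi-mapping reads.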


Difference between revisions of "Estimating Gene Count Talk"
