Latest revision as of 14:34, 9 May 2017

1 Estimating Gene Count
2 Multi mapping reads
3 One transcript, one set of reads
4 Two transcripts, another set of reads
5 Aggregation to Gene-level 1
6 Third transcript, another set of reads
7 Aggregation to Gene-level 2
8 HTSeq-count
9 HTSeq-count modes
10 Probabilistic approach 1
11 Probabilistic approach 2

Estimating Gene Count

How many reads overlap each genomic feature? Or, put differently: can we confidently assign each read to a feature, transcript or gene? This is not as simple as it sounds.

Complicating factors:

Multi-mapping reads
Overlapping genes/transcripts

Two approaches:

Focus on what’s known with certainty
Probabilistic assignment

Multi mapping reads

Unsolved problem:

- multi-mapping reads can account for 10-30% of all reads

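Ignore them, at the cost of reduced sensitivity
Weighted assignment (distribute each read across its possible locations)

Longer reads would largely solve this problem, since a longer read is less likely to match several locations in the genome.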
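The two strategies above can be contrasted on a toy example. The sketch below is hypothetical: the read-to-gene lists and gene names are made up, and weighted assignment is shown in its simplest form, an equal split across all locations (real tools often weight by estimated abundance instead).

```python
from collections import defaultdict

# Hypothetical toy input: each read lists the genes its alignments hit.
# A read hitting more than one gene is a multi-mapping read.
reads = {
    "read1": ["GeneA"],
    "read2": ["GeneA", "GeneB"],  # multi-mapping read
    "read3": ["GeneB"],
    "read4": ["GeneA"],
}

def count_unique_only(reads):
    """Strategy 1: ignore multi-mapping reads (loses sensitivity)."""
    counts = defaultdict(float)
    for genes in reads.values():
        if len(genes) == 1:
            counts[genes[0]] += 1
    return dict(counts)

def count_weighted(reads):
    """Strategy 2: split each read equally across all of its locations."""
    counts = defaultdict(float)
    for genes in reads.values():
        weight = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += weight
    return dict(counts)

print(count_unique_only(reads))  # {'GeneA': 2.0, 'GeneB': 1.0}
print(count_weighted(reads))     # {'GeneA': 2.5, 'GeneB': 1.5}
```

Ignoring multi-mapping reads discards read2 entirely, so only three of the four reads are counted; weighted assignment keeps every read, at the price of fractional counts.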

One transcript, one set of reads

Two transcripts, another set of reads

Aggregation to Gene-level 1

Third transcript, another set of reads

Aggregation to Gene-level 2
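Aggregating to gene level removes ambiguity between isoforms: a read compatible with several transcripts of the same gene is ambiguous at the transcript level but can be counted unambiguously for the gene. The sketch below assumes a hypothetical annotation in which T1, T2 and T3 all belong to one gene, GeneA, plus a made-up list of read-to-transcript compatibility sets.

```python
from collections import Counter

# Hypothetical annotation: three transcripts (isoforms) of one gene.
transcript_to_gene = {"T1": "GeneA", "T2": "GeneA", "T3": "GeneA"}

# Hypothetical reads: the set of transcripts each read is compatible with.
read_compat = [
    {"T1"},
    {"T1", "T2"},        # ambiguous between isoforms
    {"T1", "T2", "T3"},  # ambiguous between isoforms
    {"T3"},
]

def transcript_level(read_compat):
    """Count only reads compatible with exactly one transcript."""
    counts = Counter()
    ambiguous = 0
    for compat in read_compat:
        if len(compat) == 1:
            counts[next(iter(compat))] += 1
        else:
            ambiguous += 1
    return counts, ambiguous

def gene_level(read_compat, transcript_to_gene):
    """A read is unambiguous if all its compatible transcripts share one gene."""
    counts = Counter()
    ambiguous = 0
    for compat in read_compat:
        genes = {transcript_to_gene[t] for t in compat}
        if len(genes) == 1:
            counts[genes.pop()] += 1
        else:
            ambiguous += 1
    return counts, ambiguous

print(transcript_level(read_compat))
print(gene_level(read_compat, transcript_to_gene))
```

On this input, transcript_level leaves two of the four reads ambiguous, while gene_level assigns all four to GeneA.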

HTSeq-count

Designed for RNA-Seq counting
Works at the gene level
Removes multi-mapped reads
Offers several modes to resolve the remaining ambiguity
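A simplified tally in the spirit of htseq-count: multi-mapped reads are set aside, and reads that match no gene or more than one gene go into separate counters. The counter names `__alignment_not_unique`, `__no_feature` and `__ambiguous` are ones htseq-count reports; everything else here (the `(n_alignments, genes)` input format and the `tally` function) is a hypothetical illustration, not HTSeq's code. The real tool reads SAM/BAM alignments and recognizes multi-mapped reads from the NH tag.

```python
from collections import Counter

def tally(reads):
    """Toy gene-level tally in the spirit of htseq-count (not its actual code).

    Each read is an (n_alignments, genes) pair, where `genes` is the set of
    genes the read overlaps after ambiguity has been resolved by some mode.
    """
    counts = Counter()
    for n_alignments, genes in reads:
        if n_alignments > 1:
            counts["__alignment_not_unique"] += 1  # multi-mapped: not counted for any gene
        elif not genes:
            counts["__no_feature"] += 1            # overlaps no gene
        elif len(genes) > 1:
            counts["__ambiguous"] += 1             # overlaps several genes
        else:
            counts[next(iter(genes))] += 1         # exactly one gene
    return counts

# Hypothetical reads
reads = [
    (1, {"GeneA"}),
    (1, {"GeneA"}),
    (2, {"GeneA"}),           # reported at two loci
    (1, set()),               # intergenic
    (1, {"GeneA", "GeneB"}),  # overlaps two genes
    (1, {"GeneB"}),
]
print(tally(reads))
```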

HTSeq-count modes
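The three htseq-count modes (`union`, `intersection-strict`, `intersection-nonempty`) differ in how they combine the sets of features overlapping each base of a read. As the HTSeq documentation describes it, let S(i) be the set of features at position i: `union` takes the union of all S(i), `intersection-strict` the intersection of all S(i), and `intersection-nonempty` the intersection of the non-empty S(i). One resulting feature gives a count, none gives `__no_feature`, and several give `__ambiguous`. The functions below are a hypothetical sketch of that rule on per-base sets, not HTSeq's implementation.

```python
def resolve(position_sets, mode="union"):
    """Combine per-base feature sets according to an htseq-count-style mode.

    position_sets: one set per aligned base, holding the IDs of the features
    overlapping that base (an empty set means no feature at that base).
    """
    if mode == "union":
        return set().union(*position_sets)
    if mode == "intersection-strict":
        return set.intersection(*position_sets) if position_sets else set()
    if mode == "intersection-nonempty":
        nonempty = [s for s in position_sets if s]
        return set.intersection(*nonempty) if nonempty else set()
    raise ValueError(f"unknown mode: {mode}")

def classify(position_sets, mode="union"):
    """Return the feature the read is counted for, or a special counter name."""
    features = resolve(position_sets, mode)
    if not features:
        return "__no_feature"
    if len(features) > 1:
        return "__ambiguous"
    return next(iter(features))

# A read lying partly in gene A and partly outside any gene
read = [{"A"}, {"A"}, set()]
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, classify(read, mode))
```

For this read, `union` and `intersection-nonempty` count it for A, while `intersection-strict` reports `__no_feature` because one base overlaps no feature. `union` is htseq-count's default mode.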

Probabilistic approach 1

Cufflinks:

- Reconstructs the transcripts from the read data and the annotation
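Cufflinks' publication frames transcript reconstruction as a parsimony problem: find the smallest set of transcripts that explains all fragments, solved as a minimum path cover of a fragment graph via bipartite matching. The sketch below shows only that combinatorial core, in its vertex-disjoint form on a small DAG; the graph is hypothetical (nodes are fragments, an edge u -> v means fragment v can follow fragment u in the same transcript), and Cufflinks' real assembler handles much more.

```python
def min_path_cover(n, edges):
    """Minimum vertex-disjoint path cover of a DAG via bipartite matching.

    n: number of nodes (fragments), labelled 0..n-1
    edges: (u, v) pairs meaning fragment v can follow fragment u
    Returns a list of paths; each path stands for one reconstructed transcript.
    The number of paths equals n minus the size of a maximum matching.
    """
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)

    match_right = [-1] * n  # match_right[v] = u when edge u -> v is matched

    def try_augment(u, seen):
        # Kuhn's augmenting-path search starting from left node u
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Each matched edge u -> v links u to its successor on a path
    next_node = [-1] * n
    for v, u in enumerate(match_right):
        if u != -1:
            next_node[u] = v

    # Paths start at nodes that have no matched predecessor
    paths = []
    for start in range(n):
        if match_right[start] != -1:
            continue
        path = [start]
        while next_node[path[-1]] != -1:
            path.append(next_node[path[-1]])
        paths.append(path)
    return paths

# Hypothetical fragment graph: 0 -> 1 -> 2 and 0 -> 3
print(min_path_cover(4, [(0, 1), (1, 2), (0, 3)]))
```

Here four fragments are explained by two paths, for example [0, 1, 2] and [3]; adding more compatible edges can only shrink the cover.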

Probabilistic approach 2

Cuffdiff:

- Assigns each read/fragment to a transcript probabilistically, using maximum likelihood.
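A standard way to compute such maximum-likelihood assignments is expectation-maximization (EM): split each read among its compatible transcripts in proportion to the current abundance estimates, then re-estimate abundances from those fractional counts. This is a generic, simplified EM sketch, not Cuffdiff's algorithm: it assumes all transcripts have the same length and ignores the fragment-length model that Cufflinks and Cuffdiff use; the transcript names and reads are hypothetical.

```python
def em_abundance(read_compat, transcripts, n_iter=200):
    """Maximum-likelihood transcript abundances via EM (simplified).

    Assumes equal transcript lengths, so the probability of a read coming
    from a compatible transcript t is proportional to theta[t].
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalised expected counts
        theta = {t: expected[t] / len(read_compat) for t in transcripts}
    return theta

# Hypothetical reads: 3 unique to T1, 1 unique to T2, 4 shared by both
reads = [{"T1"}] * 3 + [{"T2"}] + [{"T1", "T2"}] * 4
theta = em_abundance(reads, ["T1", "T2"])
print({t: round(v * len(reads), 3) for t, v in theta.items()})
```

On this input the estimate converges to 0.75 for T1 and 0.25 for T2, so the four shared reads are split roughly 3:1 and the expected counts are about 6 and 2 reads.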

Difference between revisions of "Estimating Gene Count Talk"
