Estimating gene count

Estimating Gene Count
Multi mapping reads
Transcripts/Genes
HTSeq-count
Probabilistic approach

Estimating Gene Count

How many reads overlap each genomic feature? Or, put another way: can we confidently assign each read to a feature, transcript, or gene? It is not that simple.

We also have:

Multi mapping reads
Overlapping genes/transcripts

Two approaches:

Focus on what’s known with certainty
Probabilistic

Multi mapping reads

Unsolved problem:

– Can account for 10-30% of reads

[Figure: a multi-mapping read; GeneA on chr11, GeneB on chr5]

Solution is to use longer reads. Otherwise:

– Ignore them … (decreases sensitivity)
– Weighted assignment
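
The two fallback options can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the function name `count_reads`, the input layout (read ID mapped to the genes it aligns to) and the equal 1/n split are all assumptions made for the example.

```python
from collections import defaultdict

def count_reads(alignments, weighted=True):
    """Count reads per gene.

    alignments: dict mapping a read ID to the list of genes it aligns to
    (hypothetical input layout, chosen for this sketch).
    weighted=False drops multi-mapped reads; weighted=True gives each of
    the n candidate genes an equal share of 1/n.
    """
    counts = defaultdict(float)
    for read_id, genes in alignments.items():
        genes = sorted(set(genes))
        if not genes:
            continue  # unaligned read
        if len(genes) == 1:
            counts[genes[0]] += 1.0  # uniquely mapped read
        elif weighted:
            share = 1.0 / len(genes)
            for gene in genes:
                counts[gene] += share
        # otherwise: multi-mapped read is ignored
    return dict(counts)

# Made-up example: r2 and r4 map to both genes.
alignments = {
    "r1": ["GeneA"],
    "r2": ["GeneA", "GeneB"],
    "r3": ["GeneB"],
    "r4": ["GeneA", "GeneB"],
}
ignored = count_reads(alignments, weighted=False)
shared = count_reads(alignments, weighted=True)
print(ignored)  # {'GeneA': 1.0, 'GeneB': 1.0}
print(shared)   # {'GeneA': 2.0, 'GeneB': 2.0}
```

Ignoring the multi-mapped reads discards half of the data in this toy case, which is the loss of sensitivity mentioned above.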

Transcripts/Genes

Transcripts/Isoforms or Genes

[Figure: GeneA with its transcript isoforms T1, T2 and T3]

Gene-level counting aggregates over transcripts.
Transcript-level counting needs longer reads.
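
As a small illustration of the aggregation step, gene-level counts can be obtained by summing transcript-level counts through a transcript-to-gene map. The counts below are invented, and `aggregate_to_genes` is a hypothetical helper written for this sketch.

```python
from collections import defaultdict

def aggregate_to_genes(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        gene_counts[transcript_to_gene[transcript]] += count
    return dict(gene_counts)

# The three isoforms of GeneA from the slide, with made-up counts.
transcript_to_gene = {"T1": "GeneA", "T2": "GeneA", "T3": "GeneA"}
transcript_counts = {"T1": 12, "T2": 30, "T3": 8}

gene_counts = aggregate_to_genes(transcript_counts, transcript_to_gene)
print(gene_counts)  # {'GeneA': 50.0}
```

The gene-level total does not depend on how reads were split among T1, T2 and T3, which is why it is easier to estimate with short reads.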

HTSeq-count

Designed for RNA-Seq counting
Simple to use (especially since v0.6.0)
Works at gene level
Removes multi-mapped reads
Several modes to resolve remaining uncertainty
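
To give a feel for what such a mode does, here is a simplified pure-Python sketch of "union"-style resolution: a read is counted for a gene only if exactly one gene overlaps it. This is not HTSeq's code or API; the tuple layouts, the helper names and the half-open coordinates are assumptions made for the example, and strand and CIGAR handling are left out.

```python
def overlapping_genes(read, features):
    """Return the set of genes whose features overlap the read.

    read: (chrom, start, end); features: list of (chrom, start, end, gene).
    Coordinates are treated as half-open intervals [start, end).
    """
    chrom, start, end = read
    return {
        gene
        for (f_chrom, f_start, f_end, gene) in features
        if f_chrom == chrom and f_start < end and start < f_end
    }

def assign_read(read, features):
    """Union-style resolution: count only reads overlapping exactly one gene."""
    genes = overlapping_genes(read, features)
    if not genes:
        return "no_feature"
    if len(genes) > 1:
        return "ambiguous"
    return next(iter(genes))

# Two made-up genes that overlap between positions 180 and 200.
features = [
    ("chr1", 100, 200, "GeneA"),
    ("chr1", 180, 300, "GeneB"),
]
print(assign_read(("chr1", 110, 160), features))  # GeneA
print(assign_read(("chr1", 170, 220), features))  # ambiguous
print(assign_read(("chr1", 250, 290), features))  # GeneB
print(assign_read(("chr1", 400, 450), features))  # no_feature
```

Stricter modes differ in how they treat reads that only partly overlap a feature, but the principle of refusing to count ambiguous reads stays the same.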

Probabilistic approach

Cufflinks: reconstructs the transcripts from the data and annotation.

Cuffdiff: assigns each read/fragment to a transcript with a probability (maximum likelihood).

Pros:
- Better methodology
- Integrated package (ease of use)

Cons:
- Does not support alternative experimental designs
- History of heterogeneous results across versions
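
The idea of assigning reads to transcripts by maximum likelihood can be illustrated with a generic expectation-maximization (EM) loop. This is a textbook-style sketch, not the Cufflinks or Cuffdiff implementation: it ignores transcript length, fragment-size distribution and bias correction, and the name `em_abundances` and the input layout are assumptions made for the example.

```python
def em_abundances(compatibility, transcripts, n_iter=50):
    """Estimate relative transcript abundances with a basic EM loop.

    compatibility: list with one set per read, holding the transcripts
    the read is compatible with (hypothetical input layout).
    Returns a dict transcript -> estimated fraction of reads.
    """
    reads = [c for c in compatibility if c]  # skip reads with no candidate
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read among its candidates in proportion
        # to the current abundance estimates.
        for candidates in reads:
            total = sum(theta[t] for t in candidates)
            for t in candidates:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected counts.
        theta = {t: expected[t] / len(reads) for t in transcripts}
    return theta

# Made-up data: 6 reads unique to T1, 2 unique to T2, 4 compatible with both.
compatibility = [{"T1"}] * 6 + [{"T2"}] * 2 + [{"T1", "T2"}] * 4
theta = em_abundances(compatibility, ["T1", "T2"])
print({t: round(v, 3) for t, v in theta.items()})  # {'T1': 0.75, 'T2': 0.25}
```

The four ambiguous reads end up split 3:1 in favour of T1, following the evidence from the uniquely assigned reads, rather than being discarded or split evenly.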
