Estimating Gene Count Talk

Revision as of 13:33, 9 May 2017 by Rf.


Estimating Gene Count

How many reads overlap genomic features? Or, put differently: can we confidently assign each read to a feature/transcript/gene? It is not so simple.

We also have to deal with:

  • Multi-mapping reads
  • Overlapping genes/transcripts

Two approaches:

  • Focus on what’s known with certainty
  • Probabilistic

Multi-mapping reads

  • Unsolved problem: they can account for 10-30% of reads (e.g., a single read that maps both to GeneA on chr11 and to GeneB on chr5)
  • Options:
    – Ignore them (decreases sensitivity)
    – Weighted assignment
  • Solution: use longer reads
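The two options for multi-mapping reads can be contrasted in a few lines. This is a minimal sketch, assuming a hypothetical pre-computed input that maps each read ID to the distinct genes its alignments overlap (real tools start from BAM records); the equal 1/N split is just the simplest possible weighting, not the scheme of any particular tool.

```python
from collections import defaultdict


def count_reads(read_hits, mode="ignore"):
    """Count reads per gene from a read -> genes mapping.

    read_hits: hypothetical, pre-computed input mapping each read ID to the
    distinct genes its alignments overlap (real tools start from BAM records).
    mode="ignore":   multi-mapping reads are dropped (lower sensitivity).
    mode="weighted": a read hitting N genes adds 1/N to each of them.
    """
    if mode not in ("ignore", "weighted"):
        raise ValueError(f"unknown mode: {mode}")
    counts = defaultdict(float)
    for genes in read_hits.values():
        if not genes:
            continue  # read overlaps no gene
        if len(genes) == 1:
            counts[genes[0]] += 1.0  # unique read: full count
        elif mode == "weighted":
            share = 1.0 / len(genes)
            for gene in genes:
                counts[gene] += share
        # mode == "ignore": the multi-mapping read contributes nothing
    return dict(counts)


reads = {
    "r1": ["GeneA"],
    "r2": ["GeneA", "GeneB"],  # multi-mapping: GeneA (chr11) and GeneB (chr5)
    "r3": ["GeneB"],
}
print(count_reads(reads, "ignore"))    # {'GeneA': 1.0, 'GeneB': 1.0}
print(count_reads(reads, "weighted"))  # {'GeneA': 1.5, 'GeneB': 1.5}
```

Ignoring r2 loses one read of evidence; the weighted variant keeps it but has to guess how to split it, which is why the problem is still considered unsolved.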

Transcripts/Genes

  • Transcripts/Isoforms or Genes?

[Figure: GeneA and its transcripts T1, T2 and T3]

Counting at the gene level aggregates the transcripts; counting at the transcript level needs longer reads.
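Gene-level counting is easier because a read compatible with several isoforms of GeneA still certainly belongs to GeneA: the gene count is simply the sum over its transcripts. A minimal sketch, with made-up counts for T1, T2 and T3 and a hypothetical transcript-to-gene table:

```python
from collections import defaultdict


def aggregate_to_genes(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        gene_counts[transcript_to_gene[transcript]] += count
    return dict(gene_counts)


# Made-up counts for the three isoforms of GeneA.
tx_counts = {"T1": 120.0, "T2": 30.0, "T3": 50.0}
tx2gene = {"T1": "GeneA", "T2": "GeneA", "T3": "GeneA"}
print(aggregate_to_genes(tx_counts, tx2gene))  # {'GeneA': 200.0}
```

The gene total does not depend on how the ambiguous reads were split between T1, T2 and T3, which is exactly the uncertainty that transcript-level counting has to resolve.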

HTSeq-count

  • Designed for RNA-Seq counting
  • Simple to use (especially since v0.6.0)
  • Works at the gene level
  • Removes multi-mapped reads
  • Several modes to resolve remaining uncertainty
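The modes documented for htseq-count are union (the default), intersection-strict and intersection-nonempty. The sketch below re-implements their decision rule in plain Python to show how they differ; it is an illustration, not HTSeq code. It assumes each read is already described by one set of overlapping features per covered base, and that multi-mapped reads were filtered out beforehand (htseq-count reports those as __alignment_not_unique).

```python
def resolve(feature_sets, mode="union"):
    """Assign one read to a feature, following HTSeq-count's overlap modes.

    feature_sets: for each base covered by the read, the set of feature
    (gene) names overlapping that base.
    Returns a feature name, "__no_feature" or "__ambiguous".
    """
    sets = [set(s) for s in feature_sets]
    if mode == "union":
        result = set().union(*sets)
    elif mode == "intersection-strict":
        # every covered base must carry the feature
        result = sets[0].intersection(*sets[1:]) if sets else set()
    elif mode == "intersection-nonempty":
        # bases without any feature are ignored
        nonempty = [s for s in sets if s]
        result = nonempty[0].intersection(*nonempty[1:]) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not result:
        return "__no_feature"
    if len(result) > 1:
        return "__ambiguous"
    return next(iter(result))


# Read partly in GeneA and partly in an unannotated region.
read1 = [{"GeneA"}, {"GeneA"}, set()]
# Read in GeneA whose end also overlaps GeneB.
read2 = [{"GeneA"}, {"GeneA", "GeneB"}]
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, resolve(read1, mode), resolve(read2, mode))
# union GeneA __ambiguous
# intersection-strict __no_feature GeneA
# intersection-nonempty GeneA GeneA
```

Union is the most permissive towards unannotated bases but discards reads touching two genes; intersection-strict is the reverse; intersection-nonempty sits in between.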


Probabilistic approach

  • Cufflinks: reconstructs the transcripts from the data and the annotation
  • Cuffdiff: assigns each read/fragment to a transcript with a probability, by maximum likelihood
  • Pros:
    – Better methodology
    – Integrated package (ease of use)
  • Cons:
    – Does not support alternative experimental designs
    – History of heterogeneous results across versions
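Maximum-likelihood assignment of fragments to transcripts is commonly computed with an expectation-maximization (EM) loop. The following is a deliberately simplified EM sketch, not Cufflinks/Cuffdiff's actual model: the real tools also model things such as fragment length and transcript length, which are left out here. The input format (one list of compatible transcript indices per fragment) and the fragment counts are assumptions made for illustration.

```python
def em_abundance(compat, n_transcripts, n_iter=100):
    """Estimate transcript proportions by expectation-maximization.

    compat: one list per fragment with the indices of the transcripts the
    fragment is compatible with (each fragment needs at least one).
    Simplification: no fragment-length or transcript-length model; a
    fragment is shared among its transcripts only by their abundance.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each fragment across its compatible transcripts
        # in proportion to the current abundance estimates.
        for txs in compat:
            total = sum(theta[t] for t in txs)
            for t in txs:
                expected[t] += theta[t] / total
        # M-step: new abundances are the expected shares of all fragments.
        theta = [e / len(compat) for e in expected]
    return theta


# Hypothetical fragments for GeneA's isoforms T1 (0), T2 (1), T3 (2).
fragments = (
    [[0]] * 6        # unique to T1
    + [[1]] * 2      # unique to T2
    + [[0, 1]] * 4   # shared by T1 and T2
    + [[2]] * 3      # unique to T3
)
print([round(x, 3) for x in em_abundance(fragments, 3)])  # [0.6, 0.2, 0.2]
```

The four shared fragments end up split 3:1 between T1 and T2, following the 6:2 ratio of their unique fragments, instead of being discarded or split evenly.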