Estimating Gene Count Talk

From wiki
Revision as of 14:34, 9 May 2017 by Rf

Estimating Gene Count

How many reads overlap each genomic feature? Or, put differently: can we confidently assign each read to a feature, transcript or gene? This is not as simple as it sounds.

Complications include:

  • Multi mapping reads
  • Overlapping genes/transcripts

Two approaches:

  • Focus on what’s known with certainty
  • Probabilistic

Multi mapping reads

  • An unsolved problem:
- multi-mapping reads can account for 10-30% of reads

Unsolved.png

  • Ignore them, at the cost of reduced sensitivity
  • Weighted assignment
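The two options above can be sketched as follows. This is a minimal illustration that assumes each read has already been reduced to the list of distinct genes its alignments fall in; the `read_hits` dictionary, the gene names and both function names are hypothetical. "Weighted assignment" is shown in its simplest form, an even split across hits. A probabilistic method would instead weight each hit by estimated abundance.

```python
from collections import defaultdict


def count_unique_only(read_hits):
    """Ignore multi-mapping reads: count only reads that hit exactly one gene."""
    counts = defaultdict(float)
    for genes in read_hits.values():
        if len(genes) == 1:
            counts[genes[0]] += 1.0
    return dict(counts)


def count_weighted(read_hits):
    """Weighted assignment: split each read evenly across every gene it hits."""
    counts = defaultdict(float)
    for genes in read_hits.values():
        if not genes:
            continue  # read hits no gene: nothing to assign
        weight = 1.0 / len(genes)
        for gene in genes:
            counts[gene] += weight
    return dict(counts)


# Hypothetical toy data: read ID -> distinct genes its alignments fall in.
read_hits = {
    "r1": ["geneA"],
    "r2": ["geneA"],
    "r3": ["geneA", "geneB"],  # multi-mapping read
    "r4": ["geneB"],
}
print(count_unique_only(read_hits))  # {'geneA': 2.0, 'geneB': 1.0}
print(count_weighted(read_hits))     # {'geneA': 2.5, 'geneB': 1.5}
```

Ignoring r3 discards one of the four reads. The even split keeps all four, but it may credit half a read to a gene that did not actually produce it.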

Longer reads would reduce this problem, since a longer read is more likely to cover sequence that is unique in the genome.

One transcript, one set of reads

T1.png

Two transcripts, another set of reads

T1t2.png

Aggregation to Gene-level 1

T1t2aggreg.png

Third transcript, another set of reads

T1t2t3.png

Aggregation to Gene-level 2

T1t2t3aggreg.png
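The aggregation step can be sketched as follows. This assumes each read has already been mapped to the set of transcripts it is compatible with; the transcript-to-gene table, the read sets and the function name `aggregate_to_gene` are hypothetical toy data. A read that is ambiguous between transcripts of the same gene becomes unambiguous at the gene level. Only reads compatible with transcripts of different genes stay ambiguous.

```python
from collections import Counter


def aggregate_to_gene(read_transcripts, transcript_to_gene):
    """Count reads per gene from per-read transcript-compatibility sets.

    A read is credited to a gene only if all of its compatible transcripts
    belong to that one gene; otherwise it is tallied as ambiguous.
    """
    gene_counts = Counter()
    ambiguous = 0
    for transcripts in read_transcripts.values():
        genes = {transcript_to_gene[t] for t in transcripts}
        if not genes:
            continue  # read compatible with no annotated transcript
        if len(genes) == 1:
            gene_counts[genes.pop()] += 1
        else:
            ambiguous += 1
    return dict(gene_counts), ambiguous


# Hypothetical annotation: G1 has three transcripts, G2 has one.
transcript_to_gene = {"t1": "G1", "t2": "G1", "t3": "G1", "t4": "G2"}
read_transcripts = {
    "r1": {"t1"},
    "r2": {"t1", "t2"},        # ambiguous between transcripts, not genes
    "r3": {"t1", "t2", "t3"},  # likewise
    "r4": {"t3"},
    "r5": {"t3", "t4"},        # spans two genes: stays ambiguous
    "r6": {"t4"},
}
counts, ambiguous = aggregate_to_gene(read_transcripts, transcript_to_gene)
print(counts, ambiguous)  # {'G1': 4, 'G2': 1} 1
```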

HTSeq-count

  • Designed for RNA-Seq counting
  • Works at the gene level
  • Discards multi-mapped reads
  • Offers several modes (union, intersection-strict, intersection-nonempty) to resolve the remaining ambiguity

Htseq.png

HTSeq-count modes

Htcats.png
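HTSeq-count's documentation defines each mode in terms of S(i), the set of features overlapping position i of the read. The function below is an illustrative re-implementation of those rules on plain Python sets. It is not HTSeq's own code, and the name `resolve` and its input layout (one set per aligned base) are assumptions.

```python
def resolve(position_sets, mode):
    """Resolve a read's feature set under one of HTSeq-count's overlap modes.

    position_sets: one set per aligned base, holding the features overlapping it.
    Returns a single feature name, "__no_feature" or "__ambiguous".
    """
    if mode == "union":
        features = set().union(*position_sets)
    elif mode == "intersection-strict":
        # Any base with no feature makes the intersection empty.
        features = set.intersection(*position_sets) if position_sets else set()
    elif mode == "intersection-nonempty":
        # Bases with no feature are skipped before intersecting.
        nonempty = [s for s in position_sets if s]
        features = set.intersection(*nonempty) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode}")

    if not features:
        return "__no_feature"
    if len(features) > 1:
        return "__ambiguous"
    return next(iter(features))


# Hypothetical read whose last base runs past the end of feature "A".
read = [{"A"}, {"A"}, set()]
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, resolve(read, mode))
# union A
# intersection-strict __no_feature
# intersection-nonempty A
```

A read lying in A and partly in a region where A overlaps B (`[{"A"}, {"A", "B"}]`) gives the opposite picture: union reports `__ambiguous`, while both intersection modes assign it to A.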

Probabilistic approach 1

  • Cufflinks
- Reconstructs transcripts from the aligned reads and the annotation

Minpath.png
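Finding the smallest set of transcripts that explains all mutually compatible fragments is a minimum path cover problem. The sketch below solves the textbook version on a directed acyclic graph by maximum bipartite matching (cover size = number of nodes minus matching size). It assumes the fragment-compatibility graph is already built and transitively closed. It is not Cufflinks' actual implementation, and it only computes the cover without extending paths into full transcripts. The function name and toy graph are hypothetical.

```python
def min_path_cover(n, edges):
    """Minimum vertex-disjoint path cover of a DAG with nodes 0..n-1.

    Reduction: each edge u -> v links u's "out" copy to v's "in" copy in a
    bipartite graph; |cover| = n - size of a maximum matching.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match_in = [-1] * n  # match_in[v] = u when edge u -> v is matched

    def try_augment(u, seen):
        # Kuhn's augmenting-path search starting from the out-copy of u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_in[v] == -1 or try_augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Follow matched edges from every node that has no matched predecessor.
    succ = [-1] * n
    for v, u in enumerate(match_in):
        if u != -1:
            succ[u] = v
    paths = []
    for start in range(n):
        if match_in[start] == -1:
            path = [start]
            while succ[path[-1]] != -1:
                path.append(succ[path[-1]])
            paths.append(path)
    return paths


# Hypothetical fragments: 1 and 2 are incompatible (e.g. exon included vs
# skipped), so at least two transcripts are needed.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (0, 3)]
print(min_path_cover(4, edges))  # [[0, 1, 3], [2]]
```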

Probabilistic approach 2

  • Cuffdiff:
- Assigns each read/fragment to its compatible transcripts probabilistically, estimating transcript abundances by maximum likelihood

Isolik.png
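A common way to compute such a maximum-likelihood estimate is expectation-maximization (EM). The sketch below uses EM without claiming it matches Cuffdiff's internal procedure. It assumes every read is compatible with at least one transcript, and it ignores transcript lengths and fragment-length distributions, which the real Cufflinks/Cuffdiff model accounts for. The read sets and the function name are hypothetical.

```python
def em_abundances(read_compat, transcripts, n_iter=200):
    """Estimate transcript proportions by expectation-maximization.

    read_compat: list of sets, each holding the transcripts a read is
    compatible with. Simplifying assumption: lengths and fragment sizes
    are ignored, so a read's likelihood is just the sum of the proportions
    of its compatible transcripts.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(read_compat)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        # E-step: split each read across its transcripts in proportion to theta.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: each proportion becomes that transcript's expected read share.
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta


# Hypothetical reads: 3 unique to T1, 1 unique to T2, 4 shared by both.
reads = [{"T1"}] * 3 + [{"T2"}] + [{"T1", "T2"}] * 4
theta = em_abundances(reads, ["T1", "T2"])
print({t: round(p, 3) for t, p in theta.items()})  # {'T1': 0.75, 'T2': 0.25}
```

The estimate converges to 0.75/0.25: the shared reads end up split in the same 3:1 ratio as the unique ones, which is the maximum-likelihood solution for this toy model.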