Estimating Gene Count Talk



Contents

1 Estimating Gene Count
2 Multi mapping reads
3 One transcript, one set of reads
4 Two transcripts, another set of reads
5 Aggregation to Gene-level 1
6 Third transcript, another set of reads
7 Aggregation to Gene-level 2
8 HTSeq-count
9 HTSeq-count modes
10 Probabilistic approach 1
11 Probabilistic approach 2

Estimating Gene Count

How many reads overlap each genomic feature? Put another way: can we confidently assign each read to a feature, transcript or gene? This is not so simple.

Complications include:

Multi mapping reads
Overlapping genes/transcripts

Two approaches:

Count only what is known with certainty
Assign ambiguous reads probabilistically

Multi mapping reads

An unsolved problem:

- multi-mapping reads can account for 10-30% of all reads

Ignore them, at the cost of reduced sensitivity
Use weighted assignment, distributing each read across the loci it maps to

Longer reads reduce this problem, although repeats longer than the read length remain ambiguous.
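The two options for multi-mapping reads can be contrasted on toy data. The following is a minimal sketch that assumes hypothetical reads and loci (`locusA` and similar). It uses an equal split as the simplest possible weighting. It is not the weighting scheme of any particular tool.

```python
from collections import defaultdict

# Hypothetical toy data: the genomic loci each read aligns to.
# Reads with more than one locus are multi-mapping reads.
alignments = {
    "read1": ["locusA"],
    "read2": ["locusA", "locusB"],            # multi-mapper
    "read3": ["locusB"],
    "read4": ["locusA", "locusB", "locusC"],  # multi-mapper
}


def count_unique_only(alignments):
    """Ignore multi-mapping reads: count only reads with a single locus."""
    counts = defaultdict(float)
    for loci in alignments.values():
        if len(loci) == 1:
            counts[loci[0]] += 1.0
    return dict(counts)


def count_weighted(alignments):
    """Weighted assignment: split each read equally over all its loci."""
    counts = defaultdict(float)
    for loci in alignments.values():
        weight = 1.0 / len(loci)
        for locus in loci:
            counts[locus] += weight
    return dict(counts)


print(count_unique_only(alignments))  # only 2 of the 4 reads are used
print(count_weighted(alignments))     # all 4 reads are used, fractionally
```

Ignoring multi-mappers discards half of these toy reads, which is the loss of sensitivity mentioned above. The equal split keeps every read but is a crude weighting. Probabilistic methods instead weight each read by the estimated abundance of its candidate loci.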

One transcript, one set of reads

Two transcripts, another set of reads

Aggregation to Gene-level 1

Third transcript, another set of reads

Aggregation to Gene-level 2
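Aggregating to gene level removes ambiguity between isoforms of the same gene. A read shared by two transcripts of one gene cannot be assigned at transcript level, but it can be counted for that gene. The following is a minimal sketch assuming a hypothetical annotation (`tx1`, `tx2`, `tx3`, `geneX`, `geneY`) and hypothetical read compatibilities. It does not reproduce the talk's own figures.

```python
from collections import Counter

# Hypothetical annotation: transcript -> gene.
transcript_to_gene = {"tx1": "geneX", "tx2": "geneX", "tx3": "geneY"}

# Hypothetical reads: the set of transcripts each read is compatible with.
reads = [
    {"tx1"},
    {"tx1", "tx2"},         # shared by both isoforms of geneX
    {"tx2"},
    {"tx1", "tx2", "tx3"},  # overlaps transcripts of two genes
    {"tx3"},
]


def unambiguous_counts(reads, to_unit):
    """Count reads whose compatible transcripts all map to a single unit."""
    counts = Counter()
    ambiguous = 0
    for compatible in reads:
        units = {to_unit(t) for t in compatible}
        if len(units) == 1:
            counts[next(iter(units))] += 1
        else:
            ambiguous += 1
    return dict(counts), ambiguous


# Transcript level: each transcript is its own unit.
print(unambiguous_counts(reads, lambda t: t))
# Gene level: transcripts are collapsed to their gene first.
print(unambiguous_counts(reads, transcript_to_gene.get))
```

In this toy data two of the five reads are ambiguous at transcript level, but only one is ambiguous at gene level. The read spanning `geneX` and `geneY` stays ambiguous because aggregation cannot resolve overlapping genes.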

HTSeq-count

Designed for counting RNA-Seq reads
Works at gene level
Removes multi-mapped reads
Offers several modes to resolve the remaining ambiguity (e.g. reads overlapping more than one gene)
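The bookkeeping described above can be sketched in a few lines. htseq-count recognizes multi-mapped reads from the SAM `NH` tag and, by default, does not count them. It reports such reads under `__alignment_not_unique`, alongside `__no_feature` and `__ambiguous`. The real tool also reports other counters, for example `__too_low_aQual` and `__not_aligned`. It also reads SAM/BAM and GTF files and works out the overlapping genes itself. This minimal sketch skips all of that: the reads, NH values and overlapping gene sets are hypothetical inputs.

```python
from collections import Counter

# Hypothetical toy input: (read name, NH tag value, genes overlapped).
# NH is the SAM tag giving the number of reported alignments for a read.
reads = [
    ("r1", 1, {"geneX"}),
    ("r2", 3, {"geneX"}),           # multi-mapped (NH > 1)
    ("r3", 1, set()),               # overlaps no annotated gene
    ("r4", 1, {"geneX", "geneY"}),  # overlaps two genes
    ("r5", 1, {"geneY"}),
]


def count_genes(reads):
    """Gene-level counting in the spirit of htseq-count's default behaviour."""
    counts = Counter()
    for _name, nh, genes in reads:
        if nh > 1:
            counts["__alignment_not_unique"] += 1  # multi-mapper removed
        elif not genes:
            counts["__no_feature"] += 1
        elif len(genes) > 1:
            counts["__ambiguous"] += 1
        else:
            counts[next(iter(genes))] += 1
    return counts


for key, value in sorted(count_genes(reads).items()):
    print(key, value)
```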

HTSeq-count modes
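htseq-count offers three modes, selected with its `-m`/`--mode` option: `union` (the default), `intersection-strict` and `intersection-nonempty`. For every aligned base of a read, each mode considers the set of features overlapping that base, and the modes differ in how they combine these sets. If the combined set holds exactly one feature, the read is counted for it. An empty set gives `__no_feature` and more than one feature gives `__ambiguous`. The following is a minimal sketch that reimplements only this set logic on hypothetical per-base sets. It does not use the HTSeq library.

```python
def resolve(position_sets, mode):
    """Combine per-base feature sets according to an htseq-count-style mode."""
    if not position_sets:
        raise ValueError("a read must cover at least one base")
    if mode == "union":
        return set().union(*position_sets)
    if mode == "intersection-strict":
        # Any base outside all features makes the intersection empty.
        return set.intersection(*position_sets)
    if mode == "intersection-nonempty":
        # Bases outside all features are ignored.
        nonempty = [s for s in position_sets if s]
        return set.intersection(*nonempty) if nonempty else set()
    raise ValueError(f"unknown mode: {mode}")


def assign(position_sets, mode):
    """Return the single feature a read is counted for, or a special counter."""
    features = resolve(position_sets, mode)
    if not features:
        return "__no_feature"
    if len(features) > 1:
        return "__ambiguous"
    return next(iter(features))


# Hypothetical reads, one set of overlapping features per aligned base.
examples = {
    "partly outside gene_A": [{"gene_A"}, {"gene_A"}, set()],
    "gene_A into A/B overlap": [{"gene_A"}, {"gene_A", "gene_B"}],
    "inside A/B overlap": [{"gene_A", "gene_B"}, {"gene_A", "gene_B"}],
}
modes = ["union", "intersection-strict", "intersection-nonempty"]
for label, sets in examples.items():
    print(label, [assign(sets, m) for m in modes])
```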

Probabilistic approach 1

Cufflinks:

- reconstructs transcripts from the read alignments and the annotation

Probabilistic approach 2

Cuffdiff:

- assigns each read/fragment to transcripts probabilistically, based on maximum likelihood estimates of transcript abundance
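The idea of probabilistic assignment by maximum likelihood can be illustrated with the expectation-maximization (EM) algorithm. EM is swapped in here as a standard way to compute such an estimate. This is not Cuffdiff's implementation: its fragment-length model, bias corrections and statistical testing are not reproduced. The following minimal sketch assumes a much simpler model in which a fragment is equally likely to start anywhere in a compatible transcript. The function name `em_abundance`, the transcripts and the data are hypothetical.

```python
def em_abundance(fragments, lengths, n_iter=200):
    """
    Simplified EM for transcript abundances (illustrative model only).

    fragments: list of sets of transcripts each fragment is compatible with.
    lengths:   transcript -> length; P(fragment | t) is assumed to be
               1 / length[t] for compatible transcripts, 0 otherwise.
    Returns (theta, expected): theta[t] is the estimated fraction of
    fragments from t, expected[t] the expected number of fragments from t.
    """
    transcripts = list(lengths)
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    expected = {}
    for _ in range(n_iter):
        # E-step: split each fragment across its compatible transcripts in
        # proportion to theta[t] * P(fragment | t).
        expected = {t: 0.0 for t in transcripts}
        for compatible in fragments:
            weights = {t: theta[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: new abundances are the expected fractions of fragments.
        theta = {t: expected[t] / len(fragments) for t in transcripts}
    return theta, expected


# Hypothetical data: two isoforms of equal length; 6 fragments fit only A,
# 2 fit only B, and 4 fit both.
fragments = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta, expected = em_abundance(fragments, {"A": 1000, "B": 1000})
print(theta, expected)
```

The shared fragments end up split 3:1 in favour of A, so the estimate converges to 75% A and 25% B, or expected counts of 9 and 3. Each fragment contributes fractionally to several transcripts rather than being discarded.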

Retrieved from "http://stab.st-andrews.ac.uk/wiki/index.php?title=Estimating_Gene_Count_Talk&oldid=1628"