Quality of Mapping Talk
Latest revision as of 12:22, 9 May 2017
Mapping quality control
Some issues are only detectable in the context of the genome:
- Duplicate reads
- Fragment size distribution
- Gene coverage
- Completeness of data
Duplicate reads
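- Only reliably detectable with paired-end reads: with single-end reads, independent fragments that happen to start at the same position look like duplicates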
Duplicate reads (continued)
- Duplicates can be PCR artefacts
- Duplicates can also be genuine, arising from highly expressed transcripts
- For RNA-seq, whether to remove duplicates is still being debated
- We don’t remove them, but it’s important to:
  - assess the duplicate rate
  - determine whether the duplicate rate can be explained by a few highly expressed genes
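A minimal Python sketch of both checks, assuming each paired-end fragment has already been reduced to a hypothetical `Fragment` record (chromosome, start of each mate, strand and assigned gene). Two fragments are treated as duplicates when all of these coordinates are identical; this is a simplified definition, not the exact rule of any particular tool. With single-end reads the key would hold only one position, which is why unrelated fragments collide so easily.

```python
from collections import Counter
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical input: one mapped paired-end fragment."""
    chrom: str
    start: int       # leftmost mapped position of mate 1
    mate_start: int  # leftmost mapped position of mate 2
    strand: str
    gene: str        # gene the fragment was assigned to


def _position_key(f: Fragment) -> tuple:
    # Paired-end: both mate positions must match for two fragments to be called duplicates.
    return (f.chrom, f.start, f.mate_start, f.strand)


def duplicate_rate(fragments: list[Fragment]) -> float:
    """Fraction of fragments that repeat the coordinates of another fragment."""
    if not fragments:
        return 0.0
    counts = Counter(_position_key(f) for f in fragments)
    # Every copy beyond the first at a given position counts as a duplicate.
    return sum(n - 1 for n in counts.values()) / len(fragments)


def duplicates_in_top_genes(fragments: list[Fragment], top_n: int = 10) -> float:
    """Share of all duplicates that fall in the top_n most highly expressed genes."""
    counts = Counter(_position_key(f) for f in fragments)
    # Simplification: a position is attributed to the gene of the last fragment seen there.
    gene_at = {_position_key(f): f.gene for f in fragments}
    dups_per_gene = Counter()
    for key, n in counts.items():
        dups_per_gene[gene_at[key]] += n - 1
    total = sum(dups_per_gene.values())
    if total == 0:
        return 0.0
    # Expression is approximated here by the number of fragments assigned to each gene.
    expression = Counter(f.gene for f in fragments)
    top_genes = [gene for gene, _ in expression.most_common(top_n)]
    return sum(dups_per_gene[g] for g in top_genes) / total
```

If `duplicates_in_top_genes` is close to 1 while the overall duplicate rate is high, the duplicates are concentrated in a few highly expressed genes and are more plausibly genuine than PCR artefacts.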
Fragment size distribution
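- Should correspond to the fragment size selected during library preparation
- When calculating fragment size, take into account that reads can span introns: the distance between mates on the genome then includes the intron and overestimates the fragment length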
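The intron effect can be illustrated with a small Python sketch that converts genomic positions into offsets along the spliced transcript before measuring the fragment. It assumes the fragment has already been assigned to a single known isoform whose exons are given as sorted, 0-based, half-open `(start, end)` intervals; `to_transcript_offset` and `fragment_length` are hypothetical names.

```python
def to_transcript_offset(pos: int, exons: list[tuple[int, int]]) -> int:
    """Map a 0-based genomic position to a 0-based offset along the spliced transcript.

    exons: sorted, non-overlapping, half-open (start, end) intervals in genome order.
    """
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start  # add this exon's length; introns contribute nothing
    raise ValueError(f"position {pos} is not inside an exon of this transcript")


def fragment_length(frag_start: int, frag_end: int, exons: list[tuple[int, int]]) -> int:
    """Fragment length measured on the transcript.

    frag_start: leftmost base of the leftmost mate; frag_end: one past the
    rightmost base of the rightmost mate (half-open, like the exons).
    """
    first = to_transcript_offset(frag_start, exons)
    last = to_transcript_offset(frag_end - 1, exons)
    return last - first + 1


exons = [(100, 200), (1100, 1200)]
print(1150 - 150)                         # naive genomic span: 1000
print(fragment_length(150, 1150, exons))  # spliced fragment length: 100
```

The 900-base difference is the intron between the two exons. Because the length is a difference of offsets, the result is the same whichever strand the transcript is on.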
Gene coverage
- Read coverage along the gene body should be uniform
- A 3' bias is expected because of degradation of the RNA
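Gene body coverage is commonly summarised by splitting each transcript into equal-width bins from 5' to 3' and averaging the read depth in each bin. The Python sketch below assumes per-base depth is already available in transcript coordinates and ordered 5' to 3'; `gene_body_profile` and `three_prime_bias` are hypothetical names, and the 20% window used for the bias ratio is an arbitrary choice for illustration.

```python
def gene_body_profile(coverage: list[float], n_bins: int = 100) -> list[float]:
    """Mean read depth in n_bins equal-width bins along one transcript, 5' to 3'.

    coverage: per-base depth in transcript coordinates, ordered 5' -> 3'
    (assumed input; reverse it first for minus-strand genes).
    """
    length = len(coverage)
    if length < n_bins:
        raise ValueError("transcript is shorter than the number of bins")
    profile = []
    for i in range(n_bins):
        # Integer bin edges; every bin holds at least one base because length >= n_bins.
        lo = i * length // n_bins
        hi = (i + 1) * length // n_bins
        window = coverage[lo:hi]
        profile.append(sum(window) / len(window))
    return profile


def three_prime_bias(profile: list[float], fraction: float = 0.2) -> float:
    """Mean depth of the 3'-most bins divided by the 5'-most bins (1.0 = uniform)."""
    k = max(1, int(len(profile) * fraction))
    five_prime = sum(profile[:k]) / k
    three_prime = sum(profile[-k:]) / k
    if five_prime == 0:
        return float("inf")  # no coverage at the 5' end at all
    return three_prime / five_prime
```

A ratio near 1 indicates uniform coverage; values well above 1 indicate 3' bias. To summarise a whole sample, one option is to scale each gene's profile (for example to a maximum of 1) before averaging, so that a handful of highly expressed genes do not dominate the curve.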
Completeness of data
- From a saturated RNA-seq dataset, all known splice junctions should be rediscovered.
- Check saturation by resampling 5%, 10%, ..., 100% of the alignments, detecting splice junctions in each subset and comparing them to the reference gene models.
[Figure: comp.png]
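The resampling procedure can be sketched in Python as follows. It assumes splice junctions have already been extracted from the alignments, one `(chrom, intron_start, intron_end, strand)` tuple per spliced read, and that the known junctions from the reference gene models are available as a set of tuples in the same format. `saturation_curve` and its `min_reads` threshold are hypothetical; dedicated tools such as RSeQC's `junction_saturation.py` implement a similar analysis.

```python
import random
from collections import Counter
from typing import Optional

Junction = tuple[str, int, int, str]  # (chrom, intron_start, intron_end, strand)


def saturation_curve(
    observations: list[Junction],
    known: set[Junction],
    fractions: Optional[list[float]] = None,
    min_reads: int = 1,
    seed: int = 0,
) -> list[tuple[float, float]]:
    """Fraction of known junctions recovered from random subsets of the spliced reads."""
    if fractions is None:
        fractions = [i / 20 for i in range(1, 21)]  # 5%, 10%, ..., 100%
    rng = random.Random(seed)  # fixed seed so the curve is reproducible
    curve = []
    for frac in fractions:
        # Draw a random subset of the spliced reads without replacement.
        subset = rng.sample(observations, round(len(observations) * frac))
        support = Counter(subset)
        # A junction counts as detected once enough reads in the subset support it.
        detected = {j for j, n in support.items() if n >= min_reads}
        recovered = len(detected & known) / len(known) if known else 0.0
        curve.append((frac, recovered))
    return curve
```

If the fraction of known junctions recovered levels off well before 100%, additional sequencing is unlikely to reveal many more of them; a curve that is still rising at 100% suggests the dataset is not yet saturated.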