Motivation

Mapping reads to a reference genome is a prerequisite for generating read counts and, from those, performing differential gene expression analysis. RNA-Seq reads can span exon-exon junctions, so it is important to choose a splice-aware aligner.

Aims

In this part you will learn to:

align RNA-Seq reads to a reference genome
calculate the mapping rate

You will use the following software:

TopHat2 v2.0.11: http://ccb.jhu.edu/software/tophat/index.shtml
Bowtie2 v2.2.0: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

The data set you will be using was downloaded from ENA (http://www.ebi.ac.uk/ena/data/view/SRP019027). The reads belong to sample SRR769316. The data set has been cut down to fit the time allocated for the exercise.

Indexing

Go to the right folder/directory:

cd $HOME/i2rda_data/Mapping_to_Reference

Index the reference genome using Bowtie2:

cd Reference
bowtie2-build mm10_chr19-1-20000000.fa mm10_chr19-1-20000000

Note the ".fa" extension of the reference file. TopHat, which we will use below, looks for a FASTA file named after the index prefix with a ".fa" extension in the same directory as the index.
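Before moving on, it can be worth confirming that the index was actually written. The sketch below assumes a small (non-large) Bowtie2 index, which consists of six files sharing the prefix given to bowtie2-build. The function name check_bowtie2_index is a hypothetical helper made up for this exercise, not part of Bowtie2.

```shell
# Hypothetical helper: check that the six files of a small Bowtie2 index
# exist and are non-empty for a given prefix.
check_bowtie2_index() {
    prefix=$1
    missing=0
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        if [ ! -s "$prefix.$suffix" ]; then
            echo "missing or empty: $prefix.$suffix"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "all six index files present for $prefix"
    fi
    return "$missing"
}

# Usage, from inside the Reference directory:
#   check_bowtie2_index mm10_chr19-1-20000000
```

If any file is reported as missing, rerun bowtie2-build and check its messages for errors.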

Run the alignment using TopHat2:

cd ..
tophat -o tophat2 --no-mixed \
--rg-id Lane-1 --rg-sample sample1 --rg-center XYZ --rg-platform Illumina \
-G Reference/mm10_chr19-1-20000000_Ensembl.gtf Reference/mm10_chr19-1-20000000 \
$HOME/i2rda_data/Mapping_to_Reference/Read_1.fastq.gz \
$HOME/i2rda_data/Mapping_to_Reference/Read_2.fastq.gz

where:

--no-mixed: For paired reads, only report read alignments if both reads in a pair can be mapped
--rg-id: Read group ID
--rg-sample: Sample ID
--rg-center: Sequencing Centre name
--rg-platform: Sequencing platform descriptor
-G: Supply TopHat with a set of gene model annotations and/or known transcripts, as a GTF 2.2 or GFF3 formatted file.
-o: Output directory

Check the output of TopHat2:

cd tophat2
ls
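A TopHat2 output directory normally contains, among other files, accepted_hits.bam (the alignments), junctions.bed (the splice junctions found) and align_summary.txt (the mapping statistics). As a rough sketch, the listing can be checked for these three files automatically. The function name check_tophat_output is a hypothetical helper, and the choice of which files to test is an assumption about what matters for this exercise.

```shell
# Hypothetical helper: report whether the key TopHat2 output files
# exist and are non-empty in the given output directory.
check_tophat_output() {
    outdir=$1
    status=0
    for f in accepted_hits.bam junctions.bed align_summary.txt; do
        if [ -s "$outdir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}

# Usage, from inside the output directory:
#   check_tophat_output .
```

A missing file usually means the run stopped early. In that case the files under logs in the output directory are the first place to look.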

Get the mapping rate:

cat align_summary.txt

Get the number of reads mapped.

Run the alignment of the filtered data using TopHat2:

cd ..
tophat2 -o tophat2_with_filtered_data --no-mixed \
--rg-id Lane-1 --rg-sample sample1 --rg-center XYZ --rg-platform Illumina \
-G Reference/mm10_chr19-1-20000000_Ensembl.gtf Reference/mm10_chr19-1-20000000 \
$HOME/i2rda_data/Mapping_to_Reference/Read_1_q30l50.fastq.gz \
$HOME/i2rda_data/Mapping_to_Reference/Read_2_q30l50.fastq.gz

Check the output of TopHat2:

cd tophat2_with_filtered_data
ls

Get the mapping rate:

cat align_summary.txt

Get the number of reads mapped.
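Rather than reading the numbers by eye, the counts can be pulled out of the summary and the rate recomputed. The sketch below assumes that align_summary.txt reports each read set with a line labelled "Input" and a line labelled "Mapped", each followed by a colon and a read count. Check this against your own file, as the layout may differ between TopHat versions. The function name mapping_rate is a hypothetical helper.

```shell
# Hypothetical helper: sum the "Input" and "Mapped" read counts over
# left and right reads and print the overall mapping rate.
mapping_rate() {
    awk -F':' '
        /Input/  { split($2, a, " "); input  += a[1] }   # first number after the colon
        /Mapped/ { split($2, a, " "); mapped += a[1] }
        END {
            if (input > 0) {
                printf "input=%d mapped=%d rate=%.2f%%\n", input, mapped, 100 * mapped / input
            } else {
                print "no Input lines found"
                exit 1
            }
        }
    ' "$1"
}

# From inside tophat2_with_filtered_data: summarise this run and the
# unfiltered run next to each other (skipped if a file is absent).
for summary in align_summary.txt ../tophat2/align_summary.txt; do
    if [ -f "$summary" ]; then
        printf '%s: ' "$summary"
        mapping_rate "$summary"
    fi
done
```

Keep in mind that the filtered run starts from fewer input reads, so compare both the absolute number of mapped reads and the rate.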

What difference does using the filtered data make?
