Difference between revisions of "Visualisation of mapped reads"

Edinburgh Genomics - Introduction to RNA-seq Data Analysis 19 & 20 May 2016
Latest revision as of 20:35, 11 May 2017

Introduction

In contrast to the other more quantitative stages, this exercise is qualitative in the sense that we get a visual feel for a certain area of interest.

Aims

In this part you will learn to:

  • Use a genome browser, the Broad Institute's IGV, to visualise mapped reads

Figure: IGV layout (Igvlayout.png)

Software to be used

To load samtools and IGV:

module load samtools IGV

We'll be using the same data as before, but this time we will have two alignment files (i.e. two samples) from the same study: samples SRR769314 and SRR769316. The data set is tailored to the time allocated for the workshop. Reads were aligned to the first 20 Mb of chromosome 19 of the mouse reference genome (GRCm38/mm10) using TopHat, and duplicates have already been marked using Picard MarkDuplicates.

We will use the following files:

  • SRR769314_duplicates_marked.bam: aligned reads (two versions, aligned without and with gene annotation)
  • SRR769316_duplicates_marked.bam: aligned reads (two versions, aligned without and with gene annotation)
  • mm10_chr19-1-20000000.fasta: mouse reference genome sequence. It has already been indexed.
  • mm10_chr19-1-20000000_Ensembl.gtf: Ensembl mouse gene models

Data location

The data is available in the directory 04_Visualisation_of_mapped_reads, i.e.

cd ~/i2rda_data/04_Visualisation_of_mapped_reads
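Before starting, it can help to confirm that every file used in this exercise is where the later steps expect it. The following is a minimal sketch, not part of the original workshop material: the function name missing_files is hypothetical, and the sub-directory layout (Reference_files and with_gtf) is assumed from the navigation steps further down this page.

```shell
# List any expected workshop file that is missing under the given directory.
# Prints one path per missing file; prints nothing if all are present.
# Assumption: layout inferred from the IGV navigation steps on this page.
missing_files() {
    dir="$1"
    for f in \
        SRR769314_duplicates_marked.bam \
        SRR769316_duplicates_marked.bam \
        with_gtf/SRR769314_duplicates_marked.bam \
        with_gtf/SRR769316_duplicates_marked.bam \
        Reference_files/mm10_chr19-1-20000000.fasta \
        Reference_files/mm10_chr19-1-20000000_Ensembl.gtf
    do
        [ -e "$dir/$f" ] || echo "$dir/$f"
    done
}

missing_files "$HOME/i2rda_data/04_Visualisation_of_mapped_reads"
```

If the command prints nothing, all six files were found.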

Indexing BAM files

To enable fast access to any part of the BAM files we need to create an index using samtools:

samtools index SRR769314_duplicates_marked.bam
samtools index SRR769316_duplicates_marked.bam
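The two commands above can also be written as a loop that skips files whose index is already up to date. This is only a sketch under stated assumptions: the helper name needs_index is hypothetical, and it relies on samtools index writing its output next to the input as <file>.bam.bai, which is the default behaviour for BAM input.

```shell
# Return success (0) if the BAM file has no index yet,
# or if the BAM file is newer than its existing index.
needs_index() {
    bam="$1"
    [ ! -e "${bam}.bai" ] || [ "${bam}" -nt "${bam}.bai" ]
}

# Index each sample only when the BAM exists and its index is missing or stale.
for bam in SRR769314_duplicates_marked.bam SRR769316_duplicates_marked.bam; do
    if [ -e "$bam" ] && needs_index "$bam"; then
        samtools index "$bam"
    fi
done
```

IGV looks for the .bai file alongside each BAM, so run this in the directory that holds the alignments.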

Starting and using IGV

Start IGV:

igv.sh &

To load the mouse genome

  • Select Genomes -> Load Genome from File...
  • Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads -> Reference_files
  • Select the mm10_chr19-1-20000000.fasta file
  • Click Open

To load the alignments

  • Select File -> Load from File...
  • Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads
  • Select both the SRR769314_duplicates_marked.bam and SRR769316_duplicates_marked.bam files by Ctrl+Click
  • Click Open

To load the Ensembl gene models

  • Select File -> Load from File...
  • Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads -> Reference_files
  • Select the mm10_chr19-1-20000000_Ensembl.gtf file
  • Click Open

Zoom into target areas

Zoom in until you start seeing reads.

1. Navigate to chr19:3715000-3718000. Copy-paste for best accuracy. Note that IGV adds thousand separators to your location afterwards.

2. Navigate to chr19:5748800-5751100. Zoom in to observe each end of this exon-exon junction.

Question:

  • What do you think of the alignment? How would you fix it?
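Because IGV rewrites a location with thousand separators, a locus copied back out of the IGV location box will contain commas. The sketch below shows one way to return it to the plain form used on this page and to work out how many bases the window spans. The function names plain_locus and locus_length are hypothetical and not part of IGV or samtools.

```shell
# Remove thousand separators from an IGV-style locus,
# e.g. chr19:3,715,000-3,718,000 becomes chr19:3715000-3718000.
plain_locus() {
    printf '%s\n' "$1" | tr -d ','
}

# Report the window length in bases, counting both end positions.
locus_length() {
    locus=$(plain_locus "$1")
    range=${locus#*:}     # drop the chromosome name and colon
    start=${range%-*}     # text before the dash
    end=${range#*-}       # text after the dash
    echo $(( end - start + 1 ))
}

plain_locus "chr19:3,715,000-3,718,000"
locus_length "chr19:3,715,000-3,718,000"
```

For the first region this prints chr19:3715000-3718000 followed by 3001.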

Add the reads aligned using gene annotation data:

  • Select File -> Load from File...
  • Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads -> with_gtf
  • Select the SRR769314_duplicates_marked.bam and SRR769316_duplicates_marked.bam files
  • Click Open

3. Navigate to chr19:5748800-5751100. Verify that the alignment looks better.

4. Navigate to chr19:4709000-4756000. Right click on the track names and select Collapsed.

Question:

  • What do you think of the difference in coverage between the SRR769314 and SRR769316 samples?
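The visual impression of coverage can be cross-checked on the command line by counting alignments in the same region. This is a rough sketch rather than a proper expression estimate: the function name count_reads is hypothetical, the counts are not normalised for library size, and the region query assumes the BAM files have been indexed as described earlier. The -c option of samtools view prints a count instead of the records, and -F 1024 excludes records flagged as duplicates.

```shell
# Count non-duplicate alignments overlapping a region in one BAM file.
# $1 = BAM file, $2 = region such as chr19:4709000-4756000
count_reads() {
    samtools view -c -F 1024 "$1" "$2"
}

region="chr19:4709000-4756000"
for sample in SRR769314 SRR769316; do
    bam="${sample}_duplicates_marked.bam"
    # Only query files that exist and have an index next to them.
    if [ -e "$bam" ] && [ -e "${bam}.bai" ]; then
        echo "$sample $(count_reads "$bam" "$region")"
    fi
done
```

Each output line gives the sample name followed by its alignment count for the region.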

5. Navigate to chr19:6982100-6987800. Right click on the track names and select Sashimi Plot.

Question:

  • Can you identify which isoform is more expressed in each sample?
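The navigation steps in this section can also be captured in an IGV batch script, so that the same regions can be revisited or saved as images later. The sketch below only writes the script file. It is an illustration under assumptions: the function name write_igv_batch, the output name igv_batch.txt and the snapshot directory igv_snapshots are all hypothetical, and the relative paths assume IGV is started from the 04_Visualisation_of_mapped_reads directory (absolute paths are safer). The commands used (new, genome, load, snapshotDirectory, goto, snapshot) are standard IGV batch commands, but check that they behave as expected in the IGV version installed for the workshop.

```shell
# Write an IGV batch script that loads the data and visits each region.
# $1 = path of the batch file to create
write_igv_batch() {
    out="$1"
    {
        echo "new"
        echo "genome Reference_files/mm10_chr19-1-20000000.fasta"
        echo "load SRR769314_duplicates_marked.bam"
        echo "load SRR769316_duplicates_marked.bam"
        echo "load Reference_files/mm10_chr19-1-20000000_Ensembl.gtf"
        echo "snapshotDirectory igv_snapshots"
        # One goto/snapshot pair per region used in this exercise.
        for locus in chr19:3715000-3718000 chr19:5748800-5751100 \
                     chr19:4709000-4756000 chr19:6982100-6987800; do
            echo "goto $locus"
            echo "snapshot"
        done
    } > "$out"
}

write_igv_batch igv_batch.txt
```

The resulting file can be run from within IGV via Tools -> Run Batch Script..., if that menu entry is present in your version.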