Visualisation of mapped reads
Latest revision as of 20:35, 11 May 2017
Introduction
In contrast to the other more quantitative stages, this exercise is qualitative in the sense that we get a visual feel for a certain area of interest.
Aims
In this part you will learn to:
- Use a genome browser, the Broad Institute's IGV, to visualise mapped reads
Software to be used
- samtools v0.1.19: http://samtools.sourceforge.net/
- Integrative Genomics Viewer (IGV) 2.3.92: http://www.broadinstitute.org/igv/
To load these up:
module load samtools IGV
We'll be using the same data as before, but this time we will have two alignment files (i.e. two samples) from the same study: SRR769314 and SRR769316. They are tailored to fit the time allocated for the workshop. They were aligned to the first 20 Mb of chromosome 19 of the mouse reference genome (GRCm38/mm10) using TopHat, and duplicates have already been marked using Picard MarkDuplicates.
We will use the following files:
- SRR769314_duplicates_marked.bam: aligned reads (aligned both without and with gene annotation)
- SRR769316_duplicates_marked.bam: aligned reads (aligned both without and with gene annotation)
- mm10_chr19-1-20000000.fasta: mouse reference genome sequence. It has already been indexed.
- mm10_chr19-1-20000000_Ensembl.gtf: Ensembl mouse gene models
Data location
The data is available in the directory 04_Visualisation_of_mapped_reads, i.e.
cd ~/i2rda_data/04_Visualisation_of_mapped_reads
Indexing BAM files
To enable fast access to any part of the BAM files we need to create an index using samtools:
samtools index SRR769314_duplicates_marked.bam
samtools index SRR769316_duplicates_marked.bam
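IGV cannot load a BAM file without its index, so it can be worth confirming that both index files exist before starting the browser. The sketch below assumes the default naming used by samtools index (the BAM file name with .bai appended); the function name missing_bam_indexes is made up for this example.

```shell
#!/usr/bin/env bash
# List the BAM files in a directory that have no accompanying index.
# Assumption: indexes follow the default "samtools index" naming, <file>.bam.bai
missing_bam_indexes() {
    dir="${1:-.}"
    for bam in "$dir"/*.bam; do
        # Skip the unexpanded pattern when the directory holds no BAM files.
        [ -e "$bam" ] || continue
        if [ ! -e "$bam.bai" ]; then
            printf '%s\n' "$bam"
        fi
    done
}
```

Running missing_bam_indexes . in the data directory after the two indexing commands should print nothing; any file name it does print still needs samtools index.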
Starting and using IGV
igv.sh &
To load the mouse genome
- Select Genomes -> Load Genome from File...
- Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads -> Reference_files
- Select the mm10_chr19-1-20000000.fasta file
- Click Open
To load the alignments
- Select File -> Load from File...
- Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads
- Select both the SRR769314_duplicates_marked.bam and SRR769316_duplicates_marked.bam files by Ctrl+Click
- Click Open
To load the Ensembl gene models
- Select File -> Load from File...
- Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads -> Reference_files
- Select the mm10_chr19-1-20000000_Ensembl.gtf file
- Click Open
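The three loading steps above can also be scripted with IGV's batch commands (new, genome, load), which saves clicking when the session has to be rebuilt. This is only a sketch: the function name write_igv_load_batch and the output file name are invented here, and the directory layout is assumed to match the one described above.

```shell
#!/usr/bin/env bash
# Write an IGV batch file that loads the genome, both alignments and the gene models.
# Arguments: the data directory (assumed layout as in this exercise) and the output file.
write_igv_load_batch() {
    data_dir="$1"
    out="$2"
    cat > "$out" <<EOF
new
genome $data_dir/Reference_files/mm10_chr19-1-20000000.fasta
load $data_dir/SRR769314_duplicates_marked.bam
load $data_dir/SRR769316_duplicates_marked.bam
load $data_dir/Reference_files/mm10_chr19-1-20000000_Ensembl.gtf
EOF
}
```

For example, write_igv_load_batch ~/i2rda_data/04_Visualisation_of_mapped_reads load_tracks.txt creates the file, which can then be run from within IGV via Tools -> Run Batch Script... Recent IGV releases also accept a batch file on the command line with igv.sh -b; whether the installed version does is worth checking before relying on it.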
Zoom into target areas
Zoom in until you start seeing reads.
1. Navigate to chr19:3715000-3718000. Copy and paste the coordinates for best accuracy. Note that IGV adds thousands separators to the location afterwards.
2. Navigate to chr19:5748800-5751100. Zoom in to observe each end of this exon-exon junction.
Question:
- What do you think of the alignment? How would you fix it?
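Because IGV rewrites the location with thousands separators, the text in the locus box no longer matches what was pasted in. The small sketch below reproduces that formatting so the two forms can be compared; the function names group_digits and format_locus are hypothetical, and the input is assumed to be of the form chromosome:start-end without separators.

```shell
#!/usr/bin/env bash
# Insert a comma before every group of three digits, counting from the right.
group_digits() {
    printf '%s\n' "$1" | sed -e :a -e 's/^\([0-9]\{1,\}\)\([0-9]\{3\}\)/\1,\2/' -e ta
}

# Reformat "chrom:start-end" the way it appears in the locus box after navigating.
format_locus() {
    chrom="${1%%:*}"      # text before the first colon
    range="${1#*:}"       # text after the first colon
    start="${range%-*}"   # coordinate before the dash
    end="${range#*-}"     # coordinate after the dash
    printf '%s:%s-%s\n' "$chrom" "$(group_digits "$start")" "$(group_digits "$end")"
}
```

For instance, format_locus chr19:3715000-3718000 prints chr19:3,715,000-3,718,000.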
Add the reads aligned using gene annotation data:
- Select File -> Load from File...
- Navigate to <yourhomedirectory> -> i2rda_data -> 04_Visualisation_of_mapped_reads -> with_gtf
- Select the SRR769314_duplicates_marked.bam and SRR769316_duplicates_marked.bam files
- Click Open
3. Navigate to chr19:5748800-5751100. Verify that the alignment looks better.
4. Navigate to chr19:4709000-4756000. Right-click on the track names and select Collapsed.
Question:
- What do you think of the difference in coverage between the SRR769314 and SRR769316 samples?
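The visual impression of coverage can be cross-checked on the command line with samtools view -c, which counts the alignment records overlapping a region of an indexed BAM file. The sketch below is an assumption-laden helper, not part of the exercise: the function name count_region_alignments is made up, the reference sequence in the BAM files is assumed to be named chr19, and the count includes duplicates and any multiple alignments of the same read.

```shell
#!/usr/bin/env bash
# Print "<bam file><TAB><number of alignment records>" for one region.
# Arguments: a locus (chrom:start-end), then one or more indexed BAM files.
count_region_alignments() {
    locus="$1"
    shift
    for bam in "$@"; do
        # -c makes samtools print only the number of matching records.
        printf '%s\t%s\n' "$bam" "$(samtools view -c "$bam" "$locus")"
    done
}
```

A possible call is count_region_alignments chr19:4709000-4756000 SRR769314_duplicates_marked.bam SRR769316_duplicates_marked.bam. Raw counts are not normalised for sequencing depth, so they only give a rough comparison between the two samples.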
5. Navigate to chr19:6982100-6987800. Right-click on the track names and select Sashimi Plot.
Question:
- Can you identify which isoform is more expressed in each sample?
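To keep a record of the regions visited above, IGV's batch commands goto, snapshot and snapshotDirectory can save an image of each one. The following is a minimal sketch; the function name write_igv_region_batch and the snapshot file naming are choices made for this example only.

```shell
#!/usr/bin/env bash
# Write an IGV batch file that visits each locus and saves a PNG snapshot of it.
# Arguments: output batch file, snapshot directory, then one or more loci.
write_igv_region_batch() {
    out="$1"
    shift
    snapshot_dir="$1"
    shift
    {
        printf 'snapshotDirectory %s\n' "$snapshot_dir"
        for locus in "$@"; do
            printf 'goto %s\n' "$locus"
            # Replace ':' and '-' with '_' so the locus can serve as a file name.
            printf 'snapshot %s.png\n' "$(printf '%s' "$locus" | tr ':-' '__')"
        done
    } > "$out"
}
```

For example, write_igv_region_batch regions.txt ~/igv_snapshots chr19:3715000-3718000 chr19:5748800-5751100 chr19:4709000-4756000 chr19:6982100-6987800 writes a file that can be run via Tools -> Run Batch Script... once the genome and tracks are loaded. The Sashimi plot in step 5 still has to be opened by hand.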