Difference between revisions of "Functional Analysis Exercise"

Revision as of 08:39, 11 May 2017

Aims

You will learn to:

  • perform gene set enrichment analysis

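You will use the following tools, which have been pre-installed on marvin, our bioinformatics training server at the University of St Andrews:

  • Gene Set Enrichment Analysis (GSEA): http://www.broadinstitute.org/gsea/index.jsp. This is loaded with:

module load gsea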

The dataset you will investigate is from the study described in "RNA-Seq Analyses Generate Comprehensive Transcriptomic Landscape and Reveal Complex Transcript Patterns in Hepatocellular Carcinoma" by Huang et al. (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0026168).

You will use the following files:

  • pv_glm.rnk: gene list
  • go_sets.gmt: Gene Ontology gene sets

Data for analysis

Go to the directory containing the files for this exercise:

cd ~/i2rda_data/08_Functional_analysis/

Have a look at the file pv_glm.rnk:

less pv_glm.rnk

This file contains two tab-separated columns: the name of the gene (Ensembl ID) and a numerical value for its differential expression (log FDR). The order of the genes doesn't matter; GSEA will rank them by their differential expression value.
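Before loading the file into GSEA, it can be worth confirming that it really has the two-column layout described above. The following is a minimal sketch and not part of the original exercise: check_rnk is a hypothetical helper name, and it assumes the file is tab-separated and has no header line.

```shell
# check_rnk FILE: hypothetical helper that sanity-checks a two-column .rnk file.
# Assumption: tab-separated, no header line.
check_rnk() {
    awk -F'\t' '
        NF != 2 { bad++; next }    # wrong number of columns
        $2 !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/ { bad++; next }    # score is not a number
        { ok++; if ($2 > 0) up++; else if ($2 < 0) down++ }
        END { printf "valid=%d invalid=%d positive=%d negative=%d\n", ok, bad, up, down }
    ' "$1"
}
```

Running check_rnk pv_glm.rnk prints a one-line summary; a non-zero invalid count suggests the file needs cleaning before it is loaded.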

Have a look at the file go_sets.gmt:

less go_sets.gmt

This tab-separated file has one gene set per line: the Gene Ontology (GO) term name, a description, and then all the genes that have been annotated with that term. How to create this file is described at the end of this exercise.

We used Gene Ontology (GO) annotation to create our "gene sets", but you can categorise the genes any way you think appropriate.
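To get a feel for the gene sets before running GSEA, the two files can be summarised from the command line. This is a rough sketch under stated assumptions: gmt_summary and rnk_genes_in_gmt are hypothetical helper names, and the .gmt file is assumed to follow the usual GMT layout (set name, description, then one gene per tab-separated field).

```shell
# gmt_summary FILE: hypothetical helper that prints each gene set name
# followed by the number of genes listed after the description field.
gmt_summary() {
    awk -F'\t' 'NF >= 3 { printf "%s\t%d\n", $1, NF - 2 }' "$1"
}

# rnk_genes_in_gmt RNK GMT: hypothetical helper that counts how many gene IDs
# in the ranked list appear in at least one gene set.
rnk_genes_in_gmt() {
    awk -F'\t' '
        NR == FNR { for (i = 3; i <= NF; i++) in_set[$i] = 1; next }    # first file read: the .gmt
        ($1 in in_set) { n++ }                                           # second file read: the .rnk
        END { print n + 0 }
    ' "$2" "$1"
}
```

For example, gmt_summary go_sets.gmt | less shows the size of each set, and rnk_genes_in_gmt pv_glm.rnk go_sets.gmt shows how many ranked genes GSEA will be able to place in a gene set. A very low count usually points to mismatched gene identifiers.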

Gene Set Enrichment Analysis

Launch the GSEA GUI:

launchGSEA.sh

  1. Click on 'Load data' under 'Steps in GSEA analysis'.
  2. Click on 'Method 1: Browse for files ...'.
  3. Select the files go_sets.gmt and pv_glm.rnk (which are in the directory /home/training/Data/10_Functional_analysis) and click [Open]. (This should give a pop-up message saying 'Files loaded successfully: 2/2 There were NO errors').
  4. Select 'Tools > GseaPreranked' from the top menu bar.
  5. For the 'Gene sets database', select the file go_sets.gmt (under the 'Gene matrix (local gmx/gmt)' tab) and click [OK].
  6. Change the 'Number of permutations' to 100 (for demonstration purposes only).
  7. For the 'Ranked list', select the file pv_glm (this file should already be selected by default).
  8. Change 'Collapse dataset to gene symbols' to false.
  9. Click on '>Run' at the bottom of the page.
  10. Under 'GSEA reports', a process will appear with a status of 'Running'.
  11. When the status of the process has changed to 'Success', click on 'Success'. This will open the GSEA Report for our dataset.

The first section of the report shows the gene sets that are enriched among genes that are up-regulated in cancer compared to non-cancer (remember that we set non-cancer as the reference).

The second section shows the gene sets that are enriched among genes that are down-regulated in cancer compared to non-cancer.

To view the detailed results, click on 'enrichment results in html format'.

Detailed documentation on how to interpret GSEA results can be found in the GSEA User Guide (http://www.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html) and in the paper by Subramanian et al., "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles", Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50.

Are there any genes up- or down-regulated that look like they could be involved in cancer?

Creating a go_sets.gmt file

A go_sets.gmt file can be created by first downloading the GO information from Ensembl (http://www.ensembl.org):

  1. Go to Ensembl Biomart: http://www.ensembl.org/biomart/martview/
  2. Select Ensembl Genes
  3. Select your species of interest.
  4. Click on Attributes in the side menu.
  5. Check Ensembl Gene ID from the GENE section (other boxes should be unchecked).
  6. Check GO Term Name and GO Term Accession from the EXTERNAL section (under sub section GO).
  7. Click on the Results button.
  8. Click on the Go button (next to 'Export all results to file', with TSV selected).

An example of the text file you would download is the file biomart_GO.txt. This .txt file can be converted into a .gmt file suitable for use in GSEA using the Perl script makeGMT.pl:

cat biomart_GO.txt | perl makeGMT.pl > your_go_sets.gmt
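The script makeGMT.pl itself is not shown here. As an illustration only, the conversion could be sketched in awk as below. This is not the author's script: biomart_to_gmt is a hypothetical helper name, and it assumes the BioMart export is tab-separated with one header line and columns in the order gene ID, GO term name, GO term accession. It writes the GO term name as the set name and the accession as the description, which is also an assumption.

```shell
# biomart_to_gmt FILE: hypothetical stand-in for a BioMart-to-GMT conversion.
# Assumed input: tab-separated, one header line, columns in the order
# gene ID, GO term name, GO term accession.
biomart_to_gmt() {
    awk -F'\t' '
        NR == 1 { next }                   # skip the header line
        $2 == "" || $3 == "" { next }      # skip genes without GO annotation
        !seen[$3, $1]++ {                  # ignore duplicate gene/term pairs
            if (!($3 in name)) { name[$3] = $2; order[++n] = $3 }
            genes[$3] = genes[$3] "\t" $1  # append the gene to its term
        }
        END {
            # one line per GO term, in order of first appearance
            for (i = 1; i <= n; i++) {
                acc = order[i]
                printf "%s\t%s%s\n", name[acc], acc, genes[acc]
            }
        }
    ' "$1"
}
```

It would be used as biomart_to_gmt biomart_GO.txt > your_go_sets.gmt. If the columns in your export are in a different order, the field numbers need to be adjusted accordingly.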