Functional Analysis Exercise
Latest revision as of 09:08, 11 May 2017
Aims
You will learn to:
- perform gene set enrichment analysis
You will use the following tools, which have been pre-installed on marvin, our bioinformatics training server at the University of St Andrews:
- Gene Set Enrichment Analysis (GSEA): http://www.broadinstitute.org/gsea/index.jsp. It is loaded with:
module load gsea
The dataset you will investigate is from the study described in "RNA-Seq Analyses Generate Comprehensive Transcriptomic Landscape and Reveal Complex Transcript Patterns in Hepatocellular Carcinoma" by Huang et al. (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0026168).
You will use the following files:
- pv_glm.rnk: gene list
- go_sets.gmt: Gene Ontology gene sets
Data for analysis
Go to the directory 08_Functional_analysis:
cd ~/i2rda_data/08_Functional_analysis/
Have a look at the file pv_glm.rnk:
less pv_glm.rnk
This file contains two tab-separated columns: the name of the gene (Ensembl ID) and a numerical value of its differential expression (log FDR). The order of the genes doesn't matter; GSEA will rank them based on their differential expression.
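As an illustration of the two-column layout, here is a minimal Python sketch that reads .rnk-style lines and sorts the genes by score, which is roughly what GSEA does before the analysis. The function names and the example rows are hypothetical (they are not taken from pv_glm.rnk), and skipping lines that start with "#" is an assumption about how comments are marked.

```python
def parse_rnk(lines):
    """Parse .rnk-style lines into a list of (gene_id, score) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines and lines starting with '#' carry no data.
        if not line or line.startswith("#"):
            continue
        gene_id, score = line.split("\t")
        pairs.append((gene_id, float(score)))
    return pairs


def rank_genes(pairs):
    """Sort genes from the highest to the lowest score."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical example rows in arbitrary order.
example_rnk = [
    "ENSG00000000003\t-1.5\n",
    "ENSG00000000005\t2.25\n",
    "ENSG00000000419\t0.0\n",
]
ranked = rank_genes(parse_rnk(example_rnk))
for gene_id, score in ranked:
    print(gene_id, score)
```

To try it on the real file, the lines could be read with open("pv_glm.rnk") and passed to parse_rnk in the same way.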
Have a look at the file go_sets.gmt:
less go_sets.gmt
Each line of this file contains tab-separated fields: the gene ontology (GO) term name, a description, and then all the genes that have been annotated with that term. How to create this file is described at the end of this exercise.
We used Gene Ontology (GO) annotation to create our "gene sets", but you can categorise the genes any way you think appropriate.
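To make the gene set layout concrete, the following Python sketch parses .gmt-style lines into a dictionary and writes a custom gene set back out in the same layout. The function names, the set names and the gene IDs are all hypothetical and only serve as an illustration.

```python
def parse_gmt(lines):
    """Map each gene set name to a (description, list_of_genes) tuple."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # A usable line needs a name, a description and at least one gene.
        if len(fields) < 3:
            continue
        name, description = fields[0], fields[1]
        genes = [gene for gene in fields[2:] if gene]
        gene_sets[name] = (description, genes)
    return gene_sets


def format_gmt_line(name, description, genes):
    """Build one tab-separated .gmt line for a gene set."""
    return "\t".join([name, description] + list(genes))


# Hypothetical custom category, not taken from go_sets.gmt.
custom_line = format_gmt_line(
    "my_candidate_genes", "hand-picked set", ["ENSG00000000003", "ENSG00000000419"]
)
gene_sets = parse_gmt([custom_line + "\n"])
print(gene_sets["my_candidate_genes"])
```

Any grouping of genes written in this layout could in principle be loaded into GSEA in place of the GO-based sets.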
Gene Set Enrichment Analysis
Running the program
Launch the GSEA GUI:
launchGSEA.sh
1. Click on "Load data" under "Steps in GSEA analysis".
2. Click on "Method 1: Browse for files".
3. Select the files go_sets.gmt and pv_glm.rnk (which are in the directory ~/i2rda_data/08_Functional_analysis) and click "Open". This should give a pop-up message saying "Files loaded successfully: 2/2 There were NO errors".
4. Select "Tools > GseaPreranked" from the top menu bar.
5. For the "Gene sets database", click the "..." button, select the file go_sets.gmt (which is under the "Gene matrix (local gmx/gmt)" tab) and click "OK".
6. Change the "Number of permutations" to 100 (for demonstration purposes only).
7. For the "Ranked list", select the file pv_glm (this file should already be selected by default).
8. Change "Collapse dataset to gene symbols" to false.
9. Click on "Run" at the bottom of the page.
10. Under "GSEA reports", a process will appear with a status of "Running".
11. Wait for the process to finish. When its status has changed to "Success", click on "Success". This will open the GSEA Report for our dataset.
Viewing the analysis
The first section of the report shows the gene sets that are enriched among genes that are up-regulated in cancer compared to non-cancer (remember that we set non-cancer as the reference).
The second section shows the gene sets that are enriched among genes that are down-regulated in cancer compared to non-cancer.
To view the detailed results, click on "enrichment results in html format".
Detailed documentation on how to interpret GSEA results can be found in the GSEA User Guide (http://www.broadinstitute.org/gsea/doc/GSEAUserGuideFrame.html) and in the paper by Subramanian et al., "Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles", Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50.
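For intuition about the enrichment score (ES) reported for each gene set, here is a simplified Python sketch of the running-sum statistic described by Subramanian et al. It is an illustration only, not the code GSEA runs: the function name, the weight parameter and the toy data are assumptions, and permutation testing and normalisation are left out.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Return the maximum deviation from zero of the running sum.

    ranked   -- list of (gene, score) tuples sorted from highest to lowest score
    gene_set -- set of gene names
    weight   -- exponent applied to |score| when a gene in the set is met
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_weights = [
        abs(score) ** weight if hit else 0.0
        for (_, score), hit in zip(ranked, hits)
    ]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")

    running, best = 0.0, 0.0
    for hit, hit_weight in zip(hits, hit_weights):
        if hit:
            # Step up, in proportion to the gene's score.
            running += hit_weight / total_hit_weight
        else:
            # Step down by a constant amount for every gene outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list with hypothetical genes a..e.
toy_ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
es_top = enrichment_score(toy_ranked, {"a", "b"})
es_bottom = enrichment_score(toy_ranked, {"d", "e"})
print(es_top, es_bottom)
```

In this toy case the set at the top of the list gets a positive score and the set at the bottom gets a negative one, which mirrors the split of the report into up-regulated and down-regulated sections.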
Are there any genes up- or down-regulated that look like they could be involved in cancer?
Creating a go_sets.gmt file
A go_sets.gmt file can be created by first downloading the GO information from Ensembl (http://www.ensembl.org):
1. Go to Ensembl Biomart: http://www.ensembl.org/biomart/martview/
2. Select "Ensembl Genes".
3. Select your species of interest.
4. Click on "Attributes" in the side menu.
5. Check "Ensembl Gene ID" from the "GENE" section (other boxes should be unchecked).
6. Check "GO Term Name" and "GO Term Accession" from the "EXTERNAL" section (under subsection "GO").
7. Click on the "Results" button.
8. Click on the "Go" button (after "Export all results to file TSV").
An example of the text file you would download is the file biomart_GO.txt.
This .txt file can be converted into a .gmt file suitable for use in GSEA using the Perl script makeGMT.pl, which is found in the 08_Functional_analysis folder:
cat biomart_GO.txt | perl makeGMT.pl > your_go_sets.gmt
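The contents of makeGMT.pl are not shown here, so the following Python sketch only illustrates the kind of conversion involved and may differ from what the script does. It assumes the downloaded file has a header row followed by three tab-separated columns in the order gene ID, GO term name, GO term accession, and it writes the term name as the first field and the accession as the description. The function name and the example rows are hypothetical.

```python
from collections import defaultdict


def biomart_to_gmt(lines, has_header=True):
    """Group gene IDs by GO term and return one .gmt line per term."""
    # A dict of dicts keeps insertion order and drops duplicate gene IDs.
    genes_by_term = defaultdict(dict)
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the column names
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        gene_id, term_name, accession = fields[:3]
        # Genes without any GO annotation have empty GO columns.
        if not gene_id or not term_name or not accession:
            continue
        genes_by_term[(term_name, accession)][gene_id] = None
    return [
        "\t".join([term_name, accession] + list(genes))
        for (term_name, accession), genes in genes_by_term.items()
    ]


# Hypothetical rows in the assumed column order.
example_rows = [
    "Gene ID\tGO Term Name\tGO Term Accession\n",
    "ENSG00000000003\tmembrane\tGO:0016020\n",
    "ENSG00000000419\tmembrane\tGO:0016020\n",
    "ENSG00000000419\tnucleus\tGO:0005634\n",
    "ENSG00000000457\t\t\n",
]
gmt_lines = biomart_to_gmt(example_rows)
print("\n".join(gmt_lines))
```

If the columns in your download are in a different order, the unpacking line inside the loop is the only part that would need to change.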