Difference between revisions of "Cluster Manual"


Latest revision as of 10:07, 29 March 2018

Introduction

This manual is a brief introduction to using St. Andrews’s bioinformatics cluster, which consists of a frontend, called marvin, and ten compute nodes, called node1 to node10. The latest hardware/facility description for grant applications is available here.

Login

The initial login brings users onto the cluster head node. From there, users can submit jobs to the compute nodes via the queue manager’s qsub command.

Login from Mac or Linux:

Open the terminal and type:

ssh -Y username@marvin.st-andrews.ac.uk

Login from Microsoft Windows:

You first need to install one of MobaXterm, PuTTY, Git Bash (via Git for Windows, https://git-for-windows.github.io/) or UCSB’s SSH. MobaXterm provides a terminal where you can type:

ssh username@marvin.st-andrews.ac.uk

Password-less login via public key:

If you generate a private/public key pair with ssh-keygen or similar, you can install your public key on the server, and so avoid typing your password at every login, by running this command on your local machine:

cat ~/.ssh/id_rsa.pub | ssh username@marvin.st-andrews.ac.uk 'cat >> .ssh/authorized_keys'
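To avoid retyping the full host name and user name at each login, the SSH client can be given a short alias. The sketch below is one possible approach, not part of the cluster setup: the helper name add_marvin_alias, the alias "marvin" and the user name jbloggs are all assumptions made for illustration. The two ForwardX11 options correspond to the -Y switch used above. The demo writes to a temporary file rather than to your real configuration.

```shell
# Hypothetical helper: append a "marvin" alias to an SSH client config file.
# Arguments: <user name on marvin> <path to config file>
add_marvin_alias() {
  user="$1"
  config="$2"
  cat >> "$config" <<EOF
Host marvin
    HostName marvin.st-andrews.ac.uk
    User $user
    ForwardX11 yes
    ForwardX11Trusted yes
EOF
  # SSH expects the config file to be writable only by its owner
  chmod 600 "$config"
}

# Demo on a throwaway file; "jbloggs" is a made-up user name
demo_config="${TMPDIR:-/tmp}/ssh_config_demo.$$"
add_marvin_alias jbloggs "$demo_config"
cat "$demo_config"
```

If the same block were appended to ~/.ssh/config instead, "ssh marvin" would be enough to log in.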

File and folder transfer to and from cluster

Copy files from your local machine to marvin

Linux or Mac by command line (CLI)

To copy a single file from your local machine to marvin, type:

scp <filename> <username>@marvin.st-andrews.ac.uk:~

To copy an entire directory, add the -r switch to the scp command above; to preserve file dates and permissions, also add the -p switch. Note that the tilde, “~”, represents the user’s home directory and is equivalent to its absolute path, “/storage/home/users/<username>/”.
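As a sketch of how the two switches combine, the helper below only prints the resulting scp command so that it can be checked before it is run. The function name marvin_upload_cmd, the user name jbloggs and the directory results_dir are hypothetical.

```shell
# Hypothetical helper: print (do not run) the scp command that copies a
# directory to the user's home directory on marvin, keeping dates and modes.
# Arguments: <user name on marvin> <local directory>
marvin_upload_cmd() {
  printf 'scp -rp %s %s@marvin.st-andrews.ac.uk:~\n' "$2" "$1"
}

# Made-up user and directory names
marvin_upload_cmd jbloggs results_dir
```

Copying the printed line into the terminal performs the actual transfer.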

Copy something FROM marvin TO your machine

This is the reverse of the operation above. Because of the firewall, the copy must be run on marvin itself. For example, to copy a directory from marvin to your local computer, log in to marvin and type:

scp -rp <directoryname> <username_on_your_computer>@<IP_number_of_your_computer>:/</desired/path_on_your_computer>

Copy something from marvin to your machine, with the commands typed on your machine

Type:

scp <your user name on marvin>@marvin.st-andrews.ac.uk:/storage/home/users/<your user name on marvin>/<anything that you want to copy> .

Nautilus

You can map the cluster into Nautilus directly.

Go to “File->Connect to Server” and then choose SSH in “Service type”

Server: marvin.st-andrews.ac.uk
Port: 22
User name: <your user name on marvin>
Password: <your password on marvin>

If everything worked, Nautilus shows a new entry that you can use to transfer files to the cluster.

Note: this may differ slightly depending on the Nautilus version. In some versions you only need to enter “ssh://marvin.st-andrews.ac.uk”.

Mac

Install Fugu and run it, or install Cyberduck and run it.

Microsoft Windows

Install SSH Secure Shell or Cyberduck.

On the SSH Secure Shell open the and create a new connection ->

Screen

Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells).

You can open multiple windows within a running screen session and detach from them. A very useful feature is that, unlike a normal command-line session, screen preserves your session if the connection drops (for example, because of a network problem). When you reconnect, you can re-attach to the screen.

Starting a new screen session

Simply type:

screen

Nothing visible appears to happen, because by default screen gives no indication that it is running.

Now you can run the commands that you want. Afterwards you can detach from the session and re-attach to it later.

   Detaching From Screen

Press: “Ctrl-a” “d”

   Reattach to Screen

If your connection drops or you have detached from a screen, you can re-attach by just running:

screen -r

However, if you have multiple screens, running screen -r lists them instead of re-attaching.

Hypothetical output

There are several suitable screens on:
31917.pts-5.office (Detached)
31844.pts-0.office (Detached)

If you get this, specify the screen that you want by typing: screen -r 31844.pts-0.office
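With several sessions open, copying the identifier by hand is error-prone. The sketch below extracts the first detached session from a listing and prints the matching re-attach command. It works on sample text copied from the hypothetical output above; on marvin the listing would instead come from "screen -ls". The variable names are assumptions.

```shell
# Sample listing in the format shown above (hypothetical session IDs).
# On the cluster this would be: listing=$(screen -ls)
listing='There are several suitable screens on:
31917.pts-5.office (Detached)
31844.pts-0.office (Detached)'

# Take the first field of the first line marked (Detached)
first_detached=$(printf '%s\n' "$listing" | awk '/\(Detached\)/ { print $1; exit }')

# Print the command that would re-attach to that session
echo "screen -r $first_detached"
```

Starting sessions with a name, as in "screen -S <name>", makes them easier to tell apart in such a listing.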

Environment Modules

The modules system is a way to easily load software into your path. This approach has a number of advantages including allowing for multiple versions of the software to be installed at any given time.

Listing Available Software

To list the available software, run in a terminal:

module avail

This should output something like:

------------------------- /usr/local/Modules/versions --------------------------
3.2.10
--------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
artemis/16.0.0(default)   dot                             modules                      scripts
bedtools/2.17.0           EMBOSS/6.6.0(default)           null                         seqtk/1.0-r57(default)
blastall/2.2.26(default)  FASTQC/0.10.1(default)          openmpi/1.6.5(default)       stampy/1.0.23(default)
blastScripts/(default)    gatk/3.2-2(default)             paml/4.7a(default)           tophat/2.0.10(default)
bowtie/1.0.0              general_script_tools/(default)  picard-tools/1.118(default)  trimmomatic/0.32
bowtie2/2.1.0(default)    gwas                            python/2.7(default)          use.own
bwa/0.7.7(default)        HTSlib/0.0.1(default)           python/3.4                   vcftools/0.1.12a(default)
CEGMA/2.5(default)        interproscan/5.4-47.0(default)  R/2.15
cufflinks/2.1.1           mafft/7.147(default)            R/3.0
cufflinks/2.2.0(default)  module-info                     samtools/0.1.19(default)

or the faster way:

 moduleav

Using The Software

To load a module into your path, run:

module load <software>[/<version>]

You only need to add the version if you want a version other than the default. So, to load the default version of cufflinks, you would run:

module load cufflinks

If you specifically wanted version 2.1.1, you would run:

module load cufflinks/2.1.1
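After loading a module it can be useful to confirm that the program really is on your path, especially at the top of a job script. The helper below is a minimal sketch of such a check. The name require_tool is hypothetical, and it is an assumption that the cufflinks module provides an executable called cufflinks.

```shell
# Hypothetical helper: report whether a program is on PATH and, if not,
# suggest the module that is assumed to provide it.
# Arguments: <program name> <module name>
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not on PATH; try: module load $2" >&2
    return 1
  fi
}

# Assumption: the cufflinks module puts a "cufflinks" executable on PATH.
# "|| true" keeps this demo going when the program is absent.
require_tool cufflinks cufflinks || true
```

In a job script, replacing "|| true" with "|| exit 1" would stop the job early instead of failing partway through.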

Showing What Software is Loaded

To show what modules you have loaded at any time, you can run:

module list

Depending on what modules you have loaded, it will produce something like this:

Currently Loaded Modulefiles:
1) modules 3) R/3.0 5) blastScripts/(default)
2) python/2.7 4) blastall/2.2.26 6) cufflinks/2.1.1

Unloading Software

Sometimes you no longer want a piece of software in your path. To remove it, unload the module by running:

module unload <software>[/<version>]

To see how to use a specific piece of software, run, for example:

module help stampy

Additional Features

There are additional features and operations that can be done with the module command. Please run the following to get more information:

module help


Python

There are several versions of Python installed, but it is “python2.7” that has Biopython installed; it is loaded as a module by default, ready for you to use. Leaving out the “2.7” will give you the wrong Python version. So, to run a script, type:

python2.7 <your script>

Or, if you want your script to function as an executable, you add a shebang on the top line like so:

#!/usr/bin/env python2.7

Then make it executable with “chmod 755 <your script>”.
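The two steps above can be sketched together as follows. The script name hello.py, its one-line body and the temporary directory are made up for illustration. The sketch creates the file and sets its permissions but does not run it, since python2.7 may not exist outside the cluster.

```shell
# Work in a throwaway directory (hypothetical location)
workdir="${TMPDIR:-/tmp}/py_demo.$$"
mkdir -p "$workdir"
script="$workdir/hello.py"

# Write a tiny script whose first line is the shebang described above
cat > "$script" <<'EOF'
#!/usr/bin/env python2.7
print("hello from marvin")
EOF

# rwx for the owner, r-x for group and others
chmod 755 "$script"
ls -l "$script"
```

On marvin the script could then be started directly by typing its path, without putting python2.7 in front of it.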

R

To open an R console, type:

R

Note: by default the R 3.2.1 module is already loaded into your path.

R Libraries already installed in R 3.2.1

-cvTools
-biocLite core
-ggplot2
-PopGenome
-cn.mops
-MCMCglmm
-boot
-R2Cuba
-mvtnorm
-glmnet
-mgcv
-gsg
-numDeriv
-nlme
-qtl (RQTL)
-onemap
-limma
-edgeR
-diveRsity

R Libraries already installed in R 3.0.2

-biocLite core
-cvTools
-ggplot2
-PopGenome
-cn.mops
-MCMCglmm
-boot
-R2Cuba
-mvtnorm
-glmnet
-mgcv
-gsg
-numDeriv
-nlme
-methylkit
-qtl (RQTL)
-onemap
-cummeRbund
-limma
-edgeR

Note: other libraries can be installed at your request.

Launching Jobs on the Queue Manager (SGE)

The cluster is a shared resource, analogous to a road network that sees high and low traffic by turns. Submitting and managing jobs via scripts is at the heart of using the cluster. Software is run on the cluster nodes by putting the command, with all its options and arguments, in a jobscript.

Note: Please do not run ANY computationally intensive tasks on the head node. If you do, we will have to kill your jobs, because they slow down all other users.

Usage Guidelines

There are a number of different queues available to cluster users. Below is a list of the queues and the kind of job each is intended for:

  • all.q – This is the default queue. You can use it if your job doesn’t have any special requirements.
  • lowmemory.q – This queue is for jobs that require less than 64GB of RAM.
  • highmemory.q – This queue is for jobs that require more than 64GB of RAM.
  • blast.q – This queue is for blast jobs.
  • marvin.q – This queue is for submitting jobs that run only on marvin.
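The memory rule for the two memory queues can be sketched as a small helper. The function name pick_queue is hypothetical, and the list above does not say which queue a job needing exactly 64GB belongs in; sending it to lowmemory.q here is an assumption.

```shell
# Hypothetical helper: choose a memory queue from the RAM a job needs (in GB).
# Assumption: exactly 64GB goes to lowmemory.q; the manual does not say.
pick_queue() {
  mem_gb="$1"
  if [ "$mem_gb" -gt 64 ]; then
    echo highmemory.q
  else
    echo lowmemory.q
  fi
}

pick_queue 16
pick_queue 128
```

The printed name is what would go after the -q option in a job script.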

Submitting Jobs using a Script

A script is just a set of commands that we want executed once the job runs. Below is an example of a simple script; you can put whatever commands you need in it. Create the script and save it under a convenient and memorable name, such as “hnamejobscript.sh” (the name already tells us that it is a shell script for launching a very simple hostname job on the cluster).

#!/bin/bash
#$ -V ## pass all environment variables to the job, VERY IMPORTANT
#$ -N run_something ## job name
#$ -S /bin/bash ## shell where it will run this job
#$ -j y ## join error output to normal output
#$ -cwd ## Execute the job from the current working directory
#$ -q lowmemory.q ## queue name
uptime > myUptime.${JOB_ID}.txt
echo $HOSTNAME >> myUptime.${JOB_ID}.txt

To submit it, invoke the “qsub” command like so:

qsub hnamejobscript.sh
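When the same job has to be sent to different queues, the script above can be generated rather than edited by hand. The following is a minimal sketch under stated assumptions: the helper name make_jobscript and the temporary file location are hypothetical, and the sketch only writes the file and prints the submit command instead of calling qsub.

```shell
# Hypothetical helper: write the example job script for a chosen queue.
# Arguments: <queue name> <output file>
make_jobscript() {
  queue="$1"
  outfile="$2"
  # \$ keeps a literal dollar sign in the generated file
  cat > "$outfile" <<EOF
#!/bin/bash
#\$ -V
#\$ -N run_something
#\$ -S /bin/bash
#\$ -j y
#\$ -cwd
#\$ -q $queue
uptime > myUptime.\${JOB_ID}.txt
echo \$HOSTNAME >> myUptime.\${JOB_ID}.txt
EOF
}

# Demo: generate a script for highmemory.q in a throwaway location
jobfile="${TMPDIR:-/tmp}/hnamejobscript.$$.sh"
make_jobscript highmemory.q "$jobfile"
cat "$jobfile"
echo "submit on marvin with: qsub $jobfile"
```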


Checking the jobs that are running

To see the status of your jobs, type:

qstat

Deleting jobs

To remove a single job, type:

qdel <job number>

To remove all your jobs, type:

qdel -u <your user name on marvin>

More information: Open Grid Engine (also known as Sun Grid Engine or Oracle Grid Engine)

Blast

Load the module first:

module load blastScripts

Now you can run blast searches easily. The databases available are nr@ncbi, nt@ncbi, human_G38.fasta and human_genomic; for any other database you need to give its complete path.

blastSGE.py mySequences.fna myResults.xml blastn nt 1e-30 30 10 N

If you type only:

blastSGE.py

you get this explanation:

blastSGE <file to process> <result.xml> <blast program> <database> <E-Value> <Max matches in a query range = 0> <limit hit number> <clean directories? Y|N> <translate table (optional)>

Usage (only aggregate the XML files into one): blastSGE <file to process> <result.xml>

– <file to process> ## input file
– <result.xml> ## output file
– <blast program> ## blast program (blastn|blastp|blastx)
– <database> ## database name (nr|nt|human_G38.fasta|human_genomic); otherwise give the complete path to your database
– <E-Value> ## E-value limit
– <Max matches in a query range = 0> ## maximum number of matches
– <limit hit number> ## maximum number of hits
– <clean directories? Y|N> ## in cluster mode the input file is divided into several files and each file is run on one node. These directories hold the results of those runs. If something goes wrong partway through, you can restart the job from the point where it broke. You can remove the directories at the end, once everything has finished.
– <translate table (optional)> ## codon translation table, not compulsory
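Because the arguments are positional, it is easy to put a value in the wrong slot. One way to keep track, sketched below, is to name each argument in a variable and assemble the command line in the order given by the usage text. The output file name myResults.xml is hypothetical, and the sketch only prints the command instead of running blastSGE.py.

```shell
# One variable per positional argument, in the order of the usage text
input=mySequences.fna       # <file to process>
output=myResults.xml        # <result.xml> (hypothetical file name)
program=blastn              # <blast program>
database=nt                 # <database>
evalue=1e-30                # <E-Value>
max_matches=30              # <Max matches in a query range>
max_hits=10                 # <limit hit number>
clean=N                     # <clean directories? Y|N>

# Assemble and print the command; run the printed line on marvin
cmd="blastSGE.py $input $output $program $database $evalue $max_matches $max_hits $clean"
echo "$cmd"
```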