Difference between revisions of "Intro to RNA-Seq Data Analysis Course"

 
(12 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
= Course schedule =
 
= Course schedule =
  
* This is based on a 2 day Edinburgh Genomics course of the same name.
+
* This is based on a 2 day Edinburgh Genomics course of the same name, with the following changes:
* Intro to Linux, Sequencer visits and technology iparts taken out. Also, less theory.
+
:- "Introduction to Linux" module excluded
* Maximise practical aspect.
+
:- "Sequencer technology overview" module excluded.
* Having said that, if you fall behind, listening is better than catching up
+
:- No laboratory visit
 +
:- 50% of that course was theoretical, this will be reduced to 30%
  
Course website: <code>http://stab.st-andrews.ac.uk/i2rda/</code>
+
* Each section begins with a "Talk", and then a practical runthrough.
 +
* If necessary, some talk slides may be skipped, as the main idea is getting through the practicals.
 +
* Having said that if major theoretical points arise during a practical, they will be discussed.
 +
* Course website: <code>http://stab.st-andrews.ac.uk/i2rda/</code>
 +
:- this has all the presentations and practicals
  
 
= Connecting to a remote Machine =
 
= Connecting to a remote Machine =
Line 26: Line 31:
  
 
You should now be able to "open" a session
 
You should now be able to "open" a session
* Be aware typng in your password is done blindly. I.e. it does not appear on the screen.
+
* Be aware: typing in your password is done blindly. I.e. it does not appear on the screen.
  
= Overview of RNA-Seq =
+
Note: If you don't have your password, please ask to have it reset for you.
  
* For gene expression analyses, seen as a more powerful replacememnt to microarrays
+
= Computing resources notes =
  
= Computing resources =
+
* RNA-Seq, like other Next Generation Sequencing technologies, is characterised by:
 +
:- heavy computational workloads
 +
:- many different software programs, sometimes doing the same thing, which can be arranged into a pipeline.
 +
:- long-running tasks.
  
* RNA-Seq is a heavy workload, we need to be prepared for long-running tasks. This has two implications
+
These have three implications:
 
:- The marvin cluster is an 11-machine '''shared''' computing resource, not a personal computer ... others are using it.
 
:- The marvin cluster is an 11-machine '''shared''' computing resource, not a personal computer ... others are using it.
 +
:- We need to load the special software before using it
 
:- We want to be able to have a process run unattended.
 
:- We want to be able to have a process run unattended.
* For these two aspects, we have:
 
* A queue system to use, we shall request an interactive session ('''<code>qrsh</code>''') from the queue.
 
* We shall use the '''GNU Screen''' utility so we can do other things while waiting.
 
  
==GNU Screen ==
+
* For these three aspects, we have:
 +
:- A queue system to use, we shall request an interactive session ('''<code>qrsh</code>''') from the queue.
 +
:- Use the ''module'' system to load, list and unload software programs
 +
:- We shall use the '''GNU Screen''' utility so we can do other things while waiting.
  
Simply allows several command-line sessions to which you switch back and forth.
+
= Computing resources diagram =
To enter a new session:
+
 
screen
+
[[File:marv.png]]
* This will open with quite bare screen except with a bottom line
+
 
* There will be two command-line windows open.
+
= Aspects of using Windows terminals to connect to Linux =
* Screen works on the activator key concept, you need ot use <code>Ctrl+l</code> to activate any of its functions
+
 
 +
* You can open the <code>http://st-andrews.ac.uk/i2rda</code> site in the Windows web browser (Chrome preferred) and copy text by selecting it and pressing <code>ctrl+c</code>
 +
* This can then be pasted inside the PuTTY command-line by clicking the middle mouse button.
 +
* In many ways, copy-pasting is not great for learning.
 +
:- although some of the commands are too long to type out, even with history and tab-completion.
 +
:- try to also use tab-completion, and the history (<code>up</code>/<code>down</code> arrows and <code>Ctrl+r</code>)
 +
 
 +
<ins>Weakness to watch out for</ins>:
 +
* The marvin cluster (more precisely, the network it's attached to) doesn't carry graphics so well.
 +
* We shall be using several graphical programs, and they are all likely to run slowly.
 +
:- and sometimes even stall
 +
:- we'll cross that bridge when we come to it.
 +
 
 +
=GNU Screen 1 =
 +
 
 +
A program which allows several command-line sessions to be open at once, similar to the idea of open tabs in a web browser. Let's try it out.
 +
* To enter a new session, type <code>screen</code>
 +
* This will open with quite a bare screen, except for an indicator line at the bottom.
 +
* <code>screen</code> works on the ''activator'' key concept: you need to press <code>Ctrl+l</code> (hold the <code>Ctrl</code> key down briefly and press the <code>l</code> key) to activate any of its functions.
 
* After pressing <code>Ctrl+l</code> and releasing you then have a series of single key strokes that will do various useful things.
 
* After pressing <code>Ctrl+l</code> and releasing you then have a series of single key strokes that will do various useful things.
* Two switch back and forth, you use <code>Ctrl+l,n</code> ('''n''' for next) or <code>Ctrl+l,p</code> (p for previous)
+
* There will be one command-line session open when you start it.
 +
:- it's numbered 0, and called <code>scr1</code>, we'll mostly deal with it as <code>0</code>.
 +
* Let's learn how to get out of it first
 +
:- type <code>exit</code> again and you will be told you have exited <code>screen</code>.
 +
:- you are now back in the ordinary command-line.
 +
 
 +
= GNU Screen 2 =
 +
* Go back into screen, type <code>screen</code>.
 +
* you open a new session with <code>ctrl+l,c</code> which '''c'''reates a new session.
 +
:- you now have two sessions open
 +
* type <code>ctrl+l,c</code> again, for three open sessions. You can have more, but we'll stick to three: number <code>0</code>, <code>1</code> and <code>2</code>.
 +
* Switch back and forth between the three open sessions: use <code>Ctrl+l,n</code> ('''n''' for next) or <code>Ctrl+l,p</code> ('''p''' for previous)
 +
* Don't see anything different when you do this? Look again at the bottom line, the asterisk has changed position.
 +
:- the asterisk defines the active session
 +
:- you can move to a numbered session directly with <code>ctrl+l,0</code>, <code>ctrl+l,1</code> or <code>ctrl+l,2</code> for sessions 0, 1 and 2.
 +
 
 +
= Getting a Queue slot =
 +
 
 +
We're going to use one of the screen sessions to get a slot from the queue.
 +
* Assuming you've launched screen, type <code>ctrl+l,0</code> to confirm you are in the first screen session.
 +
* Type <code>qrsh</code> which requests a queue slot ... it will take a little time to give you one.
 +
:- we shall not use this slot for the graphical programs, only the processing ones.
 +
* When you get a slot, notice if you are still on marvin, or one of the nodes (assignment is based on load usually)
 +
:- type <code>qstat</code> to see that you have an allocated slot running in the queue.
 +
* let's get something trivial running here: execute the <code>prtgn.sh</code> script by typing <code>prtgn.sh</code>, then <code>RETURN</code>, and let it print out gene names to its heart's content.
 +
:- it's not Drosophila, so only some of them are funny.
 +
 
 +
= Recovering a session =
 +
 
 +
* Now detach the session: type <code>ctrl+l,d</code>
 +
* Now you're outside screen, you can log out, switch off and go home if you like (don't please).
 +
* Next type <code>screen -r</code> to re-attach.
 +
:- did the process stop?
 +
:- unfortunately this cannot be done with many graphical programs
 +
:- though some have a command-line mode, where it is possible
 +
* Type <code>ctrl+c</code> to stop it; demonstration over.
 +
* You can also save what a session currently displays, inputs and outputs, to a file
 +
:- done via <code>ctrl+l,:</code>, then type <code>hardcopy</code> and <code>RETURN</code>
 +
:- the file is named <code>hardcopy.n</code>, where n is the session number, so <code>hardcopy.0</code> for session 0

Latest revision as of 17:21, 11 May 2017

Course schedule

  • This is based on a 2 day Edinburgh Genomics course of the same name, with the following changes:
- "Introduction to Linux" module excluded
- "Sequencer technology overview" module excluded.
- No laboratory visit
- 50% of that course was theoretical; this will be reduced to 30%
  • Each section begins with a "Talk", and then a practical runthrough.
  • If necessary, some talk slides may be skipped, as the main idea is getting through the practicals.
  • Having said that, if major theoretical points arise during a practical, they will be discussed.
  • Course website: http://stab.st-andrews.ac.uk/i2rda/
- this has all the presentations and practicals

Connecting to a remote Machine

This is presented before the introduction, as some people might experience delays logging in.

  • We shall use a remote machine, not the machine you are logged into locally
  • The program we shall use is PuTTY.
  • Please try to locate PuTTY in the applications section or on AppsAnywhere

Configuring PuTTY for connection

  • Server: marvin.st-andrews.ac.uk
  • Terminal | Keyboard | select VT100+
  • Window | Selection | Control use of mouse | set xterm
  • Connection | Data | enter username
  • Connection | SSH | X11 | check Enable X11 forwarding
  • Back to PuTTY main screen | select Default Settings | click Save

You should now be able to "open" a session

  • Be aware: typing in your password is done blindly. I.e. it does not appear on the screen.

Note: If you don't have your password, please ask to have it reset for you.
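
For anyone connecting from a Linux or macOS terminal rather than PuTTY, the settings above correspond roughly to a single OpenSSH command. This is only a sketch: it assumes an OpenSSH client is installed, and your_username is a placeholder for your own login. The script only builds and prints the command, it does not connect.

```shell
# Sketch only: assumes the OpenSSH client; "your_username" is a placeholder.
REMOTE_USER="your_username"
REMOTE_HOST="marvin.st-andrews.ac.uk"

# -X requests X11 forwarding, matching the X11 setting ticked in PuTTY.
SSH_CMD="ssh -X ${REMOTE_USER}@${REMOTE_HOST}"

# Print the command instead of running it, so nothing connects by accident.
echo "To connect, run: ${SSH_CMD}"
```

As with PuTTY, the password prompt that follows does not echo what you type.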

Computing resources notes

  • RNA-Seq, like other Next Generation Sequencing technologies, is characterised by:
- heavy computational workloads
- many different software programs, sometimes doing the same thing, which can be arranged into a pipeline.
- long-running tasks.

These have three implications:

- The marvin cluster is an 11-machine shared computing resource, not a personal computer ... others are using it.
- We need to load the special software before using it
- We want to be able to have a process run unattended.
  • For these three aspects, we have:
- A queue system: we shall request an interactive session (qrsh) from the queue.
- The module system, used to load, list and unload software programs
- We shall use the GNU Screen utility so we can do other things while waiting.
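
The load, list and unload steps of the module system might look like the sketch below. It assumes the standard Environment Modules `module` command; the module name samtools is a hypothetical example, so check the output of `module avail` for the names actually installed on marvin.

```shell
# Sketch only: "samtools" is a hypothetical module name, not taken from the course.
MODULE_NAME="samtools"

if command -v module >/dev/null 2>&1; then
    module avail                  # show the software that can be loaded
    module load "$MODULE_NAME"    # make the program available in this shell
    module list                   # show what is currently loaded
    module unload "$MODULE_NAME"  # remove it again when finished
    MODULE_STATUS="available"
else
    # Outside the cluster the module command usually does not exist.
    echo "The module command is not available in this shell."
    MODULE_STATUS="unavailable"
fi
```

Loading only lasts for the current shell, so each new screen session or queue slot needs its own `module load`.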

Computing resources diagram

Marv.png

Aspects of using Windows terminals to connect to Linux

  • You can open the http://st-andrews.ac.uk/i2rda site in the Windows web browser (Chrome preferred) and copy text by selecting it and pressing ctrl+c
  • This can then be pasted inside the PuTTY command-line by clicking the middle mouse button.
  • In many ways, copy-pasting is not great for learning.
- although some of the commands are too long to type out, even with history and tab-completion.
- try to also use tab-completion, and the history (up/down arrows and Ctrl+r)

Weakness to watch out for:

  • The marvin cluster (more precisely, the network it's attached to) doesn't carry graphics so well.
  • We shall be using several graphical programs, and they are all likely to run slowly.
- and sometimes even stall
- we'll cross that bridge when we come to it.

GNU Screen 1

A program which allows several command-line sessions to be open at once, similar to the idea of open tabs in a web browser. Let's try it out.

  • To enter a new session, type screen
  • This will open with quite a bare screen, except for an indicator line at the bottom.
  • screen works on the activator key concept: you need to press Ctrl+l (hold the Ctrl key down briefly and press the l key) to activate any of its functions.
  • After pressing Ctrl+l and releasing you then have a series of single key strokes that will do various useful things.
  • There will be one command-line session open when you start it.
- it's numbered 0 and called scr1; we'll mostly refer to it as 0.
  • Let's learn how to get out of it first
- type exit again and you will be told you have exited screen.
- you are now back in the ordinary command-line.

GNU Screen 2

  • Go back into screen, type screen.
  • Open a new session with ctrl+l,c (c for create).
- you now have two sessions open
  • Type ctrl+l,c again, for three open sessions. You can have more, but we'll stick to three: numbers 0, 1 and 2.
  • Switch back and forth between the three open sessions: use Ctrl+l,n (n for next) or Ctrl+l,p (p for previous)
  • Don't see anything different when you do this? Look again at the bottom line, the asterisk has changed position.
- the asterisk defines the active session
- you can move to a numbered session directly with ctrl+l,0, ctrl+l,1 or ctrl+l,2 for sessions 0, 1 and 2.

Getting a Queue slot

We're going to use one of the screen sessions to get a slot from the queue.

  • Assuming you've launched screen, type ctrl+l,0 to confirm you are in the first screen session.
  • Type qrsh which requests a queue slot ... it will take a little time to give you one.
- we shall not use this slot for the graphical programs, only the processing ones.
  • When you get a slot, notice whether you are still on marvin or on one of the nodes (assignment is usually based on load)
- type qstat to see that you have an allocated slot running in the queue.
  • let's get something trivial running here: execute the prtgn.sh script by typing prtgn.sh, then RETURN, and let it print out gene names to its heart's content.
- it's not Drosophila, so only some of them are funny.
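
Once qrsh has returned a prompt, the two checks above (which machine am I on, and is my slot in the queue) could be run as in this sketch. It assumes a Grid Engine style queue providing qrsh and qstat; qrsh itself is interactive, so it appears only as a comment.

```shell
# Sketch only: assumes a Grid Engine style queue that provides qrsh and qstat.
# Request the interactive slot first (interactive, so not run by this script):
#   qrsh

# uname -n prints the machine name: marvin itself or one of its nodes.
NODE_NAME="$(uname -n)"
echo "This shell is running on: ${NODE_NAME}"

if command -v qstat >/dev/null 2>&1; then
    # List only your own jobs; the qrsh session should appear here.
    qstat -u "$(id -un)" || echo "qstat could not query the queue."
else
    echo "qstat not found: no queue client on this machine."
fi
```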

Recovering a session

  • Now detach the session: type ctrl+l,d
  • Now you're outside screen, you can log out, switch off and go home if you like (please don't).
  • Next type screen -r to re-attach.
- did the process stop?
- unfortunately this cannot be done with many graphical programs
- though some have a command-line mode, where it is possible
  • Type ctrl+c to stop it; demonstration over.
  • You can also save what a session currently displays, inputs and outputs, to a file
- done via ctrl+l,:, then type hardcopy and RETURN
- the file is named hardcopy.n, where n is the session number, so hardcopy.0 for session 0
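
The detach and re-attach cycle can also be driven from the ordinary command line, which helps when several screen sessions exist. The sketch below assumes GNU Screen is installed; the session name rnaseq_demo and the sleep command standing in for a long-running task are both hypothetical.

```shell
# Sketch only: "rnaseq_demo" is a hypothetical session name.
SESSION_NAME="rnaseq_demo"

if command -v screen >/dev/null 2>&1; then
    SCREEN_FOUND="yes"
    # -d -m starts the session already detached; -S gives it a name.
    # "sleep 60" stands in for a long-running task.
    if screen -dmS "$SESSION_NAME" sh -c 'sleep 60'; then
        sleep 1
        # The task keeps running while nobody is attached.
        screen -ls || true
        echo "Re-attach with: screen -r ${SESSION_NAME}"
        # Clean up the demonstration session.
        screen -S "$SESSION_NAME" -X quit || true
    fi
else
    SCREEN_FOUND="no"
    echo "screen is not installed on this machine."
fi
```

Naming a session means `screen -r` does not have to guess which one to re-attach.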