Intro to RNA-Seq Data Analysis Course
Contents
Course schedule
- This is based on a two-day Edinburgh Genomics course of the same name, with the following changes:
- - "Introduction to Linux" module excluded
- - "Sequencer technology overview" module excluded
- - No laboratory visit
- - 50% of that course was theoretical; this will be reduced to 30%
- Each section begins with a "Talk", followed by a practical run-through.
- If necessary, some talk slides may be skipped, as the main aim is to get through the practicals.
- That said, if major theoretical points arise during a practical, they will be discussed.
- Course website: http://stab.st-andrews.ac.uk/i2rda/
- - this has all the presentations and practicals
Connecting to a remote Machine
This is presented before the introduction because some people might experience delays logging in.
- We shall use a remote machine, not the machine you are logged into locally.
- The program we shall use is PuTTY.
- Please try to locate PuTTY in the applications section or on AppsAnywhere
Configuring PuTTY for connection
- Server: marvin.st-andrews.ac.uk
- Terminal | Keyboard | check VT100+
- Window | Selection | Control use of mouse | set xterm
- Connection | Data | enter username
- Connection | SSH | X11 Forwarding | check yes
- Back to PuTTY main screen | select Default Settings | click Save
You should now be able to "open" a session
- Be aware: your password is typed blind, i.e. it does not appear on the screen.
Note: If you don't have your password, please ask to have it reset for you.
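For anyone connecting from a machine with a command-line ssh client rather than PuTTY, the server, username and X11 forwarding settings above correspond roughly to the sketch below. The username is a placeholder, not a real account, and the keyboard and mouse settings have no equivalent here.

```shell
# Placeholder: replace with the username you were given for the course.
COURSE_USER="your_username"
COURSE_HOST="marvin.st-andrews.ac.uk"

# -X asks ssh to forward X11 (graphics), matching the X11 forwarding tick box in PuTTY.
SSH_CMD="ssh -X ${COURSE_USER}@${COURSE_HOST}"

# Print the command for checking; run it by hand once the username is filled in.
echo "$SSH_CMD"
```

As with PuTTY, the password prompt that follows does not echo what you type.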
Computing resources
- RNA-Seq, like other Next Generation Sequencing technologies, is characterised by:
- - heavy computational workloads
- - many different software programs, sometimes doing the same thing, which can be arranged into a pipeline.
- - long-running tasks.
These have three implications:
- - The marvin cluster is an 11-machine shared computing resource, not a personal computer ... others are using it.
- - We need to load the special software before using it
- - We want to be able to have a process run unattended.
- For these three aspects, we have:
- - A queue system to use: we shall request an interactive session (qrsh) from the queue.
- - Use the module system to load, list and unload software programs
- - We shall use the GNU Screen utility so we can do other things while waiting.
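The load, list and unload steps of the module system might look like the sketch below. The module name is hypothetical (the real names on marvin are not given here), and the check at the top simply skips the commands on a machine that has no module system.

```shell
# Hypothetical module name, for illustration only; use a name shown by "module avail".
TOOL="example_tool"

if command -v module >/dev/null 2>&1; then
    module avail                  # show the software that can be loaded
    if module load "$TOOL"; then  # make the program available in this session
        module list               # list what is currently loaded
        module unload "$TOOL"     # remove it again
    else
        echo "could not load $TOOL; pick a name from the 'module avail' listing"
    fi
    STATUS="module system found"
else
    STATUS="module system not found"
fi
echo "$STATUS"
```

Loaded modules only apply to the session they were loaded in, so each new screen session or queue slot needs its own module load.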
Aspects of using Windows terminals to connect to Linux
- You can run the Windows web browser (Chrome preferred) and copy text by selecting it and pressing ctrl+c
- This can then be pasted inside the PuTTY command-line by clicking the middle mouse button.
- In many ways, copy-pasting is not great for learning.
- - although some of the commands are too long to type out, even with history and tab-completion.
- - try to also use tab-completion, and the history (up/down arrows and Ctrl+r)
Weakness to watch out for:
- The marvin cluster (more precisely, the network it's attached to) doesn't carry graphics so well.
- We shall be using several graphical programs, and they are all likely to run slowly.
- - and sometimes even stall
- - we'll cross that bridge when we come to it.
GNU Screen 1
A program which allows several command-line sessions to be open at once, similar to the idea of open tabs in a web browser. Let's try it out.
- To enter a new session, type screen
- This will open with quite a bare screen, except for an indicator line at the bottom.
- Screen works on the activator key concept: you need to use Ctrl+l (while the Ctrl key is held down briefly, the l key is pressed) to activate any of its functions.
- After pressing Ctrl+l and releasing, you then have a series of single keystrokes that will do various useful things.
- There will be two command-line windows open when you start it.
- Let's learn how to get out of it first
- - type exit, you should see you have one command-line session less.
- - type exit again and you will be told you have exited screen
- - you are now back in the ordinary command-line.
GNU Screen 2
- Go back into screen
- Switch back and forth between the two open sessions: use Ctrl+l,n (n for next) or Ctrl+l,p (p for previous)
- Don't see anything different when you do this? Look again at the bottom line: the asterisk has changed position.
- - the asterisk defines the active session
- - you can move to a numbered session with Ctrl+l,1 or Ctrl+l,2 for session no.1 and no.2 respectively.
- - you open a new session with Ctrl+l,c (c for create).
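Screen sessions can also be handled from the ordinary command-line, which helps when a process should keep running unattended. A minimal sketch, assuming GNU Screen is installed; the session name is made up for the example:

```shell
SESSION="rnaseq_demo"   # hypothetical session name, chosen for this example

if command -v screen >/dev/null 2>&1; then
    # -d -m starts the session detached (in the background); -S gives it a name
    screen -dmS "$SESSION" && sleep 1
    # list sessions; some versions return a non-zero status here, hence "|| true"
    screen -ls || true
    # to step into the session interactively you would type: screen -r rnaseq_demo
    # close the demo session again so nothing is left running
    screen -S "$SESSION" -X quit || true
    RESULT="screen demo ran"
else
    RESULT="screen not installed"
fi
echo "$RESULT"
```

A session started this way keeps running after you log out, and screen -r brings it back when you log in again.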
Getting a Queue slot
We're going to use one of the screen sessions to get a slot from the queue.
- Assuming you've launched screen, type Ctrl+l,0 to confirm you are in the first screen session.
- Type qrsh, which requests a queue slot ... it will take a little time to give you one.
- - we shall not use this slot for the graphical programs, only the processing ones.
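One way to see that the queue really has placed you on another machine of the cluster is to compare machine names. This sketch assumes that qrsh accepts a command to run, as it does in Grid Engine; whether marvin's queue is set up that way is an assumption.

```shell
LOGIN_NODE="$(uname -n)"   # name of the machine you are on now
echo "currently on: $LOGIN_NODE"

if command -v qrsh >/dev/null 2>&1; then
    # Assumption: qrsh runs the given command in a queue slot and then returns,
    # printing the name of the machine the queue assigned.
    qrsh uname -n || echo "no slot was granted"
else
    echo "qrsh not found: this only works on the cluster"
fi
```

In the interactive session used in this course, typing uname -n before and after qrsh shows the same thing.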
Overview of RNA-Seq
- For gene expression analyses, seen as a more powerful replacement for microarrays