Intro to RNA-Seq Data Analysis Course

Revision as of 14:10, 11 May 2017

Course schedule

  • This is based on a 2-day Edinburgh Genomics course of the same name, with the following changes:
- "Introduction to Linux" module excluded.
- "Sequencer technology overview" module excluded.
- No laboratory visit.
- 50% of that course was theoretical; here this is reduced to 30%.
  • Each section begins with a "Talk", and then a practical runthrough.
  • If necessary, some talk slides may be skipped, as the main idea is getting through the practicals.
  • Having said that, if major theoretical points arise during a practical, they will be discussed.
  • Course website: http://stab.st-andrews.ac.uk/i2rda/
- this has all the presentations and practicals

Connecting to a remote Machine

This is presented before the introduction, as some people may experience delays logging in.

  • We shall use a remote machine, not the machine you are logged into locally.
  • The program we shall use is PuTTY.
  • Please try to locate PuTTY in the applications section or on AppsAnywhere.

Configuring PuTTY for connection

  • Session | Host Name: marvin.st-andrews.ac.uk
  • Terminal | Keyboard | under "The Function keys and keypad", select VT100+
  • Window | Selection | Control use of mouse | select xterm
  • Connection | Data | enter your username as the Auto-login username
  • Connection | SSH | X11 | check Enable X11 forwarding
  • Back on the Session screen | select Default Settings | click Save

You should now be able to "open" a session.

  • Be aware: your password is typed blindly, i.e. it does not appear on the screen.

Note: If you don't have your password, please ask to have it reset for you.
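Once you are logged in, a quick way to check that the X11 forwarding setting took effect is to look at the DISPLAY environment variable, which the SSH server normally sets only when X11 forwarding is active. The following is a minimal bash sketch; the function name x11_status is hypothetical and is not a command installed on marvin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether X11 forwarding appears to be active.
# The SSH server normally sets DISPLAY (e.g. localhost:10.0) only when X11 forwarding is on.
x11_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
  else
    echo "DISPLAY is not set: graphical programs will not open; re-check PuTTY's X11 forwarding option"
  fi
}

x11_status
```

The exact DISPLAY value on marvin will vary; any non-empty value means forwarding was set up for this login.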

Computing resources

  • RNA-Seq, like other Next Generation Sequencing technologies, is characterised by:
- heavy computational workloads
- many different software programs, sometimes doing the same thing, which can be arranged into a pipeline.
- long-running tasks.

These have three implications:

- The marvin cluster is an 11-machine shared computing resource, not a personal computer: others are using it.
- We need to load the specialised software before using it.
- We want to be able to have a process run unattended.
  • For these three aspects, we have:
- a queue system: we shall request an interactive session from it with qrsh.
- the module system, to load, list and unload software programs.
- the GNU Screen utility, so we can do other things while waiting.
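Before starting the practicals it can help to check that the three tools just listed are reachable from your shell. This is a sketch rather than part of the course material: check_course_tools is a hypothetical helper, and it relies on `command -v`, which also finds `module` when (as is common) it is defined as a shell function by your login start-up files, so paste it into your interactive shell rather than running it as a separate script. The usual module-system commands are shown as comments; "fastqc" is only an illustrative module name, so use `module avail` to see what marvin actually provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that screen, qrsh and module are available in this shell.
# `command -v` also reports shell functions, which is how `module` is usually defined.
check_course_tools() {
  local tool missing=0
  for tool in screen qrsh module; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Return value = number of missing tools (0 means everything was found).
  return "$missing"
}

check_course_tools || echo "Some tools are missing: are you logged in to marvin?"

# Typical module-system commands ("fastqc" is an illustrative name only):
#   module avail          # list the software that can be loaded
#   module load fastqc    # make a program available in this shell
#   module list           # show what is currently loaded
#   module unload fastqc  # remove it again
```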

Aspect of using Windows terminals to connect to Linux


  • You can use the Windows web browser (Chrome preferred) and copy text by selecting it and pressing Ctrl+c.
  • This can then be pasted into the PuTTY command-line by clicking the middle mouse button.
  • In many ways, copy-pasting is not great for learning.
- although some of the commands are too long to type out, even with history and tab-completion.
- try also to use tab-completion and the command history (up/down arrows and Ctrl+r).

A weakness to watch out for:

  • The marvin cluster (more precisely, the network it's attached to) does not carry graphics well.
  • We shall be using several graphical programs, and they are all likely to run slowly.
- and sometimes even stall
- we'll cross that bridge when we come to it.

GNU Screen 1

A program which allows several command-line sessions to be open at once, similar to the idea of open tabs in a web browser. Let's try it out.

  • To enter a new session, type screen.
  • This opens a fairly bare screen, apart from an indicator line at the bottom.
  • screen works on the activator key concept: you press Ctrl+l (hold down the Ctrl key and briefly press the l key) to activate any of its functions.
  • After pressing and releasing Ctrl+l, single keystrokes then do various useful things.
  • There will be two command-line windows open when you start it.
  • Let's learn how to get out of it first:
- type exit: you should see that you have one command-line session fewer.
- type exit again and you will be told you have exited screen.
- you are now back in the ordinary command-line.
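GNU Screen's stock activator key is Ctrl+a, and by default it opens a single window, so the Ctrl+l key and the two starting windows described above presumably come from a start-up file prepared for the course. The sketch below shows one way such a file could look; it is an assumption, not marvin's actual configuration. It uses real screenrc commands (escape, startup_message, hardstatus, screen, select), and the file name ~/.screenrc.i2rda is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a hypothetical screen start-up file reproducing the behaviour described above.
rc="${HOME}/.screenrc.i2rda"

cat > "$rc" <<'EOF'
# Use Ctrl+l as the activator key; "Ctrl+l then l" sends a literal Ctrl+l.
escape ^Ll
# Skip the copyright banner.
startup_message off
# Indicator line at the bottom listing all windows; "*" marks the active one.
hardstatus alwayslastline
hardstatus string "%w"
# Open two command-line windows, numbered 0 and 1, then switch to window 0.
screen -t shell0 0
screen -t shell1 1
select 0
EOF

echo "wrote $rc; start screen with: screen -c $rc"
```

Starting screen with `screen -c` and this file leaves any existing ~/.screenrc untouched.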

GNU Screen 2

  • Go back into screen
  • Switch back and forth between the two open sessions: use Ctrl+l,n (n for next) or Ctrl+l,p (p for previous).
  • Don't see anything different when you do this? Look again at the bottom line, the asterisk has changed position.
- the asterisk marks the active session.
- you can jump to a numbered session with Ctrl+l followed by its number: Ctrl+l,1 and Ctrl+l,2 select the sessions numbered 1 and 2 on the indicator line.
- you can open a new session with Ctrl+l,c (c for create).
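If you are unsure whether you are inside screen, or which session you are in, screen itself exports two environment variables to every window it starts: STY (the session name) and WINDOW (the window number). The helper below, screen_where, is hypothetical and simply reports them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the GNU Screen session and window number, if any.
# screen exports STY (session name) and WINDOW (window number) to every window it starts.
screen_where() {
  if [ -n "${STY:-}" ]; then
    echo "inside screen session ${STY}, window ${WINDOW:-unknown}"
  else
    echo "not inside GNU Screen"
  fi
}

screen_where
```

The window number printed matches the number shown for the active (asterisked) session on the indicator line.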

Getting a Queue slot

We're going to use one of the screen sessions to get a slot from the queue.

  • Assuming you've launched screen, type Ctrl+l,0 to make sure you are in the first screen session.
  • Type qrsh, which requests a queue slot; it may take a little time to give you one.
- we shall not use this slot for the graphical programs, only for the processing ones.
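qrsh is the interactive command of the Grid Engine batch system, so once the slot is granted your prompt is usually on a different machine of the cluster. A quick way to tell is to compare the host name before and after qrsh. Grid Engine also normally sets a JOB_ID variable inside a job, but whether marvin's queue does so for qrsh sessions is an assumption here; queue_slot_status is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which machine this shell runs on and whether it
# appears to be inside a queue job. Assumption: Grid Engine sets JOB_ID in jobs.
queue_slot_status() {
  echo "host: $(uname -n)"
  if [ -n "${JOB_ID:-}" ]; then
    echo "inside queue job ${JOB_ID}"
  else
    echo "no JOB_ID set: probably not in a queue slot yet (type qrsh)"
  fi
}

queue_slot_status
```

Running it once before qrsh and once after should show two different host names if the slot was given on another machine.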

Overview of RNA-Seq

  • For gene expression analyses, RNA-Seq is seen as a more powerful replacement for microarrays