Difference between revisions of "Intro to RNA-Seq Data Analysis Course"


Revision as of 14:07, 7 May 2017

Course schedule

  • This is based on a 2-day Edinburgh Genomics course of the same name.
  • The Intro to Linux, sequencer visit and technology parts have been taken out, and there is less theory.
  • The aim is to maximise the practical aspect.
  • Having said that, if you fall behind, listening is better than catching up.

Course website: http://stab.st-andrews.ac.uk/i2rda/

Connecting to a remote machine

This is presented before the introduction, as some people might experience delays logging in.

  • We shall use a remote machine, not the machine you are logged into locally.
  • The program we shall use is PuTTY.
  • Please try to locate PuTTY in the applications section or on AppsAnywhere

Configuring PuTTY for connection

  • Server: marvin.st-andrews.ac.uk
  • Terminal | Keyboard | check VT100+
  • Window | Selection | Control use of mouse | set xterm
  • Connection | Data | enter username
  • Connection | SSH | X11 | check Enable X11 forwarding
  • Back to PuTTY main screen | select Default Settings | click Save

You should now be able to "open" a session.

  • Be aware that typing in your password is done blindly, i.e. it does not appear on the screen.
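
For anyone connecting from a Linux or macOS terminal instead of PuTTY, the settings above correspond roughly to a single OpenSSH command. The sketch below only builds and prints that command so it can be checked before use. The username is a placeholder, and the use of OpenSSH is an assumption, since the course itself only describes PuTTY.

```shell
#!/bin/sh
# Placeholder username (hypothetical): replace with your own account name.
REMOTE_USER="yourusername"
# Server name taken from the PuTTY settings above.
REMOTE_HOST="marvin.st-andrews.ac.uk"

# -X requests X11 forwarding, the counterpart of the PuTTY X11 setting.
SSH_CMD="ssh -X ${REMOTE_USER}@${REMOTE_HOST}"

# Print the command instead of running it, so it can be inspected first.
echo "$SSH_CMD"
```

Running the printed command in a terminal would then prompt for the password, which is typed blindly in the same way as in PuTTY.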

Overview of RNA-Seq

  • For gene expression analyses, RNA-Seq is seen as a more powerful replacement for microarrays.

Computing resources

  • RNA-Seq is a heavy workload, so we need to be prepared for long-running tasks. This has two implications:
      - The marvin cluster is an 11-machine shared computing resource, not a personal computer: others are using it.
      - We want to be able to have a process run unattended.
  • For these two aspects, we have:
      - A queue system: we shall request an interactive session (qrsh) from the queue.
      - The GNU Screen utility, so we can do other things while waiting.

GNU Screen

GNU Screen simply allows several command-line sessions, between which you can switch back and forth. To enter a new session:

screen
  • This will open a fairly bare screen, apart from a line at the bottom.
  • There will be two command-line windows open.
  • To switch back and forth, use Ctrl+l,n (n for next) or Ctrl+l,p (p for previous).
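
Screen can also start a session in the background, which is one way to leave a process running unattended. The following is a minimal sketch of that idea, not part of the course material. The session name `rnaseq_demo` and the `sleep 30` stand-in task are illustrative assumptions, and the script checks that screen is installed before using it.

```shell
#!/bin/sh
# Hypothetical session name, chosen for illustration only.
SESSION="rnaseq_demo"

if ! command -v screen >/dev/null 2>&1; then
    STATUS="screen not installed"
elif screen -dmS "$SESSION" sleep 30; then
    # -d -m creates the session detached; -S names it.
    # "sleep 30" stands in for a long-running task.
    sleep 1                               # give the session a moment to start
    screen -ls || true                    # list sessions (some versions return non-zero here)
    screen -S "$SESSION" -X quit || true  # close the demonstration session
    STATUS="session created and closed"
else
    STATUS="screen could not start a session"
fi

echo "$STATUS"
```

With a real task in place of `sleep 30`, and without the `quit` line, the session could later be picked up again from a terminal with `screen -r rnaseq_demo`.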