Miseq Prokaryote FASTQ analysis

Latest revision as of 17:01, 18 July 2016

Introduction

We have our own Miseq machine, and a run is carried out each week on various samples (each run costs about £1000 in consumables).

The FASTQ files from each run are uploaded onto HDRIVE.

Procedure

  1. Go into your scratch area on marvin and create a new directory named after the date of the run.
  2. Make sure HDRIVE is mounted on marvin.
  3. From inside the new directory in $SCRATCH, create symlinks to the FASTQ files in the mounted run directory. For example:
    for i in /storage/home/users/ramon/mnt/miseqda/2016-07-15_160715_M01714_0021_000000000-ANWN5/*.fastq.gz; do ln -s "$i" .; done
  4. Generate a list of these files for later processing. Each line of the list must hold one read pair: the R1 file followed by its R2 file, separated by whitespace. If the files are named consistently, start from a plain listing created with
    ls *.fastq.gz > fq.lst
    and then edit fq.lst so that each line contains the appropriate pair of FASTQ files.
  5. Next, run a quality-control program; the most widely used is the Babraham Institute's FastQC. The following Sun Grid Engine (SGE) script runs FastQC on each pair in parallel: each task of the array job processes the line of the list given by $SGE_TASK_ID.
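    #!/bin/bash
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #$ -V
    #$ -q marvin.q
    # Check that exactly one argument (the pair list) was given.
    EXPECTED_ARGS=1
    if [ $# -ne $EXPECTED_ARGS ]; then
        echo "error: this script expects one argument: a list of fastq(.gz) pairs, one pair per line" >&2
        exit 1
    fi
    module load FASTQC
    # Read line $SGE_TASK_ID of the list and split it into the R1 and R2 file names.
    N=( $(sed -n "${SGE_TASK_ID}p" "$1") )
    R1=${N[0]}
    R2=${N[1]}
    # Stop if the line was missing or did not contain two file names.
    if [ -z "$R1" ] || [ -z "$R2" ]; then
        echo "error: line ${SGE_TASK_ID} of $1 does not contain a pair of files" >&2
        exit 1
    fi
    fastqc "$R1" "$R2"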
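Steps 1 to 3 can also be wrapped in a small shell function. This is only a sketch: `setup_run_dir` is a hypothetical name, and it assumes that `$SCRATCH` is set and that the run folder name starts with the run date, as in the example `2016-07-15_160715_M01714_0021_000000000-ANWN5`.

```shell
#!/bin/bash
# setup_run_dir (hypothetical helper): create a dated directory in $SCRATCH and
# symlink the run's FASTQ files into it. Assumes $SCRATCH is set and that the
# run folder name begins with the run date, e.g. 2016-07-15_160715_M01714_...
setup_run_dir() {
    local run_dir run_date dest
    if [ $# -ne 1 ] || [ -z "$SCRATCH" ]; then
        echo "usage: setup_run_dir RUN_DIR (with \$SCRATCH set)" >&2
        return 1
    fi
    # Resolve to an absolute path so the symlinks work from the scratch directory.
    run_dir=$(CDPATH= cd -- "$1" && pwd) || return 1
    run_date=$(basename "$run_dir")
    run_date=${run_date%%_*}             # text before the first "_", e.g. 2016-07-15
    dest="$SCRATCH/$run_date"

    # An unmatched glob stays literal, so check that the first entry exists (step 2).
    local fq=( "$run_dir"/*.fastq.gz )
    if [ ! -e "${fq[0]}" ]; then
        echo "error: no .fastq.gz files in $run_dir (is HDRIVE mounted?)" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    ln -sf "${fq[@]}" "$dest"/ || return 1   # -f replaces links from an earlier attempt
    echo "$dest"                             # print the directory for the caller
}
```

For example: `dest=$(setup_run_dir /storage/home/users/ramon/mnt/miseqda/2016-07-15_160715_M01714_0021_000000000-ANWN5) && cd "$dest"`. If the mount is missing, the function stops with an error instead of creating an empty directory.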
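The manual editing in step 4 can usually be avoided when the files follow the usual Illumina naming, in which the two reads of a sample differ only in the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` suffix (e.g. `Sample_S1_L001_R1_001.fastq.gz`). That naming is an assumption here, and `make_pair_list` is a hypothetical name. The sketch writes one whitespace-separated R1/R2 pair per line, which is the layout the FASTQC script in step 5 reads.

```shell
#!/bin/bash
# make_pair_list (hypothetical helper): write one "R1 R2" pair per line.
# Assumes Illumina-style names ending in _R1_001.fastq.gz / _R2_001.fastq.gz.
make_pair_list() {
    local out=${1:-fq.lst} r1 r2 missing=0
    : > "$out"                                   # start with an empty list
    for r1 in *_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue                 # glob matched nothing
        r2=${r1%_R1_001.fastq.gz}_R2_001.fastq.gz
        if [ -e "$r2" ]; then
            printf '%s %s\n' "$r1" "$r2" >> "$out"
        else
            echo "warning: no R2 file found for $r1" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]                         # non-zero status if any R1 lacked a mate
}
```

Run `make_pair_list fq.lst` inside the run directory and check the result before submitting; files that do not follow this naming scheme still have to be added by hand.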
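Because the script in step 5 reads the line of the list given by `$SGE_TASK_ID`, it has to be submitted as an SGE array job with one task per line (`qsub -t 1-N`). A sketch, assuming the script was saved as `fastqc_pairs.sh` (a hypothetical name) and the list as `fq.lst`; `submit_fastqc_array` is also hypothetical, and setting `QSUB=echo` prints the arguments instead of submitting.

```shell
#!/bin/bash
# submit_fastqc_array (hypothetical helper): submit the step-5 script as an
# SGE array job with one task per line of the pair list.
# Set QSUB=echo to print the qsub arguments instead of submitting.
submit_fastqc_array() {
    local script=$1 list=$2 n
    if [ ! -s "$list" ]; then
        echo "error: pair list '$list' is missing or empty" >&2
        return 1
    fi
    n=$(grep -c '' "$list")        # number of lines, counting a final line without a newline
    "${QSUB:-qsub}" -t "1-$n" "$script" "$list"
}
```

For example, `submit_fastqc_array fastqc_pairs.sh fq.lst` on a list of 12 pairs runs `qsub -t 1-12 fastqc_pairs.sh fq.lst`, giving 12 tasks that SGE can schedule in parallel.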