Miseq Prokaryote FASTQ analysis
Latest revision as of 17:01, 18 July 2016
Introduction
We have our own MiSeq machine, and each week a run is carried out on various samples (each run costs about £1000 in consumables).
The resulting fastq files are uploaded onto HDRIVE.
Procedure
- Go into the marvin scratch area ($SCRATCH) and create a new directory named after the date of the run.
- Make sure you have mounted the hdrive onto marvin.
- Create symlinks in the new directory you've just created in $SCRATCH, pointing to the fastq files in your mounted directory. An example, run from inside the new directory, is:
<pre>
for i in /storage/home/users/ramon/mnt/miseqda/2016-07-15_160715_M01714_0021_000000000-ANWN5/*.fastq.gz; do ln -s "$i" .; done
</pre>
- We need to generate a listing of these files for later processing. The listing must reflect the paired nature of the fastq files. If the files are named consistently this should not be a problem, and a first draft can be created with
<pre>
ls *.fastq.gz > fq.lst
</pre>
followed by editing so that each line contains the two fastq files of one pair (R1 first, then R2), separated by a space.
- Then we can run a quality-control program, the most widely used one being the Babraham Institute's FastQC. The following script runs FastQC on one pair per task. It picks its line of the file list through SGE_TASK_ID, so the pairs are processed in parallel when it is submitted as a Grid Engine array job.
<pre>
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#$ -q marvin.q
# some quick "argument accounting"
EXPECTED_ARGS=1
if [ $# -ne $EXPECTED_ARGS ]; then
    echo "error, this script should be fed with one argument: a filelist of fastq(.gz) files" >&2
    exit 1
fi
module load FASTQC
# read the line of the file list that corresponds to this array task
N=( $(sed -n "${SGE_TASK_ID}p" "$1") )
R1=${N[0]}
R2=${N[1]}
# echo "fastqc $R1 $R2"
fastqc "$R1" "$R2"
</pre>
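The hand-editing of fq.lst can probably be avoided when the files follow the usual Illumina naming, where the two reads of a pair differ only in an `_R1_` / `_R2_` tag (for example `sample_S1_L001_R1_001.fastq.gz`). That naming is an assumption here, and `make_pair_list` is a hypothetical helper name, not part of the original procedure. A minimal sketch:

```shell
#!/bin/bash
# Sketch: print one "R1 R2" line per read pair found in a directory.
# Assumes Illumina-style names that contain _R1_ and _R2_ (an assumption,
# check your own file names first).
make_pair_list() {
    local dir="${1:-.}" r1 base mate
    for r1 in "$dir"/*_R1_*.fastq.gz; do
        [ -e "$r1" ] || continue                 # the glob matched nothing
        base="${r1##*/}"                         # file name without the directory
        mate="${base%_R1_*}_R2_${base##*_R1_}"   # swap the last _R1_ for _R2_
        if [ -e "$dir/$mate" ]; then
            echo "$base $mate"
        else
            echo "warning: no R2 mate found for $base" >&2
        fi
    done
}
# Usage, from inside the run directory:
#   make_pair_list . > fq.lst
```

Files without a mate are reported on standard error and left out of the list, so it is still worth checking the result by eye before submitting any jobs.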

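Because the FastQC script reads SGE_TASK_ID, which Grid Engine sets for array jobs, it presumably has to be submitted with `qsub -t`, using one task per line of fq.lst. The sketch below only prints the command it would run; `fastqc_pairs.sh` is a hypothetical file name for the script, and `print_array_submit` is an illustrative helper, neither taken from the original page.

```shell
#!/bin/bash
# Sketch: print (not run) the qsub command for an array job with one task
# per line of the pair list. "fastqc_pairs.sh" is a hypothetical name for
# the FastQC script; replace it with whatever you saved it as.
print_array_submit() {
    local list="$1" script="${2:-fastqc_pairs.sh}" n
    n=$(grep -c '' "$list")      # number of lines, same numbering sed uses
    if [ "$n" -lt 1 ]; then
        echo "error: $list is empty" >&2
        return 1
    fi
    echo "qsub -t 1-$n $script $list"
}
# Usage:
#   print_array_submit fq.lst
```

Printing the command first makes it easy to check the task range against the number of pairs before anything is sent to the queue.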