Difference between revisions of "Mapping.py"

From wiki
Jump to: navigation, search
(Created page with "= Introduction = Another part of Miguel Pnheiros script_tools suite. = Outline = Each pair of FASTQ files are mapped to the reference. Bowtie, bwa or smalt are available....")
 
 
Line 1: Line 1:
 
= Introduction =
 
= Introduction =
  
Another part of Miguel Pnheiros script_tools suite.
+
Another part of Miguel Pinheiro's script_tools suite. This takes paired-end FASTQ reads and generates a cleaned-up bam file from them.
  
 
= Outline =
 
= Outline =

Latest revision as of 11:48, 23 February 2017

Introduction

Another part of Miguel Pinheiro's script_tools suite. This takes paired-end FASTQ reads and generates a cleaned-up bam file from them.

Outline

Each pair of FASTQ files are mapped to the reference. Bowtie, bwa or smalt are available.

run_soft.run_bwa_map(self.sz_sample_name, sz_out_directory, self.sz_index_file, self.sz_file_1, self.sz_file_2, self.sz_single_file, self.n_read_length)

The output sam is converted to bam

sz_cmd = "%s view -bS %s%s.sam > %s%s_.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)

This bam is then sorted

sz_cmd = "%s sort %s%s_.bam %s%s__" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)

The MarkDuplicates tool from picard-tools is used to clean out redundant alignments

sz_cmd = "java -jar %s/MarkDuplicates.jar INPUT=%s%s__.bam OUTPUT=%s%s.bam M=%s%s_%s.txt" % (Constants.SOFTWARE_PICARD_TOOLS_DIRECTORY, sz_working_directory, sz_sample_name,

This duplicate-marked bam file is then indexed with normal (i.e. not tabix, tabular index):

sz_cmd = "%s index %s%s.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name)

We delete the unecessary bam and sams:

sz_cmd = "rm %s%s_.bam %s%s__.bam %s%s.sam" % (sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)

The end bam files are in fact the duplicate-marked files, so we are ready, but we also generate a coverage report from them:

sz_cmd = "%s/genomeCoverageBed -ibam %s%s.bam -bga > %s%s.cov" % (Constants.SOFTWARE_BED_TOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)