Mapping.py

Introduction

Mapping.py is another part of Miguel Pinheiro's script_tools suite. It takes paired-end FASTQ reads, maps them to a reference, and produces a sorted, duplicate-marked, indexed BAM file, together with a coverage report.

Outline

Each pair of FASTQ files is mapped to the reference. Bowtie, bwa or smalt are available as mappers; the call for bwa looks like this:

run_soft.run_bwa_map(self.sz_sample_name, sz_out_directory, self.sz_index_file, self.sz_file_1, self.sz_file_2, self.sz_single_file, self.n_read_length)

The output SAM file is converted to BAM:

sz_cmd = "%s view -bS %s%s.sam > %s%s_.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)

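This BAM file is then sorted. The last argument is an output prefix, so the sorted file is written with a double underscore before the .bam extension: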

sz_cmd = "%s sort %s%s_.bam %s%s__" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
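The file naming in the two commands above is easy to misread, so here is a minimal sketch that expands the same format strings with made-up values. The values "samtools", "/tmp/work/" and "sampleA" are assumptions chosen for illustration, and build_convert_and_sort_commands is a hypothetical helper, not a function from script_tools.

```python
# Hypothetical stand-ins for the script's own variables.
SOFTWARE_SAMTOOLS = "samtools"       # assumed value of Constants.SOFTWARE_SAMTOOLS
sz_working_directory = "/tmp/work/"  # assumed; the trailing slash matters
sz_sample_name = "sampleA"           # assumed sample name


def build_convert_and_sort_commands(samtools, working_directory, sample_name):
    """Return the SAM-to-BAM and sort command strings (hypothetical helper)."""
    # Directory and sample name are simply concatenated, as in the original.
    prefix = "%s%s" % (working_directory, sample_name)
    # <sample>.sam -> <sample>_.bam (unsorted BAM, one underscore)
    convert_cmd = "%s view -bS %s.sam > %s_.bam" % (samtools, prefix, prefix)
    # <sample>_.bam -> output prefix <sample>__ (two underscores)
    sort_cmd = "%s sort %s_.bam %s__" % (samtools, prefix, prefix)
    return convert_cmd, sort_cmd


convert_cmd, sort_cmd = build_convert_and_sort_commands(
    SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name
)
print(convert_cmd)
# samtools view -bS /tmp/work/sampleA.sam > /tmp/work/sampleA_.bam
print(sort_cmd)
# samtools sort /tmp/work/sampleA_.bam /tmp/work/sampleA__
```

Because the directory and sample name are concatenated without a separator, the working directory has to end in a path separator. Note also that the prefix form of samtools sort is the legacy syntax; newer samtools releases expect the output file to be given with -o.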

The MarkDuplicates tool from picard-tools is then used to mark redundant (duplicate) alignments:

sz_cmd = "java -jar %s/MarkDuplicates.jar INPUT=%s%s__.bam OUTPUT=%s%s.bam M=%s%s_%s.txt" % (Constants.SOFTWARE_PICARD_TOOLS_DIRECTORY, sz_working_directory, sz_sample_name,

This duplicate-marked BAM file is then indexed with a standard BAM index (i.e. not a tabix index):

sz_cmd = "%s index %s%s.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name)

We delete the unnecessary intermediate BAM and SAM files:

sz_cmd = "rm %s%s_.bam %s%s__.bam %s%s.sam" % (sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
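As an illustration of which files the cleanup step touches, the sketch below does the same job with os.remove instead of a shell rm. This is a swapped-in technique and not what Mapping.py does; intermediate_files and remove_intermediates are hypothetical names introduced here.

```python
import os


def intermediate_files(working_directory, sample_name):
    """List the three intermediate files named in the rm command above."""
    prefix = "%s%s" % (working_directory, sample_name)
    return [
        prefix + "_.bam",   # unsorted BAM from samtools view
        prefix + "__.bam",  # sorted BAM, input to MarkDuplicates
        prefix + ".sam",    # raw mapper output
    ]


def remove_intermediates(working_directory, sample_name):
    """Delete the intermediates that exist and return the paths removed."""
    removed = []
    for path in intermediate_files(working_directory, sample_name):
        # Skip missing files so a repeated run does not raise an error.
        if os.path.exists(path):
            os.remove(path)
            removed.append(path)
    return removed
```

The final duplicate-marked file, named after the sample with a plain .bam extension, is not in the list and is therefore left in place.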

The final BAM files are the duplicate-marked ones, so the mapping itself is complete at this point. We also generate a coverage report from them:

sz_cmd = "%s/genomeCoverageBed -ibam %s%s.bam -bga > %s%s.cov" % (Constants.SOFTWARE_BED_TOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
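With -bga, genomeCoverageBed writes BedGraph-style lines (chromosome, start, end, depth, separated by tabs) and also reports intervals with zero depth. The following is a minimal sketch of how such a .cov file could be summarised. The function summarise_coverage and the example lines are hypothetical, and the script itself may not post-process the report at all.

```python
def summarise_coverage(lines):
    """Return (mean depth, fraction of bases with depth > 0).

    Each line is expected to hold: chrom, start, end, depth (tab-separated).
    """
    total_bases = 0
    covered_bases = 0
    depth_sum = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        chrom, start, end, depth = line.split("\t")[:4]
        length = int(end) - int(start)  # intervals are half-open
        depth = float(depth)
        total_bases += length
        depth_sum += depth * length     # weight depth by interval length
        if depth > 0:
            covered_bases += length
    if total_bases == 0:
        return 0.0, 0.0
    return depth_sum / total_bases, covered_bases / float(total_bases)


# Made-up example: 10 uncovered bases followed by 20 bases at depth 5.
example = ["chr1\t0\t10\t0", "chr1\t10\t30\t5"]
mean_depth, breadth = summarise_coverage(example)
```

For a real report, the same function can be given an open file handle, for example `summarise_coverage(open("sampleA.cov"))`, where the file name is again only an assumed example.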