Mapping.py
Introduction
Another part of Miguel Pinheiro's script_tools suite. This takes paired-end FASTQ reads and generates a cleaned-up bam file from them.
Outline
Each pair of FASTQ files are mapped to the reference. Bowtie, bwa or smalt are available.
run_soft.run_bwa_map(self.sz_sample_name, sz_out_directory, self.sz_index_file, self.sz_file_1, self.sz_file_2, self.sz_single_file, self.n_read_length)
The output sam is converted to bam
sz_cmd = "%s view -bS %s%s.sam > %s%s_.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
This bam is then sorted
sz_cmd = "%s sort %s%s_.bam %s%s__" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
The MarkDuplicates tool from picard-tools is used to clean out redundant alignments
sz_cmd = "java -jar %s/MarkDuplicates.jar INPUT=%s%s__.bam OUTPUT=%s%s.bam M=%s%s_%s.txt" % (Constants.SOFTWARE_PICARD_TOOLS_DIRECTORY, sz_working_directory, sz_sample_name,
This duplicate-marked bam file is then indexed with normal (i.e. not tabix, tabular index):
sz_cmd = "%s index %s%s.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name)
We delete the unecessary bam and sams:
sz_cmd = "rm %s%s_.bam %s%s__.bam %s%s.sam" % (sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
The end bam files are in fact the duplicate-marked files, so we are ready, but we also generate a coverage report from them:
sz_cmd = "%s/genomeCoverageBed -ibam %s%s.bam -bga > %s%s.cov" % (Constants.SOFTWARE_BED_TOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)