Difference between revisions of "Mapping.py"
(Created page with "= Introduction = Another part of Miguel Pnheiros script_tools suite. = Outline = Each pair of FASTQ files are mapped to the reference. Bowtie, bwa or smalt are available....") |
|||
Line 1: | Line 1: | ||
= Introduction = | = Introduction = | ||
− | Another part of Miguel | + | Another part of Miguel Pinheiro's script_tools suite. This takes paired-end FASTQ reads and generates a cleaned-up bam file from them. |
= Outline = | = Outline = |
Latest revision as of 11:48, 23 February 2017
Introduction
Another part of Miguel Pinheiro's script_tools suite. This takes paired-end FASTQ reads and generates a cleaned-up bam file from them.
Outline
Each pair of FASTQ files are mapped to the reference. Bowtie, bwa or smalt are available.
run_soft.run_bwa_map(self.sz_sample_name, sz_out_directory, self.sz_index_file, self.sz_file_1, self.sz_file_2, self.sz_single_file, self.n_read_length)
The output sam is converted to bam
sz_cmd = "%s view -bS %s%s.sam > %s%s_.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
This bam is then sorted
sz_cmd = "%s sort %s%s_.bam %s%s__" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
The MarkDuplicates tool from picard-tools is used to clean out redundant alignments
sz_cmd = "java -jar %s/MarkDuplicates.jar INPUT=%s%s__.bam OUTPUT=%s%s.bam M=%s%s_%s.txt" % (Constants.SOFTWARE_PICARD_TOOLS_DIRECTORY, sz_working_directory, sz_sample_name,
This duplicate-marked bam file is then indexed with normal (i.e. not tabix, tabular index):
sz_cmd = "%s index %s%s.bam" % (Constants.SOFTWARE_SAMTOOLS, sz_working_directory, sz_sample_name)
We delete the unecessary bam and sams:
sz_cmd = "rm %s%s_.bam %s%s__.bam %s%s.sam" % (sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)
The end bam files are in fact the duplicate-marked files, so we are ready, but we also generate a coverage report from them:
sz_cmd = "%s/genomeCoverageBed -ibam %s%s.bam -bga > %s%s.cov" % (Constants.SOFTWARE_BED_TOOLS, sz_working_directory, sz_sample_name, sz_working_directory, sz_sample_name)