Difference between revisions of "Calculating coverage"

From wiki
Jump to: navigation, search
Line 8: Line 8:
 
  bedtools genomecov -bga -ibam <bamfile>
 
  bedtools genomecov -bga -ibam <bamfile>
  
Explanation
+
<u>Explanation</u>
 +
 
 +
== in a script with AWK ==
 +
 
 +
Here is an example script for using '''genomecov''' with AWK to calculate coverage
 +
 
 +
#!/bin/bash
 +
EXPECTED_ARGS=2
 +
if [ $# -ne $EXPECTED_ARGS ]; then
 +
        echo "This script uses awk to calculate average coverage in a bamfile. It needs $EXPECTED_ARGS arguments"
 +
        echo "Correct usage: $0 <subfolder_with_all_the_bamfiles>> <minimum_coverage_value>"
 +
        exit
 +
fi
 +
module load bedtools
 +
O=$(find $1 -iname "*.bam")
 +
ARRAY=( $O )
 +
for BF in ${ARRAY[@]}; do
 +
        echo -n "$BF: "
 +
        bedtools genomecov -bga -ibam $BF |awk -v t=$2 '{if($4>=t) {j=$3-$2+1;i=i+j;SUM=SUM+$4*j;}} END {print SUM/(i+0.0);}'
 +
done

Revision as of 13:03, 30 June 2016

Introduction

Coverage can be calculated in several different ways and this pages aims to describe a few of them

Bedtools genomecov

This is one of the most commonly used methods and requires loading of the bedtools module, also used in the callSNPs.py tool. The base of the command is

bedtools genomecov -bga -ibam <bamfile>

Explanation

in a script with AWK

Here is an example script for using genomecov with AWK to calculate coverage

#!/bin/bash
EXPECTED_ARGS=2
if [ $# -ne $EXPECTED_ARGS ]; then
        echo "This script uses awk to calculate average coverage in a bamfile. It needs $EXPECTED_ARGS arguments"
        echo "Correct usage: $0 <subfolder_with_all_the_bamfiles>> <minimum_coverage_value>"
        exit
fi
module load bedtools
O=$(find $1 -iname "*.bam")
ARRAY=( $O )
for BF in ${ARRAY[@]}; do
        echo -n "$BF: "
        bedtools genomecov -bga -ibam $BF |awk -v t=$2 '{if($4>=t) {j=$3-$2+1;i=i+j;SUM=SUM+$4*j;}} END {print SUM/(i+0.0);}'
done