VCF

Revision as of 11:32, 10 August 2016 by Rf

Introduction

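VCF (Variant Call Format) is a text file format that records sequence variants, typically called from reads aligned to a reference genome, together with their positions on that reference. A file most often covers multiple samples, and each sample's calls are described in a column dedicated to that sample. Genotypes are coded as integer allele indices: 0 denotes the reference allele (REF), and 1, 2, ... denote the alternate alleles listed in the ALT column. For example, 0/1 is a heterozygous call and 1/1 is a homozygous alternate call.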
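To make the genotype encoding concrete, here is a minimal Python sketch that decodes the GT values in one tab-separated VCF data line. It assumes the standard fixed columns CHROM through FORMAT, followed by one column per sample, and that GT is present in FORMAT. Each GT integer is treated as an index into the list formed by REF followed by the comma-separated ALT alleles. The function name `decode_genotypes` and the sample line are hypothetical and do not come from any library.

```python
from typing import Dict, List

# Fixed VCF columns that precede the per-sample columns.
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def decode_genotypes(line: str, sample_names: List[str]) -> Dict[str, List[str]]:
    """Map each sample name to the alleles named by its GT allele indices.

    Hypothetical helper for illustration; assumes GT appears in FORMAT.
    """
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED, fields[:9]))
    # Index 0 is the REF allele; 1, 2, ... are the ALT alleles in order.
    alleles = [record["REF"]] + record["ALT"].split(",")
    format_keys = record["FORMAT"].split(":")
    gt_pos = format_keys.index("GT")
    result: Dict[str, List[str]] = {}
    for name, sample in zip(sample_names, fields[9:]):
        gt = sample.split(":")[gt_pos]
        # Indices are separated by '/' (unphased) or '|' (phased);
        # '.' marks a missing allele call.
        indices = gt.replace("|", "/").split("/")
        result[name] = [alleles[int(i)] if i != "." else "." for i in indices]
    return result
```

For the hypothetical line `chr1 100 . A G,T 50 PASS DP=30 GT:DP 0/1:12 1|2:18` (tab-separated), the first sample decodes to A and G and the second to G and T.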

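BCF is the binary version of the format. Being binary does not by itself make a file compressed, but BCF files are normally also compressed with BGZF, so they are smaller than plain-text VCF and faster to process.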
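As a rough illustration of the binary layout, the sketch below checks whether a file looks like BCF. It relies on two points: BGZF is valid gzip, so Python's standard `gzip` module can decompress it, and a BCF2 stream begins with the bytes `B`, `C`, `F` followed by the major and minor version numbers. The helper name `bcf_version` is hypothetical. Uncompressed BCF is also handled, because the gzip magic bytes are checked first.

```python
import gzip
from typing import Optional, Tuple

GZIP_MAGIC = b"\x1f\x8b"


def bcf_version(path: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if the file looks like BCF, else None.

    Hypothetical helper; only inspects the leading magic bytes.
    """
    with open(path, "rb") as fh:
        head = fh.read(2)
    # BGZF-compressed data is gzip-compatible, so gzip.open can read it;
    # otherwise assume an uncompressed stream.
    opener = gzip.open if head == GZIP_MAGIC else open
    with opener(path, "rb") as fh:
        magic = fh.read(5)
    # A BCF2 stream starts with b"BCF" plus two single-byte version numbers.
    if len(magic) == 5 and magic[:3] == b"BCF":
        return magic[3], magic[4]
    return None
```

A bgzipped plain-text VCF also starts with the gzip magic, but it decompresses to `##fileformat=...` rather than `BCF`, so the function returns `None` for it.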

The format was originally maintained by the 1000 Genomes Project. The latest version (4.2 at the time of writing) and its specification are hosted on the samtools website.

Details

  • DP refers to the overall read depth at the locus/position, without taking base quality into account
  • DP4 (emitted by samtools/bcftools) holds four depth counts separated by commas. In contrast to DP, these are filtered for base quality. The first pair gives the depth of reads supporting the reference allele, first on the forward strand and then on the reverse strand. The second pair gives the depth of reads supporting the alternate allele, again with the forward strand first and the reverse strand second. A simple example is shown here
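The DP and DP4 fields can be read with a short script. This Python sketch assumes a samtools/bcftools-style INFO column. Within the INFO column, entries are separated by semicolons, and the four DP4 values are separated by commas in the order ref-forward, ref-reverse, alt-forward, alt-reverse. The helper names `parse_info` and `summarize_dp4` and the example INFO string are hypothetical.

```python
from typing import Dict, Optional


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO column into key -> value (None for flag entries)."""
    out: Dict[str, Optional[str]] = {}
    if info == ".":
        return out
    for entry in info.split(";"):  # INFO entries are ';'-separated
        key, sep, value = entry.partition("=")
        out[key] = value if sep else None
    return out


def summarize_dp4(info: str) -> dict:
    """Interpret a samtools/bcftools-style DP4 entry (hypothetical helper)."""
    fields = parse_info(info)
    # DP4 holds four comma-separated, base-quality-filtered counts:
    # ref-forward, ref-reverse, alt-forward, alt-reverse.
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    ref, alt = ref_fwd + ref_rev, alt_fwd + alt_rev
    total = ref + alt
    return {
        # DP is the raw depth, not filtered for base quality.
        "raw_depth": int(fields["DP"]) if fields.get("DP") else None,
        "ref_depth": ref,
        "alt_depth": alt,
        "filtered_depth": total,
        "alt_fraction": alt / total if total else 0.0,
    }


print(summarize_dp4("DP=27;DP4=8,12,3,2;MQ=60"))
```

For the hypothetical entry `DP=27;DP4=8,12,3,2`, the reference depth is 20 and the alternate depth is 5, which gives an alternate fraction of 0.2. DP counts reads regardless of base quality, so `raw_depth` is usually at least `filtered_depth`, as in this example (27 versus 25).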