Difference between revisions of "VCF"
Revision as of 11:29, 10 August 2016
Introduction
VCF (Variant Call Format) is a file format that records the variants that reads show against the reference they are aligned to. Most often a file covers multiple samples, and the nature of each variant is described in columns dedicated to each sample. Alleles are coded as single integers, with 0 representing the reference allele and 1 upward representing the alternate alleles in the order they are listed.
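The integer coding can be illustrated with a minimal sketch. It assumes the per-sample genotype (GT) field of the VCF specification, where allele codes are joined by `/` (unphased) or `|` (phased) and `.` marks a missing call. The function name `decode_genotype` and the example alleles are made up for illustration and do not come from any library.

```python
import re

def decode_genotype(gt, ref, alts):
    """Translate a GT string such as '0/1' into the allele sequences it codes for.

    Hypothetical helper: code 0 is the reference allele, codes 1, 2, ...
    index the alternate alleles in the order they are listed.
    """
    alleles = [ref] + list(alts)
    decoded = []
    for code in re.split(r"[/|]", gt):
        if code == ".":
            decoded.append(None)  # missing call
        else:
            decoded.append(alleles[int(code)])
    return decoded

# A heterozygous call at a site with reference A and one alternate allele G
print(decode_genotype("0/1", "A", ["G"]))
```

A `None` entry in the result stands for a missing call, so downstream code can tell "no data" apart from "matches the reference".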
BCF is the binary counterpart of the file format; it is normally stored compressed.
It used to be maintained by the 1000 Genomes Project. The latest version at the time of writing is 4.2, and its specification is hosted at the samtools website (http://samtools.github.io/hts-specs/VCFv4.2.pdf).
Details
- DP refers to the overall read depth at the locus/position, without taking base quality into account.
- DP4 refers to 4 depth readings separated by commas. In contrast to DP, these are filtered for base quality. The first pair gives the depth of reads conforming to the reference allele, first on the forward strand, then on the reverse strand. The second pair gives the alternate allele depth, again with forward first and reverse second.
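As a rough sketch of how these two tags might be read, the snippet below parses an INFO column by hand. It assumes the usual layout in which INFO entries are delimited by semicolons and DP4 carries four comma-separated integers (`DP4=ref_fwd,ref_rev,alt_fwd,alt_rev`), as written by samtools/bcftools. The helper names and the example record are invented for illustration.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict (hypothetical helper).

    Entries of the form KEY=VALUE keep their value as a string;
    bare flags (no '=') are stored as True.
    """
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def parse_dp4(info):
    """Return the four DP4 counts as a labelled dict."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (
        int(x) for x in parse_info(info)["DP4"].split(",")
    )
    return {
        "ref_fwd": ref_fwd,
        "ref_rev": ref_rev,
        "alt_fwd": alt_fwd,
        "alt_rev": alt_rev,
    }

# Invented example record: DP counts all reads, DP4 only the quality-filtered ones
info = "DP=35;DP4=10,12,6,5"
depth = int(parse_info(info)["DP"])
dp4 = parse_dp4(info)
filtered_depth = sum(dp4.values())
alt_fraction = (dp4["alt_fwd"] + dp4["alt_rev"]) / filtered_depth
print(depth, filtered_depth, round(alt_fraction, 3))
```

Because DP4 is filtered for base quality and DP is not, the sum of the four DP4 counts is expected to be no larger than DP, as in this example.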