Explanation of gavin_r0.5_calibvars.cadd_v1.4annot.tsv.gz.

#### Rows in the data (n = 399,518):
Each row represents one genetic variant (mutation) that has been classified as either PATHOGENIC or POPULATION. The PATHOGENIC variants are considered to be disease-causing by reputable sources. For POPULATION variants this has not been established, and they are thus presumed to be safe/benign. The POPULATION variants are selected based on gene-specific properties using the GAVIN method (see: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1141-7) and serve as a highly representative set of negatives.

#### Basic variant annotation.
This includes gene, group (PATHOGENIC/POPULATION), effect and impact from SnpEff, and the CADD scaled score. The position (chrom/pos/ref/alt) should be the same as in the CADD annotation (columns 10-13) unless something went wrong. Column 9 is equal to the CADD PHRED score in column 116.
1: gene
2: chr
3: pos
4: ref
5: alt
6: group
7: effect
8: impact
9: cadd

#### All features for the CADD 1.4 annotation.
For a full description, and to see which features are used to train the original CADD scores, see: https://cadd.gs.washington.edu/static/ReleaseNotes_CADD_v1.4.pdf
10: CaddChrom
11: CaddPos
12: CaddRef
13: CaddAlt
14: Type
15: Length
16: AnnoType
17: Consequence
18: ConsScore
19: ConsDetail
20: GC
21: CpG
22: motifECount
23: motifEName
24: motifEHIPos
25: motifEScoreChng
26: oAA
27: nAA
28: GeneID
29: FeatureID
30: GeneName
31: CCDS
32: Intron
33: Exon
34: cDNApos
35: relcDNApos
36: CDSpos
37: relCDSpos
38: protPos
39: relProtPos
40: Domain
41: Dst2Splice
42: Dst2SplType
43: minDistTSS
44: minDistTSE
45: SIFTcat
46: SIFTval
47: PolyPhenCat
48: PolyPhenVal
49: priPhCons
50: mamPhCons
51: verPhCons
52: priPhyloP
53: mamPhyloP
54: verPhyloP
55: bStatistic
56: targetScan
57: mirSVR-Score
58: mirSVR-E
59: mirSVR-Aln
60: cHmmTssA
61: cHmmTssAFlnk
62: cHmmTxFlnk
63: cHmmTx
64: cHmmTxWk
65: cHmmEnhG
66: cHmmEnh
67: cHmmZnfRpts
68: cHmmHet
69: cHmmTssBiv
70: cHmmBivFlnk
71: cHmmEnhBiv
72: cHmmReprPC
73: cHmmReprPCWk
74: cHmmQuies
75: GerpRS
76: GerpRSpval
77: GerpN
78: GerpS
79: TFBS
80: TFBSPeaks
81: TFBSPeaksMax
82: tOverlapMotifs
83: motifDist
84: Segway
85: EncH3K27Ac
86: EncH3K4Me1
87: EncH3K4Me3
88: EncExp
89: EncNucleo
90: EncOCC
91: EncOCCombPVal
92: EncOCDNasePVal
93: EncOCFairePVal
94: EncOCpolIIPVal
95: EncOCctcfPVal
96: EncOCmycPVal
97: EncOCDNaseSig
98: EncOCFaireSig
99: EncOCpolIISig
100: EncOCctcfSig
101: EncOCmycSig
102: Grantham
103: Dist2Mutation
104: Freq100bp
105: Rare100bp
106: Sngl100bp
107: Freq1000bp
108: Rare1000bp
109: Sngl1000bp
110: Freq10000bp
111: Rare10000bp
112: Sngl10000bp
113: dbscSNV-ada_score
114: dbscSNV-rf_score

#### Output of the CADD model as a raw score and a logarithmically scaled PHRED score.
115: RawScore
116: PHRED
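
The consistency rules stated in this description (chr/pos/ref/alt in columns 2-5 matching the CADD annotation in columns 10-13, and column 9 matching PHRED in column 116) can be checked with a short script. The following is a minimal sketch, not part of the original pipeline. It assumes the file is tab-separated, that it starts with one header line (pass has_header=False if it does not), and that both sides use the same chromosome naming. The helper names (field, row_is_consistent, summarize, read_rows) are hypothetical.

```python
import csv
import gzip
from collections import Counter

# 1-based column numbers, taken from the column list in this description.
COL = {"chr": 2, "pos": 3, "ref": 4, "alt": 5, "group": 6, "cadd": 9,
       "CaddChrom": 10, "CaddPos": 11, "CaddRef": 12, "CaddAlt": 13,
       "PHRED": 116}

def field(row, name):
    """Return the value of a named column from one row (a list of strings)."""
    return row[COL[name] - 1]

def same_number(a, b):
    """Compare two fields numerically, falling back to text if not numeric."""
    try:
        return float(a) == float(b)
    except ValueError:
        return a == b

def row_is_consistent(row):
    """True if columns 2-5 match columns 10-13 and column 9 matches 116."""
    pairs = [("chr", "CaddChrom"), ("pos", "CaddPos"),
             ("ref", "CaddRef"), ("alt", "CaddAlt")]
    same_position = all(field(row, a) == field(row, b) for a, b in pairs)
    return same_position and same_number(field(row, "cadd"), field(row, "PHRED"))

def summarize(rows):
    """Count rows per group and the number of inconsistent rows."""
    groups = Counter()
    inconsistent = 0
    for row in rows:
        groups[field(row, "group")] += 1
        if not row_is_consistent(row):
            inconsistent += 1
    return groups, inconsistent

def read_rows(path, has_header=True):
    """Yield rows from the gzipped TSV. The header line is an assumption."""
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)
        yield from reader
```

Calling summarize(read_rows("gavin_r0.5_calibvars.cadd_v1.4annot.tsv.gz")) returns the per-group counts and the number of inconsistent rows. If the assumptions hold, the group counts should add up to the 399,518 rows stated above.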