####################################################################
# Running the GAVIN-Plus variant interpretation tool for diagnostics
####################################################################

## Download GAVIN-Plus and the data bundle
mkdir gavin-tools
cd gavin-tools
wget http://molgenis.org/downloads/gavin/GAVIN-Plus-1.0.jar
wget http://molgenis.org/downloads/gavin/data_bundle_r1.0/CGD_11oct2016.txt.gz
wget http://molgenis.org/downloads/gavin/data_bundle_r1.0/FDR_allGenes_r1.0.tsv
wget http://molgenis.org/downloads/gavin/data_bundle_r1.0/GAVIN_calibrations_r0.3.tsv
wget http://molgenis.org/downloads/gavin/data_bundle_r1.0/clinvar.patho.fix.11oct2016.vcf.gz
cd ..

## Download the demo files (note that the VCF has already been
## annotated with SnpEff, ExAC, GoNL and CADD-SNVs)
wget http://molgenis.org/downloads/gavin/demo/GAVIN-Plus_Demo_1000G_Spiked.vcf
wget http://molgenis.org/downloads/gavin/demo/GAVIN-Plus_Demo_1000G_Spiked.fromCadd.tsv

## First pass: run the analysis on GAVIN-Plus_Demo_1000G_Spiked.vcf.
## In CREATEFILEFORCADD mode, variants that lack a CADD score are
## written to the file given with -a.
java -Xmx4g -jar gavin-tools/GAVIN-Plus-1.0.jar \
  -i GAVIN-Plus_Demo_1000G_Spiked.vcf \
  -o GAVIN-Plus_Demo_1000G_Spiked.RVCF.firstpass.vcf \
  -m CREATEFILEFORCADD \
  -a GAVIN-Plus_Demo_1000G_Spiked.toCadd.tsv \
  -c gavin-tools/clinvar.patho.fix.11oct2016.vcf.gz \
  -d gavin-tools/CGD_11oct2016.txt.gz \
  -f gavin-tools/FDR_allGenes_r1.0.tsv \
  -g gavin-tools/GAVIN_calibrations_r0.3.tsv

## Inspect the results: how many interesting variants did we find?
## (count the data lines, skipping the header lines that start with '#')
grep -v "^#" GAVIN-Plus_Demo_1000G_Spiked.RVCF.firstpass.vcf | wc -l

## Note that not all variants could be assessed because of missing CADD scores.
## These variants are written to the file below, which can be uploaded and scored
## using the CADD web service (http://cadd.gs.washington.edu/score).
more GAVIN-Plus_Demo_1000G_Spiked.toCadd.tsv

## However, the CADD output is already included in the demo files for your convenience.
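
## Optional: count how many variants were written out for CADD scoring,
## i.e. the variants that the first pass could not assess. This is a minimal
## sketch: count_records is a hypothetical helper (not part of GAVIN-Plus),
## and it assumes that any header lines in the file start with '#'.
count_records() {
  # Print the number of lines that do not start with '#';
  # '|| true' keeps the exit status at 0 when the count is 0.
  grep -vc '^#' "$1" || true
}
if [ -f GAVIN-Plus_Demo_1000G_Spiked.toCadd.tsv ]; then
  count_records GAVIN-Plus_Demo_1000G_Spiked.toCadd.tsv
fi
## The same helper can be pointed at the RVCF output files to compare
## the number of variants reported by each run.
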
## Second pass: use GAVIN-Plus_Demo_1000G_Spiked.fromCadd.tsv to run the complete analysis
java -Xmx4g -jar gavin-tools/GAVIN-Plus-1.0.jar \
  -i GAVIN-Plus_Demo_1000G_Spiked.vcf \
  -o GAVIN-Plus_Demo_1000G_Spiked.RVCF.vcf \
  -m ANALYSIS \
  -a GAVIN-Plus_Demo_1000G_Spiked.fromCadd.tsv \
  -c gavin-tools/clinvar.patho.fix.11oct2016.vcf.gz \
  -d gavin-tools/CGD_11oct2016.txt.gz \
  -f gavin-tools/FDR_allGenes_r1.0.tsv \
  -g gavin-tools/GAVIN_calibrations_r0.3.tsv

## Inspect the results: how many variants are there now?
grep -v "^#" GAVIN-Plus_Demo_1000G_Spiked.RVCF.vcf | wc -l
## Compare the first-pass output with the complete analysis
diff GAVIN-Plus_Demo_1000G_Spiked.RVCF.firstpass.vcf GAVIN-Plus_Demo_1000G_Spiked.RVCF.vcf

## ADDENDUM ##
##
## Some helper tools are available to post-process your output.
## This one merges the RVCF output back into your original VCF:
wget https://molgenis26.gcc.rug.nl/downloads/gavin/MergeBackTool-0.2.jar
## And this one splits the RLV field into multiple separate INFO fields such as "RLV_*":
wget https://molgenis26.gcc.rug.nl/downloads/gavin/SplitRlvTool-0.2.jar

## Try them yourself on the demo data.
## Merge the output back:
java -jar MergeBackTool-0.2.jar -i GAVIN-Plus_Demo_1000G_Spiked.vcf -v GAVIN-Plus_Demo_1000G_Spiked.RVCF.vcf -o mergeBack.vcf

## Split the RLV fields in either the RVCF or the MergeBack file:
java -jar SplitRlvTool-0.2.jar -i GAVIN-Plus_Demo_1000G_Spiked.RVCF.vcf -o splitRLV.vcf
java -jar SplitRlvTool-0.2.jar -i mergeBack.vcf -o mergeBackSplitRLV.vcf
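
## Optional: list the RLV_* INFO fields that the split step produced.
## This is a minimal sketch: list_rlv_fields is a hypothetical helper, and it
## assumes the split fields are declared as standard '##INFO=<ID=RLV_...'
## lines in the VCF header, as the VCF format prescribes for INFO fields.
list_rlv_fields() {
  # Keep only ##INFO header lines whose ID starts with RLV_, then print that ID.
  grep '^##INFO=<ID=RLV_' "$1" | sed 's/^##INFO=<ID=\([^,>]*\).*/\1/'
}
if [ -f splitRLV.vcf ]; then
  list_rlv_fields splitRLV.vcf
fi
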