Implementation and clinical utility of the GATK gCNV pipeline for rare disease diagnostics using whole exome sequencing

Djordje Pavlovic1*, Anita Skakic1, Kristel Klaassen1, Irena Marjanovic1, Marina Parezanovic1, Nina Stevanovic1, Goran Cuturilo2, Marija Brankovic2, Brankica Bosankic2, Jelena Ruml Stojanovic2, Bojan Ristivojevic1, Branka Zukic1, Maja Stojiljkovic1 and Marina Andjelkovic1

1Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia

2University Children's Hospital, Belgrade, Serbia

djordje5996 [at] gmail.com

Abstract

Copy number variants (CNVs) are critical contributors to rare disease etiology. While standardized workflows exist for detecting short variants from exome sequencing, CNV detection has historically required separate sequencing approaches or clinical assays. Here, we present the implementation of the GATK gCNV pipeline to detect CNVs from routine whole-exome sequencing (WES) data in rare disease diagnostics.

We performed Illumina short-read WES on 646 rare disease patients from the University Children’s Hospital in Belgrade over a one-year period (April 2025 to April 2026). Data were processed using standard GATK Best Practices for single-nucleotide variants (SNVs), alongside the GATK gCNV pipeline for CNV detection. The gCNV workflow uses read-depth coverage across exonic targets and normalizes read counts against a cohort model to correct for technical biases such as GC content and bait capture efficiency. A Bayesian hidden Markov model (HMM) then infers copy-number states to call duplications and deletions. Calls were filtered by quality score and assessed for pathogenicity and disease causality in the VarSome Clinical software.
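The two ideas in the workflow above (normalizing per-target read depth against a cohort, then decoding copy-number states with an HMM) can be illustrated with a deliberately simplified sketch. This is not the GATK gCNV model, which is a far richer probabilistic model; the Gaussian emission, the three-state space, the parameter values (`sd`, `p_stay`), and all function names below are assumptions made only for illustration.

```python
import math
from statistics import median

STATES = (1, 2, 3)  # toy copy-number states: deletion, diploid, duplication

def copy_ratios(sample, cohort):
    """Per-target copy ratio of one sample relative to a cohort baseline."""
    def scaled(counts):
        m = median(counts)              # removes sample-wide depth differences
        return [c / m for c in counts]
    panel = [scaled(c) for c in cohort]
    # The per-target cohort median absorbs target-specific bias (GC, capture).
    baseline = [median(col) for col in zip(*panel)]
    return [x / b for x, b in zip(scaled(sample), baseline)]

def viterbi(ratios, sd=0.15, p_stay=0.98):
    """Most likely copy-number path under a toy Gaussian-emission HMM."""
    p_switch = (1 - p_stay) / (len(STATES) - 1)
    def emit(r, cn):                    # log-likelihood up to a constant
        return -((r - cn / 2) ** 2) / (2 * sd ** 2)
    def trans(a, b):
        return math.log(p_stay if a == b else p_switch)
    score = {cn: emit(ratios[0], cn) for cn in STATES}
    back = []
    for r in ratios[1:]:
        new_score, pointers = {}, {}
        for cn in STATES:
            prev = max(STATES, key=lambda p: score[p] + trans(p, cn))
            new_score[cn] = score[prev] + trans(prev, cn) + emit(r, cn)
            pointers[cn] = prev
        score = new_score
        back.append(pointers)
    state = max(STATES, key=lambda cn: score[cn])
    path = [state]
    for pointers in reversed(back):     # trace back from the best final state
        state = pointers[state]
        path.append(state)
    return path[::-1]

def call_segments(path):
    """Collapse runs of non-diploid states into (first, last, type) calls."""
    calls, start = [], None
    for i, cn in enumerate(path + [2]):  # sentinel closes a trailing segment
        if start is not None and cn != path[start]:
            calls.append((start, i - 1, "DEL" if path[start] < 2 else "DUP"))
            start = None
        if start is None and cn != 2:
            start = i
    return calls

# Made-up read counts: three cohort samples at different depths, and one
# sample with targets 3-5 at half depth (a heterozygous deletion).
cohort = [
    [100, 120, 80, 100, 110, 90, 100, 100, 100, 100],
    [200, 240, 160, 200, 220, 180, 200, 200, 200, 200],
    [150, 180, 120, 150, 165, 135, 150, 150, 150, 150],
]
sample = [100, 120, 80, 50, 55, 45, 100, 100, 100, 100]
print(call_segments(viterbi(copy_ratios(sample, cohort))))  # [(3, 5, 'DEL')]
```

In this toy setting the high self-transition probability is what keeps single noisy targets from being called as separate events, so adjacent targets are merged into one segment.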

Combined analysis of single-nucleotide variants and CNVs yielded a positive diagnosis in 209 patients (32%), including 187 with SNVs and 22 with CNVs. Results were inconclusive in 66 patients (10%) and negative in 371 patients (57%). Detected CNVs were subsequently validated by chromosomal microarray analysis, which confirmed 21 (>95%). Among the 209 positive diagnoses, 21 (approximately 10%) were clinically significant CNVs detected by the gCNV pipeline. These CNVs comprised 11 deletions and 10 duplications, with chromosome 16 the most frequently affected (7 CNVs).
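The proportions in this paragraph follow directly from the reported counts. A minimal sketch of that arithmetic is given below; the variable and function names are illustrative, and using the 22 CNV-positive patients as the denominator for the microarray confirmation rate is an assumption inferred from the text.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)

# Patient counts as reported in the abstract.
outcomes = {"positive": 209, "inconclusive": 66, "negative": 371}
cohort_size = sum(outcomes.values())

yield_by_outcome = {name: percent(n, cohort_size) for name, n in outcomes.items()}

# Assumption: 22 CNV calls were sent for microarray validation, 21 confirmed.
cma_confirmation = percent(21, 22)

# Share of all positive diagnoses attributable to confirmed CNVs.
cnv_share = percent(21, outcomes["positive"])

for name, value in yield_by_outcome.items():
    print(f"{name}: {value}%")
print(f"microarray confirmation: {cma_confirmation}%")
print(f"CNV share of positive diagnoses: {cnv_share}%")
```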

Integrating the GATK gCNV pipeline into exome sequencing workflows provides a reliable approach to CNV calling. Pathogenic CNVs can then be detected directly from exome data, which enhances diagnostic yield and streamlines clinical testing for rare diseases.

Keywords: CNV, rare disease, GATK

Acknowledgement: This work was carried out within the Center for Genetic Diagnostics of Rare Diseases and was funded by BRIDGING-RD, HORIZON-WIDERA-2023-ACCESS-02, No. 101160079.