Sara Stankovic1*, Nikola Jocic1, Djordje Pavlovic1, Marina Parezanovic1, Nina Stevanovic1, Kristina Grujic1, Jovana Komazec1, Kristel Klaassen1, Anita Skakic1, Marina Andjelkovic1, Milena Ugrin1, Vesna Spasovski1, Steven Laurie2 and Maja Stojiljkovic1
1Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, Serbia
2Centro Nacional de Análisis Genómico, Barcelona, Spain
sara.stankovic [at] imgge.bg.ac.rs
Abstract
Benchmarking of bioinformatics pipelines is the foundation of quality assurance in diagnostics. It is particularly important in the rare disease field, where sensitivity and specificity directly influence the diagnostic success rate. The Institute of Molecular Genetics and Genetic Engineering (IMGGE) has an in-house short-read whole genome sequencing (SR-WGS) pipeline for variant detection, which had not previously undergone cross-institutional benchmarking in the context of rare disease diagnostics.
As part of the BRIDGING-RD project, the IMGGE SR-WGS pipeline was benchmarked against the NIST HG002/NA24385 Genome in a Bottle gold-standard truth set (v4.2.1), in collaboration with Centro Nacional de Análisis Genómico (CNAG). Both centers used the same 30x coverage Illumina short-read FASTQ data generated from the HG002 NIST reference material. Benchmarking was performed using Illumina’s open-source hap.py framework, with evaluation restricted to the ~88.5% of the autosomal GRCh38 genome covered by the NIST high-confidence callable regions BED file. Precision, Recall, and F1 score were calculated separately for single nucleotide variants (SNVs) and short insertions and deletions (InDels).
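For readers less familiar with these metrics, the following is a minimal sketch of how Precision, Recall, and F1 follow from true-positive, false-positive, and false-negative variant counts. It uses the standard textbook definitions only. The `VariantCounts` class, the function names, and the example counts are hypothetical illustrations and are not taken from either pipeline or from hap.py output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCounts:
    """Hypothetical container for per-variant-type comparison counts."""
    tp: int  # calls that match the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth-set variants that were not called


def precision(c: VariantCounts) -> float:
    """Fraction of called variants that are correct: TP / (TP + FP)."""
    called = c.tp + c.fp
    return c.tp / called if called else 0.0


def recall(c: VariantCounts) -> float:
    """Fraction of truth-set variants that were recovered: TP / (TP + FN)."""
    expected = c.tp + c.fn
    return c.tp / expected if expected else 0.0


def f1_score(c: VariantCounts) -> float:
    """Harmonic mean of Precision and Recall: 2PR / (P + R)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Illustrative counts only (not results from this study).
example = VariantCounts(tp=95, fp=1, fn=5)
print(f"Precision={precision(example):.4f} "
      f"Recall={recall(example):.4f} "
      f"F1={f1_score(example):.4f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a pipeline with very high Precision but lower Recall can still score below a pipeline with more balanced metrics.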
Both pipelines performed well against the gold-standard truth set. The IMGGE workflow achieved F1 scores of 0.971–0.981, with SNV Precision of 99.41–99.76% and Recall of 95.35–96.81%, and InDel Precision of 98.99–99.19% and Recall of 95.07–96.42%. The CNAG pipeline achieved slightly higher F1 scores of 0.986–0.991 across all categories, with higher Recall for both variant types. Notably, the IMGGE workflow showed higher Precision but lower Recall than the CNAG pipeline, suggesting that the post-calling filtration step in the IMGGE pipeline reduces false positives at the cost of increased false negatives.
The IMGGE SR-WGS pipeline demonstrated strong performance, confirming its suitability for rare disease genomic testing. Comparison with the CNAG pipeline identified optimization opportunities, particularly in the filtering strategy and its effect on Recall. This work was conducted through knowledge transfer between CNAG and IMGGE as part of the BRIDGING-RD project, in which IMGGE staff received practical training in benchmarking methodology, genomics file formats, and open-source bioinformatics tools. The capacity to independently benchmark WGS pipelines was thereby established at IMGGE, laying a foundation for quality-assured workflows in rare disease genomics.
Keywords: WGS, benchmarking, bioinformatics, pipeline, genomics
Acknowledgement: This research was supported by the Horizon Europe Project BRIDGING-RD, HORIZON-WIDERA-2023-ACCESS-02, Grant Agreement N°101160079.

