A Bayesian framework for quantifying allele count evidence for variant pathogenicity

Fedor Konovalov*

Independent Clinical Bioinformatics Laboratory, Moscow, Russia

fk [at] clinbio.ru

Abstract

Allele counts in affected individuals and population controls are central to assessing the pathogenicity of genetic variants in human medical genetics, yet their evidential weight is typically expressed through discrete thresholds and heuristic criteria. This work introduces a quantitative Bayesian population-genetic framework that summarizes allele count evidence as a single continuous measure: the Bayes factor comparing explicitly defined pathogenic and neutral models.
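In generic form, the Bayes factor is the ratio of the marginal likelihoods of the observed allele counts D under the two models. The latent population allele frequency q is integrated out against each model's prior. The notation below is introduced here for illustration and may differ from the paper's own:

```latex
\mathrm{BF} = \frac{P(D \mid M_{\mathrm{path}})}{P(D \mid M_{\mathrm{neut}})},
\qquad
P(D \mid M) = \int_0^1 P(D \mid q, M)\, p(q \mid M)\, \mathrm{d}q .
```

Values above 1 favour the pathogenic model and values below 1 the neutral one. Under the pathogenic model, the prior on q may itself be averaged over an uncertain mutation rate.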

The framework is formulated for autosomal dominant rare diseases and incorporates disease prevalence, penetrance, cohort composition, and sequencing error rates as explicit parameters. Under the pathogenic model, allele frequency is treated as a latent variable whose distribution under mutation-selection balance is approximated by a gamma distribution. Under the neutral model, allele frequencies are represented by a simulated site frequency spectrum of a reference population, obtained via forward Wright-Fisher simulations and calibrated to publicly available data from the Genome Aggregation Database (gnomAD). Observed allele counts in affected and control cohorts are evaluated under both models, and uncertainty in the latent parameters is integrated out by computing marginal likelihoods.
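A toy version of the two-model comparison can be sketched in Python to make the structure concrete. It is not the author's implementation, and several ingredients are simplifying assumptions introduced here:

- The affected-cohort allele frequency under the pathogenic model is taken as q × penetrance / prevalence, which follows from Bayes' rule for heterozygous carriers of a rare variant.
- Sequencing error is modelled as an additive per-allele false-positive rate.
- The simulated neutral site frequency spectrum is replaced by a density proportional to 1/q on a fixed range.
- All parameter defaults are illustrative placeholders.
- Function names are hypothetical.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log Binomial(k | n, p), computed with lgamma to stay stable at large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_lik(q, counts, enrichment, err):
    """Joint log-likelihood of (k_aff, n_aff, k_ctrl, n_ctrl), counted in alleles.

    enrichment scales q in the affected cohort (hypothetical simplification);
    err is an additive per-allele false-positive rate (assumption).
    """
    k_aff, n_aff, k_ctrl, n_ctrl = counts
    cap = 1.0 - 1e-12  # keep probabilities strictly below 1
    p_aff = min(q * enrichment + err, cap)
    p_ctrl = min(q + err, cap)
    return log_binom_pmf(k_aff, n_aff, p_aff) + log_binom_pmf(k_ctrl, n_ctrl, p_ctrl)

def log_bayes_factor(counts, shape=0.5, scale=2e-6, penetrance=0.9, prevalence=1e-4,
                     err=1e-5, q_min=1e-10, q_max=0.5, n_grid=2000):
    """Toy log BF (pathogenic vs neutral), integrating over u = ln q on a uniform grid."""
    lo, hi = math.log(q_min), math.log(q_max)
    du = (hi - lo) / (n_grid - 1)
    us = [lo + i * du for i in range(n_grid)]
    # Pathogenic: Gamma(shape, scale) prior on q; the Jacobian dq = q du contributes +u.
    log_norm = -math.lgamma(shape) - shape * math.log(scale)
    path_terms = [shape * u - math.exp(u) / scale + log_norm
                  + log_lik(math.exp(u), counts, penetrance / prevalence, err)
                  for u in us]
    log_ml_path = logsumexp(path_terms) + math.log(du)
    # Neutral stand-in: density proportional to 1/q on [q_min, q_max] (uniform in u),
    # with no enrichment of the variant in the affected cohort.
    neut_terms = [log_lik(math.exp(u), counts, 1.0, err) for u in us]
    log_ml_neut = logsumexp(neut_terms) + math.log(du) - math.log(hi - lo)
    return log_ml_path - log_ml_neut

if __name__ == "__main__":
    # 5 alleles in 1,000 affected individuals, none in 60,000 controls
    print(log_bayes_factor((5, 2000, 0, 120000)))  # positive: favours the pathogenic model
```

The grid in ln q is used because both priors place most of their mass at very low frequencies, where a linear grid would be too coarse.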

Application to publicly available gnomAD data, specifically non-Finnish European cohorts, showed that the Bayes factor varies smoothly across allele count configurations, without reliance on predefined thresholds. The strength of evidence depends strongly on affected cohort size and on the false positive rate in large control datasets; at current sample sizes, both materially influence inference. Across representative scenarios, the model reproduces the qualitative behavior of the ACMG/AMP allele-frequency criteria while providing a continuous, model-based alternative that makes evidence directly comparable across studies and clinical contexts.
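A continuous Bayes factor can be placed on the discrete ACMG/AMP evidence scale for comparison. The sketch below assumes the odds-of-pathogenicity calibration of Tavtigian et al. (2018), in which "very strong" corresponds to odds of 350 and "supporting" to 350^(1/8) ≈ 2.08. It also uses the corresponding point scale: supporting = 1, moderate = 2, strong = 4, very strong = 8. The abstract does not state which calibration the framework uses, so this mapping and the helper names are assumptions.

```python
import math

# Assumed calibration (Tavtigian et al. 2018): odds of 350 for "very strong" evidence,
# with lower levels at 350**(1/2), 350**(1/4) and 350**(1/8).
ODDS_VERY_STRONG = 350.0
ODDS_SUPPORTING = ODDS_VERY_STRONG ** (1 / 8)  # about 2.08

# Point thresholds on the assumed naturally scaled point system.
LEVELS = ((8.0, "very strong"), (4.0, "strong"), (2.0, "moderate"), (1.0, "supporting"))

def bf_to_points(bf: float) -> float:
    """Express a Bayes factor as a continuous number of supporting-level steps."""
    if bf <= 0:
        raise ValueError("Bayes factor must be positive")
    return math.log(bf) / math.log(ODDS_SUPPORTING)

def evidence_label(bf: float) -> str:
    """Round a continuous Bayes factor down to the nearest discrete level."""
    points = bf_to_points(bf)
    direction = "pathogenic" if points > 0 else "benign"
    for threshold, name in LEVELS:
        if abs(points) >= threshold:
            return f"{name} ({direction})"
    return "below supporting"

if __name__ == "__main__":
    for bf in (0.2, 1.5, 20.0, 400.0):
        print(bf, round(bf_to_points(bf), 2), evidence_label(bf))
```

The benign-side labels simply mirror the pathogenic scale, which is a simplification. The continuous points value carries the information that discrete labels discard.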

Numerical evaluation is performed by direct integration over allele frequency and mutation rate, carried out in log space with explicit cancellation of the singular behavior at low allele frequencies. Robustness analyses showed that realistic deviations in the neutral site frequency spectrum produce bounded changes in the Bayes factor, no larger than a supporting-level shift in evidence strength.
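The change of variables behind log-space integration can be sketched for the allele-frequency dimension alone. For illustration only, the sketch assumes a gamma prior with shape below 1, whose density diverges as q → 0, and a Poisson approximation to an allele count. Integrating over u = ln q turns the singular factor q^(shape−1) into the bounded exp(shape·u). The result can then be checked against the closed-form gamma-Poisson (negative binomial) marginal. Grid bounds, grid size and parameter values are arbitrary choices, and integration over mutation rate is not shown.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_marginal_logspace(k, n, shape, scale, q_min=1e-30, q_max=1.0, n_grid=4000):
    """log of the integral of Gamma(q; shape, scale) * Poisson(k; n*q) dq,
    by the trapezoid rule in u = ln q.

    With dq = q du, the Jacobian q combines with the gamma factor q**(shape - 1)
    into q**shape = exp(shape * u), so the integrand stays bounded as q -> 0
    even when shape < 1.
    """
    lo, hi = math.log(q_min), math.log(q_max)
    du = (hi - lo) / (n_grid - 1)
    log_norm = -math.lgamma(shape) - shape * math.log(scale)
    terms = []
    for i in range(n_grid):
        u = lo + i * du
        q = math.exp(u)
        log_prior_times_jacobian = shape * u - q / scale + log_norm  # singular factor cancelled
        log_poisson = k * math.log(n * q) - n * q - math.lgamma(k + 1)
        terms.append(log_prior_times_jacobian + log_poisson)
    terms[0] -= math.log(2.0)   # trapezoid rule: half weight at both endpoints
    terms[-1] -= math.log(2.0)
    return logsumexp(terms) + math.log(du)

def log_negative_binomial(k, n, shape, scale):
    """Closed form of the same integral (gamma-Poisson mixture = negative binomial)."""
    r = n * scale
    return (math.lgamma(k + shape) - math.lgamma(shape) - math.lgamma(k + 1)
            + k * math.log(r / (1.0 + r)) - shape * math.log1p(r))

if __name__ == "__main__":
    for k in (0, 1, 3, 10):
        numeric = log_marginal_logspace(k, n=100_000, shape=0.3, scale=1e-5)
        exact = log_negative_binomial(k, n=100_000, shape=0.3, scale=1e-5)
        print(k, round(numeric, 6), round(exact, 6))
```

Working with log terms and logsumexp also avoids underflow when likelihoods for large control cohorts become extremely small.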

The framework provides a mathematically grounded and flexible approach to allele count interpretation, supporting ongoing efforts toward quantitative variant classification.

Keywords: Bayesian, pathogenicity, population, monogenic, human