Ross Stewart1*, Matthew Mort2, Kerstin Spirohn-Fitzgerald3, Marc Vidal3, Florent Laval3, Georges Coppin3, David Cooper2, Michael Calderwood3, Predrag Radivojac4, Maxime Tixhon3 and Tong Hao3
1Khoury College of Computer Sciences, Northeastern University
2Institute of Medical Genetics, School of Medicine, Cardiff University
3Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute
4Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
stewart.ro [at] northeastern.edu
Abstract
Disruption of protein–protein interactions (PPIs) is a major mechanism of a variant’s deleterious effect. Computational tools are needed to assess such variants at scale, yet existing predictors rarely consider loss of specific interactions, particularly when variants perturb binding interfaces without significantly affecting protein stability. To address this problem, we present MutPred-PPI, a graph attention network that predicts interaction-specific (edgetic) effects of missense variants by operating on AlphaFold 3-based protein complex contact graphs with protein language model embeddings imposed upon nodes. We systematically evaluated our model with stringent group cross-validation as well as benchmark data recently collected within the IGVF Consortium. MutPred-PPI outperformed all baseline methods across all evaluation criteria, achieving an AUC of 0.85 on seen proteins and 0.72 on previously unseen proteins in cross-validation, demonstrating strong generalizability despite scarce training data. To demonstrate biomedical relevance, we applied MutPred-PPI to variants from ClinVar, HGMD, COSMIC, gnomAD, and two de novo neurodevelopmental disorder-linked datasets. Disease-associated variants from ClinVar and HGMD showed strong enrichment for both quasi-null and edgetic effects, whereas population variants from gnomAD increasingly preserved interactions with higher allele frequencies. Notably, we observed a strong edgetic disruption signature in highly recurrent cancer variants from both the full COSMIC dataset and a subset of variants from oncogenes. Recurrent tumor suppressor gene variants and autism spectrum disorder-associated variants exhibited moderate quasi-null enrichment, whilst neurodevelopmental disorder-linked variants showed a weak edgetic disruption signature. These results indicate distinct PPI perturbation mechanisms across disease types and show that MutPred-PPI captures functionally relevant molecular effects of pathogenic variants.
Keywords: protein–protein interaction, mutation
Acknowledgement: We thank the VarChAMP group from the IGVF Consortium for providing access to the benchmark dataset. We acknowledge the use of AlphaFold 3 (Google DeepMind) and the Catalogue of Somatic Mutations in Cancer (COSMIC, Wellcome Sanger Institute) under academic licenses. This work was supported by the NIH awards U01HG012022 (P.R.), R01GM145937 (P.R.), and UM1HG011989 (M.V.).

