Rethinking protein complex discovery: from network topology to dynamic graph learning

Milana Grbić*

Faculty of Natural Science and Mathematics, University of Banja Luka

milana.grbic [at] pmf.unibl.org

Abstract

Protein complexes play a central role in cellular processes, yet their reliable identification from protein–protein interaction (PPI) networks remains challenging due to incompleteness and noise in experimentally derived interaction data. Traditional approaches primarily rely on static network representations and topology-based assumptions, which often fail to fully capture the dynamic nature of protein interactions. More recently, increasing attention has been directed toward dynamic PPI networks, as they are expected to provide richer and more informative representations of protein interactions.

The evolution of computational approaches for protein complex identification reflects a shift from classical network analysis to modern representation learning techniques. Structural limitations of PPI networks were investigated by examining the extent to which known protein complexes are supported as connected subgraphs. To address missing interactions, a variable neighborhood search (VNS) metaheuristic was applied to minimally augment networks and improve complex connectivity. The obtained results demonstrate that widely used PPI networks and complex standards exhibit incomplete support, indicating systematic gaps in interaction data. This observation is further supported by the tendency of community detection methods, which typically identify dense structures in networks, to yield inconsistent results when applied to protein complex identification.
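The notion of a complex being "supported as a connected subgraph" can be illustrated with a minimal sketch. It assumes a PPI network given as an undirected edge list and a complex given as a list of protein identifiers. The function names (`induced_components`, `is_supported`) and the toy data are hypothetical and introduced only for illustration; this is not the implementation used in the reported work, and it does not reproduce the VNS augmentation step.

```python
from collections import deque


def induced_components(edges, members):
    """Connected components of the subgraph induced by `members`.

    `edges` is an iterable of (u, v) pairs of an undirected PPI network;
    `members` lists the proteins of one complex (illustrative names).
    """
    members = set(members)
    adj = {p: set() for p in members}
    for u, v in edges:
        # Keep only interactions with both endpoints inside the complex.
        if u in members and v in members and u != v:
            adj[u].add(v)
            adj[v].add(u)

    seen, components = set(), []
    for start in sorted(members):
        if start in seen:
            continue
        # Breadth-first search from an unvisited protein.
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components


def is_supported(edges, members):
    """True if the complex induces a connected subgraph of the network."""
    return len(induced_components(edges, members)) <= 1


# Toy data, invented for illustration only.
ppi_edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "F")]
complexes = {"cx1": ["A", "B", "C"], "cx2": ["A", "D", "E"]}

for name, members in complexes.items():
    comps = induced_components(ppi_edges, members)
    print(name, is_supported(ppi_edges, members), len(comps))
```

For a single complex considered in isolation, at least `len(comps) - 1` added interactions are needed to connect its induced subgraph. Minimizing additions jointly over many complexes is a harder combinatorial problem, which is the kind of setting where a metaheuristic such as VNS becomes relevant.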

In addition to topology-based approaches, machine learning methods were examined for predicting protein membership in complexes using protein-specific biological and sequence-derived features. While these methods improved predictive performance, their effectiveness remained dependent on the underlying network representation. To address this limitation, more recent approaches have focused on graph representation learning, with dynamic graph embedding techniques applied to model temporal aspects of protein interaction networks. By combining embedding-based representations with clustering methods, improved identification of protein complexes has been observed in dynamic settings compared to static ones.

Overall, the results indicate that the transition from static PPI networks toward learned and dynamic graph representations significantly enhances the ability to capture biologically meaningful protein complexes. This progression also suggests promising directions for integrating representation learning with biological prior knowledge in future computational models.

Keywords: protein–protein interaction networks, protein complexes, graph representation learning, dynamic networks