Gábor Erdős1* and Zsuzsanna Dosztányi2
1ELTE Protein Dynamics Research Group
2MTA-ELTE Momentum Bioinformatics Research Group
gabor.erdos [at] ttk.elte.hu
Abstract
The thermodynamic properties of proteins are fundamental to understanding their function, dysfunction, and evolution. However, experimental characterization of these properties, especially for intrinsically disordered proteins (IDPs) that exist as dynamic conformational ensembles, remains a significant challenge. Computational methods have emerged as a powerful alternative, yet they often require extensive training on protein-specific data, limiting their ability to generalize.
Here, we present a novel transformer-based message passing graph neural network (trMPNN) architecture for the zero-shot prediction of protein thermodynamic properties. By representing proteins as graphs, our model learns the underlying physicochemical principles governing protein thermodynamics while remaining fast enough to analyze complete proteomes. The network was trained by maximizing the probability of the native structure against a set of decoys, with probabilities assigned according to the Boltzmann distribution, which allows it to learn a transferable energy function. This approach enables accurate predictions on proteins not seen during training, overcoming a major limitation of previous methods.
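As an illustration of the training objective described above, the following minimal sketch computes the negative log-probability of the native structure under a Boltzmann distribution over the native structure and its decoys. It is not the authors' implementation: the function name `boltzmann_nll`, the `kT` parameter, and the convention that the native energy is stored at `native_index` are assumptions made for this example, and the energies here are plain numbers rather than network outputs.

```python
import numpy as np

def boltzmann_nll(energies, native_index=0, kT=1.0):
    """Negative log-probability of the native structure.

    energies: predicted energies for the native structure and its decoys
              (hypothetical input; in practice these would come from the model).
    native_index: position of the native structure in `energies` (assumed).
    kT: thermal energy scale in the same units as `energies` (assumed).
    """
    e = np.asarray(energies, dtype=float)
    logits = -e / kT  # Boltzmann weights are proportional to exp(-E / kT)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = logits.max()
    log_partition = m + np.log(np.exp(logits - m).sum())
    # p(native) = exp(-E_native / kT) / sum_i exp(-E_i / kT)
    return -(logits[native_index] - log_partition)

# Example: the loss decreases as the native energy drops below the decoys.
loss_flat = boltzmann_nll([0.0, 0.0, 0.0, 0.0])
loss_gap = boltzmann_nll([-5.0, 0.0, 0.0, 0.0])
```

Minimizing this quantity pushes the native energy below the decoy energies, which is one common way to realize the objective stated in the text.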
We demonstrate the power of our network in two key areas. First, we show its ability to predict ensemble-averaged thermodynamic properties of IDPs, providing insights into their unique conformational landscapes. Second, we showcase its accuracy in predicting absolute protein stability (ΔG values), a critical factor in protein engineering and disease pathogenesis. Our model achieves state-of-the-art performance in both tasks, with predictions in excellent agreement with experimental data.
The zero-shot capability of our GNN opens up exciting avenues for high-throughput screening of protein stability, the design of novel proteins with desired thermodynamic properties, and a deeper understanding of the complex interplay between sequence, structure, and thermodynamics in the proteome.
Keywords: Statistical mechanics, Protein structures

