Vladimir Poroikov, Olga Tarasova*, Anastassia Rudik, Leonid Stolbov, Dmitry Karasev, Dmitry Filimonov, Nadezhda Biziukova, Alexey Kuzikov, Victoria Shumyantseva and Tatiana Filippova
Institute of Biomedical Chemistry, Russian Academy of Medical Sciences (RAMS)
olga.a.tarasova [at] gmail.com
Abstract
The risk of interspecies virus transmission, the emergence of poorly understood pathogens, and the rising prevalence of multidrug-resistant viral variants make it necessary to discover novel, safer, and more effective antiviral agents. Our objective is to develop an integrated AI/ML computational framework and a curated data resource to support the development of new antiviral agents.
The developed framework comprises the Antivir database, constructed specifically for this purpose, and a computational module for predicting antiviral activity, off-target interactions, and adverse drug reactions (ADRs). The database integrates heterogeneous data on antiviral compounds, standardized according to the nomenclature of the International Committee on Taxonomy of Viruses (ICTV). In addition to the off-target and ADR modules, the web resource provides ternary classification models for HIV drug resistance; all predictions are based on the naïve Bayesian approach implemented in the PASS algorithm.
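The naïve Bayesian idea behind such structure-activity predictions can be illustrated with a minimal sketch. This is a generic naïve Bayes scorer, not the PASS algorithm itself, which uses its own structural descriptors and probability estimates. The sketch assumes a hypothetical input format (each compound as a set of descriptor labels plus an active/inactive flag), Laplace smoothing, and a common simplification that sums log-likelihood ratios only over descriptors present in the query compound.

```python
import math
from collections import Counter


def train_naive_bayes(training_set, alpha=1.0):
    """Count descriptor occurrences in active and inactive compounds.

    training_set: list of (descriptors, is_active) pairs, where descriptors is an
    iterable of hashable descriptor labels (hypothetical format, for illustration).
    """
    n_active = sum(1 for _, active in training_set if active)
    n_inactive = len(training_set) - n_active
    in_active, in_inactive = Counter(), Counter()
    for descriptors, active in training_set:
        # Count each descriptor at most once per compound (presence/absence).
        (in_active if active else in_inactive).update(set(descriptors))
    return {
        "n_active": n_active,
        "n_inactive": n_inactive,
        "in_active": in_active,
        "in_inactive": in_inactive,
        "alpha": alpha,
    }


def log_odds_active(model, descriptors):
    """Naive Bayes log-odds of activity, summed over the compound's descriptors."""
    a = model["alpha"]
    n_a, n_i = model["n_active"], model["n_inactive"]
    # Smoothed prior log-odds of the active class.
    score = math.log((n_a + a) / (n_i + a))
    for d in set(descriptors):
        # Smoothed P(descriptor present | class); descriptors are assumed
        # conditionally independent given the class (the "naive" assumption).
        p_a = (model["in_active"][d] + a) / (n_a + 2 * a)
        p_i = (model["in_inactive"][d] + a) / (n_i + 2 * a)
        score += math.log(p_a / p_i)
    return score


def prob_active(model, descriptors):
    """Convert log-odds into a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds_active(model, descriptors)))


# Toy training data with made-up descriptor labels.
train = [
    ({"C-N", "C=O", "ring6"}, True),
    ({"C-N", "ring6"}, True),
    ({"C-C", "C-O"}, False),
    ({"C-C", "ring5"}, False),
]
model = train_naive_bayes(train)
print(round(prob_active(model, {"C-N", "ring6"}), 3))
print(round(prob_active(model, {"C-C"}), 3))
```

Descriptors seen mostly in actives push the probability up and those seen mostly in inactives push it down; smoothing keeps unseen descriptors from producing infinite log-ratios.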
The Antivir database currently contains data on approximately 76,000 compounds with biological activity profiles against more than 20 viruses, and it captures relationships among compounds, diseases, resistance-associated mutations, and gene polymorphisms. Predictive models are available for multiple viruses, including influenza, dengue, West Nile, hepatitis B and C, HIV-1, SARS-CoV, and SARS-CoV-2. ROC AUC values range from 0.86 to 0.99 for antiviral activity prediction and reach 0.94 for ADR prediction. For HIV drug resistance prediction, the ternary classification models achieve average values of 0.913 sensitivity, 0.894 specificity, 0.741 precision, and 0.953 ROC AUC. For SARS-CoV-2 main protease (Mpro) inhibitory activity, the ROC AUC is approximately 0.93. To identify novel SARS-CoV-2 Mpro inhibitors, we conducted virtual screening of 1.6 million compounds from the ChemRar library using a hybrid ligand- and structure-based approach that combined PASS models, pharmacophore mapping, and molecular docking. Based on this screening, 32 compounds were prioritized for experimental validation; four have been tested so far, and two of them showed activity in the micromolar to submicromolar range.
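The average sensitivity, specificity, precision, and ROC AUC reported for the ternary HIV resistance models can be computed in several ways, and the abstract does not specify which. A common choice, assumed here, is to compute each metric one-vs-rest for every class and take the unweighted (macro) mean. The class labels "S", "I", "R" and all data below are hypothetical; ROC AUC is computed as the probability that a randomly chosen positive sample outscores a randomly chosen negative one, with ties counted as one half.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Confusion-matrix counts (tp, fn, fp, tn) with `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def roc_auc(labels, scores):
    """AUC = P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_metrics(y_true, y_pred, scores, classes):
    """Macro-average one-vs-rest metrics over all classes.

    scores[i][c] is the model score that sample i belongs to class c.
    Assumes every class occurs in y_true (but not in all samples) and is
    predicted at least once, so no denominator below is zero.
    """
    per_class = {"sensitivity": [], "specificity": [], "precision": [], "roc_auc": []}
    for c in classes:
        tp, fn, fp, tn = one_vs_rest_counts(y_true, y_pred, c)
        per_class["sensitivity"].append(tp / (tp + fn))
        per_class["specificity"].append(tn / (tn + fp))
        per_class["precision"].append(tp / (tp + fp))
        per_class["roc_auc"].append(
            roc_auc([t == c for t in y_true], [s[c] for s in scores])
        )
    return {name: sum(vals) / len(vals) for name, vals in per_class.items()}


# Hypothetical three-class example (labels and scores are made up).
classes = ("S", "I", "R")
y_true = ["S", "S", "I", "R", "R", "I"]
y_pred = ["S", "I", "I", "R", "R", "S"]
scores = [
    {"S": 0.7, "I": 0.2, "R": 0.1},
    {"S": 0.4, "I": 0.5, "R": 0.1},
    {"S": 0.2, "I": 0.6, "R": 0.2},
    {"S": 0.1, "I": 0.1, "R": 0.8},
    {"S": 0.1, "I": 0.2, "R": 0.7},
    {"S": 0.5, "I": 0.4, "R": 0.1},
]
result = macro_metrics(y_true, y_pred, scores, classes)
print({name: round(value, 3) for name, value in result.items()})
```

Macro averaging weights each class equally, so a rare class (for example, a small resistant group) influences the average as much as a common one; a weighted average would instead favor the majority classes.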
The data and AI/ML methods are integrated into the Antivir web resource (https://way2drug.com/viruses/), which provides a robust environment for antiviral research. It combines high-quality curated data with advanced machine learning models to accelerate the discovery of safe and effective therapeutic agents.
Keywords: antivirals, AI, database/web-resource, structure-activity relationship
Acknowledgement: The study is supported by the Program of Fundamental Scientific Research in the Russian Federation for the long-term period (2021–2030) (No. 124050800018-9).

