Multi-resolution framework for orthology and regulatory region delineation in divergent grass genomes

Bojana Banovic Deri1*, Thomas Hartwig2, Robert VanBuren3 and Vincenzo Rossi1

1Council for Agricultural Research and Economics, Research Centre for Cereal and Industrial Crops

2Heinrich-Heine University

3Department of Horticulture, Michigan State University

bojana.banovicderi [at] crea.gov.it

Abstract

Within the BOOSTER project, maize (Zea mays), teff (Eragrostis tef), and the desiccation-tolerant wild relative Eragrostis nindensis were selected as a comparative system for studying drought-associated variation across species with contrasting levels of drought tolerance. This system supports the identification of conserved and lineage-specific genes and regulatory features that may inform the future transfer of beneficial traits from more drought-tolerant to more drought-sensitive species. A prerequisite for such analyses is the robust identification of orthologous genes and their associated regulatory regions across the divergent genomes of these three species.

Such identification remains challenging because extensive structural variation, transposable element activity, lineage-specific rearrangements, gene family expansions, and past whole-genome duplications complicate both orthology inference and the comparative analysis of regulatory sequences. To address these challenges, we developed a multi-resolution comparative genomics framework that integrates sequence similarity, phylogenetic relationships, gene collinearity, and whole-genome alignment across maize, teff, and E. nindensis.

The workflow combines complementary approaches: orthogroup inference (OrthoFinder), gene synteny detection (MCScanX), whole-genome alignment (Cactus), and sequence similarity–based searches (BLASTN and BLASTP). Publicly available reference genomes and gene annotations were used to extract coding sequences, protein sequences, and gene sequences together with their flanking candidate regulatory regions for pairwise and cross-species analyses.
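Extracting a gene together with its flanking regions from a reference genome and its annotation is mostly coordinate bookkeeping, and the strand handling is easy to get wrong. The sketch below is an illustration only, not the pipeline used in this work: it assumes GFF3-style 1-based inclusive coordinates, and the helper names (Gene, parse_gff_genes, gene_with_flanks) and the default flank lengths of 2000 bp upstream and 500 bp downstream are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Complement table used for reverse-complementing minus-strand regions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass
class Gene:
    gene_id: str
    seqid: str
    start: int   # 1-based, inclusive (GFF3 convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def parse_gff_genes(lines: Iterable[str]) -> Iterator[Gene]:
    """Yield 'gene' features from GFF3 lines, skipping comments and other feature types."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        yield Gene(attrs.get("ID", "."), fields[0], int(fields[3]), int(fields[4]), fields[6])


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gene_with_flanks(chrom_seq: str, gene: Gene,
                     upstream: int = 2000, downstream: int = 500) -> str:
    """Return gene body plus flanks, oriented 5'->3' on the gene's own strand.

    Hypothetical defaults for flank lengths; flanks are clipped at chromosome ends.
    """
    chrom_len = len(chrom_seq)
    if gene.strand == "+":
        lo = max(1, gene.start - upstream)
        hi = min(chrom_len, gene.end + downstream)
    else:
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        lo = max(1, gene.start - downstream)
        hi = min(chrom_len, gene.end + upstream)
    region = chrom_seq[lo - 1:hi]  # 1-based inclusive -> 0-based half-open slice
    return region if gene.strand == "+" else revcomp(region)
```

Reverse-complementing minus-strand regions means that the promoter-proximal sequence always comes first, so flanks from genes on either strand can be compared directly in cross-species searches.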

By integrating homology-based, phylogenetic, collinearity-aware, and base-resolution alignment evidence, the framework improves the classification of orthologous relationships and supports the delineation and cross-species mapping of candidate regulatory regions, including in genomic contexts where collinearity is disrupted. This approach provides a transferable computational foundation for comparative regulatory genomics in divergent grasses and a basis for downstream integration with trait-focused datasets, including data related to abiotic and biotic stress. In the context of BOOSTER, the framework enables further comparative investigation of experimentally identified drought-related genes and their associated candidate regulatory regions across the three species, with relevance to improving drought tolerance in maize and teff.
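One way to picture how several evidence types can be combined for a single cross-species gene pair is a simple tiered rule. The sketch below is hypothetical and is not the classification scheme used in this work: the PairEvidence fields, the tier names, and the rule ordering are illustrative assumptions. It only shows how collinearity or whole-genome alignment support can separate positionally conserved orthologs from orthologs that lie in rearranged regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairEvidence:
    """Hypothetical per-pair evidence flags; field names are illustrative."""
    same_orthogroup: bool      # both genes assigned to one OrthoFinder orthogroup
    collinear: bool            # pair lies in an MCScanX collinear block
    wga_supported: bool        # genes overlap within a Cactus alignment block
    reciprocal_best_hit: bool  # reciprocal best BLASTP hit


def classify_pair(ev: PairEvidence) -> str:
    """Assign a hypothetical confidence tier to a cross-species gene pair."""
    support = sum([ev.same_orthogroup, ev.collinear,
                   ev.wga_supported, ev.reciprocal_best_hit])
    # Phylogenetic support plus positional support (synteny or alignment).
    if ev.same_orthogroup and (ev.collinear or ev.wga_supported):
        return "positional ortholog"
    # Phylogenetic and similarity support, but collinearity is disrupted.
    if ev.same_orthogroup and ev.reciprocal_best_hit:
        return "non-collinear ortholog"
    # At least one line of evidence, but not enough to call orthology.
    if support >= 1:
        return "candidate homolog"
    return "unsupported"
```

For example, a pair in the same orthogroup with a reciprocal best hit but no collinear block would fall into the "non-collinear ortholog" tier, which corresponds to the disrupted-collinearity contexts that the framework is designed to handle.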

Keywords: comparative genomics, gene orthology, synteny

Acknowledgement: This work was supported by the project “Boosting drought tolerance in key cereals in the era of climate change (BOOSTER)”. The BOOSTER project has received funding from the European Union's Horizon Europe Research and Innovation Programme under Grant Agreement No. 101081770.