Integration of bioinformatics data for crop plant breeding

Yuriy L. Orlov1,2*, Haoyu Chao3, Shilong Zhang3, Vladimir A. Ivanisenko2, Ming Chen3

1 Sechenov First Moscow State Medical University of the Russian Ministry of Health (Sechenov University), Moscow, Russia

2 Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia

3 Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China

orlov [at] d-health.institute

Abstract

In agrobiology, crop plant breeding involves selecting new plant varieties with desirable traits such as increased yield, improved disease resistance, and enhanced nutritional value. In recent years, high-throughput omics technologies have revolutionized crop plant breeding by providing vast amounts of data on the molecular mechanisms underlying plant development and responses to environmental stresses. However, using these technologies effectively requires integrating multi-omics data from different databases. Such integration provides a comprehensive understanding of the biological processes underlying plant traits and their interactions. Another important aspect of biotechnology is the growth of artificial intelligence (AI) and machine learning applications. The number of sequenced crop genomes has continued to grow rapidly in recent years, providing valuable resources for agricultural research. Additionally, epigenomics and transcriptomics have become increasingly important in crop breeding, providing insights into gene regulation and aiding the identification of desirable traits. The volume of epigenomic and transcriptomic data in the Sequence Read Archive (SRA) has increased continuously, further emphasizing the significance of these fields for crop breeding. Proteomics and metabolomics have also continued to develop, allowing a deeper understanding of plant molecular mechanisms. Several omics databases have been developed to store and analyze large-scale omics data for different crop species, including rice, maize, wheat, and soybean. These databases provide a wealth of information on the genetic makeup, epigenomic regulation, gene expression profiles, protein functions, and metabolic pathways of crops, which can be used to improve breeding programs.
Genomic databases such as NCBI Assembly, Genome Warehouse, EnsemblPlants, Phytozome, and PlantGDB provide access to genome sequences, gene annotations, and functional annotations for many crop species, including rice, maize, soybean, and wheat. We highlight the importance of integrating omics databases in crop plant breeding, discuss available omics data and databases, describe integration challenges, and review recent developments and potential benefits. Taken together, the integration of omics databases is a critical step towards enhancing crop plant breeding.

Keywords: Omics, Data Integration, Databases, Plant biology, Crop plant breeding