Exploring Genetic Variant Selection Algorithms for Enhanced Genotyping Assays in Personalized Medicine

Katarina Kruščić¹, Ivan Životić², Maja Živković² and Tamara Đurić²

¹ Faculty of Biology, University of Belgrade, Belgrade, Serbia

² Institute of Nuclear Sciences “Vinča”, National Institute of the Republic of Serbia, Laboratory for Radiobiology and Molecular Genetics, University of Belgrade, Belgrade, Serbia

katarina.kruscic [at] bio.bg.ac.rs

Abstract

Integrating complex genetic information into healthcare systems is vital for advancing personalized medicine. Linkage disequilibrium (LD) makes it possible to capture most of the genetic information in a region with a reduced set of variants, designated tag variants. Accurate tag variant selection therefore enhances assay efficiency by maximizing information yield with fewer markers. The limitations of earlier algorithms based on haplotype blocks prompted the development of newer algorithms such as BigLD, which offer improved LD estimation. However, because their alignment with recombination sites remains incomplete, better recombination estimation methods are still needed. Expanding search regions by ±100 kb around gene boundaries captures regulatory elements that can affect gene expression. This study compares the LD-Select and LmTag algorithms, focusing on imputation coverage and functional aspects. The analysis was conducted on two genes (Gclm and Nqo1) from the Nrf2/ARE signaling pathway, which is important in inflammatory conditions. LmTag, which considers minor allele frequency (MAF), LD strength (r²), and genomic distance, aims to improve performance and functional relevance compared with LD-Select. Results indicate that increasing the bin width (k) raises cumulative scores but reduces imputation coverage. Optimal bin widths were determined for each gene: k = 50 for Gclm and k = 10 for Nqo1. Extending gene regions by 100 kb improves imputation coverage with both algorithms. For instance, for the Gclm gene and its promoters, LD-Select suggests 5 tag variants covering 97.30% of the region. Expanding the observation region yields 6 tag variants with 100% coverage, while LmTag selects 5 variants covering 94.59%. For the Nqo1 gene, LD-Select suggests 3 tag variants covering 95% of the region, and LmTag likewise selects 3 variants with similar coverage.
This study underscores the importance of selecting representative variants and offers insights into algorithmic development, with potential implications for personalized medicine in diverse phenotypic conditions. Further experimental validation is essential to corroborate these findings and enhance their clinical utility.
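
To illustrate the general idea behind LD-based tag variant selection and the coverage figures quoted above, the following is a minimal sketch in Python. It is not the LD-Select or LmTag implementation used in this study: it assumes a precomputed pairwise r² matrix, uses a simplified greedy rule (repeatedly pick the variant that captures the most still-untagged variants at a chosen r² threshold), and omits MAF filtering, genomic distance, and functional scoring. The function names, the 0.8 threshold, and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np


def greedy_tag_selection(r2, threshold=0.8, max_tags=None):
    """Simplified greedy tag selection on a pairwise r^2 matrix.

    At each step, pick the untagged variant that captures the largest
    number of untagged variants (r^2 >= threshold), then mark those
    variants as tagged. Stops when every variant is tagged or when
    max_tags tags have been chosen. Illustrative sketch only.
    """
    n = r2.shape[0]
    untagged = set(range(n))
    tags = []
    while untagged and (max_tags is None or len(tags) < max_tags):
        best, best_covered = None, set()
        for i in sorted(untagged):
            # A variant always captures itself.
            covered = {i} | {j for j in untagged if r2[i, j] >= threshold}
            if len(covered) > len(best_covered):
                best, best_covered = i, covered
        tags.append(best)
        untagged -= best_covered
    return tags


def coverage(r2, tags, threshold=0.8):
    """Fraction of variants in LD (r^2 >= threshold) with at least one tag."""
    n = r2.shape[0]
    captured = sum(
        1 for j in range(n) if any(r2[t, j] >= threshold for t in tags)
    )
    return captured / n


# Hypothetical toy r^2 matrix: variants 0-2 form one LD group, 3-4 another.
R2_TOY = np.array([
    [1.00, 0.90, 0.85, 0.10, 0.10],
    [0.90, 1.00, 0.90, 0.10, 0.10],
    [0.85, 0.90, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.95],
    [0.10, 0.10, 0.10, 0.95, 1.00],
])

all_tags = greedy_tag_selection(R2_TOY, threshold=0.8)
one_tag = greedy_tag_selection(R2_TOY, threshold=0.8, max_tags=1)
print(all_tags, coverage(R2_TOY, all_tags))
print(one_tag, coverage(R2_TOY, one_tag))
```

In this toy case, limiting the number of tags lowers coverage, which mirrors the trade-off between assay size and coverage described in the abstract; real analyses would derive r² from reference panel genotypes and evaluate coverage by imputation.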

Keywords: tag variants, linkage disequilibrium, LD-Select, LmTag, BigLD

Acknowledgement: This research was supported by the Science Fund of the Republic of Serbia, grant no. 7753406, Identification and functional characterization of extracellular and intracellular genetic regulators of ferroptosis related processes in multiple sclerosis – FerroReg.