PromoterLab: browser-based application for human promoter design, visualization and sequence analysis

Evgenii Lunev1 and Anastasia Matlaeva2*

1Laboratory of Modeling and Gene Therapy of Hereditary Diseases, Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia

2Institute of Transport and Service, Private Educational Institution of Professional Education, Sochi, Russia

anmatlaeva [at] gmail.com

Abstract

In gene therapy and disease modeling, efficient transgene expression is necessary but not sufficient: vectors must also ensure tissue specificity, which requires carefully selected regulatory elements. Natural promoters of tissue-specific genes are a logical choice for this purpose. Furthermore, when the aim is to recapitulate the expression pattern of a particular gene, using its own natural promoter may offer the most direct path to physiologically relevant expression levels. However, only a limited number of natural promoters have been experimentally characterized, and no standardized framework exists for defining the boundaries of proximal promoter regions suitable for cloning into expression vectors.

We developed PromoterLab, a browser-based software-as-a-service platform that combines promoter sequence retrieval, annotation, visualization, prioritization, and comparative sequence analysis within a single workspace. The platform integrates data from multiple curated and public resources, including NCBI, EPD/EPDnew, JASPAR, ENCODE, FANTOM, UCSC Genome Browser, and Ensembl, enabling systematic exploration of the proximal regions of human promoters. It supports analysis of variable-length proximal promoter regions around the transcription start site (TSS) and automated annotation of key regulatory features, including CpG islands, GC-rich regions, and transcription factor binding sites. PromoterLab incorporates algorithmic and ML/LLM-based modules built on a machine learning model trained on CAGE-derived transcriptional activity using sequence composition, CpG structure, and transcription factor binding features. These modules help prioritize candidate promoter regions according to predicted transcriptional activity, tissue specificity, regulatory feature composition, and preservation of key regulatory constraints. An interactive visualization engine displays promoter maps, transcription factor binding site density, CpG island tracks, GC-content profiles, regulatory clusters, and nucleotide-level sequence views, and lets users adjust candidate promoter boundaries directly.
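The kind of annotation described above can be illustrated with a minimal Python sketch. It is not PromoterLab's implementation: the function names (`proximal_region`, `gc_fraction`, `cpg_obs_exp`, `scan_windows`), the 0-based TSS convention, and the window and step sizes are assumptions made for illustration. The default thresholds (GC content above 0.5, observed/expected CpG ratio above 0.6, 200 bp window) follow the commonly cited Gardiner-Garden and Frommer CpG island criteria, which the platform may or may not use.

```python
# Hypothetical helpers for illustration only; not the PromoterLab API.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def proximal_region(seq, tss, upstream, downstream, strand="+"):
    """Return `upstream` bases before and `downstream` bases from a 0-based
    TSS index (TSS included), in transcript orientation."""
    if strand == "+":
        return seq[max(tss - upstream, 0):tss + downstream]
    # Minus strand: upstream lies at higher forward coordinates,
    # so slice the mirrored window and reverse-complement it.
    window = seq[max(tss - downstream + 1, 0):tss + upstream + 1]
    return window.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def scan_windows(seq, window=200, step=50, min_gc=0.5, min_oe=0.6):
    """Slide a fixed window over the sequence and report the windows that
    pass both thresholds as (start, end, gc, obs_exp) tuples."""
    hits = []
    for start in range(0, max(len(seq) - window + 1, 0), step):
        chunk = seq[start:start + window]
        gc, oe = gc_fraction(chunk), cpg_obs_exp(chunk)
        if gc > min_gc and oe > min_oe:
            hits.append((start, start + window, round(gc, 3), round(oe, 3)))
    return hits
```

Calling `proximal_region` with several `upstream` and `downstream` values and passing each result to `scan_windows` gives a rough view of how CpG-rich windows enter or leave a candidate region as its boundaries change.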

Future development of PromoterLab will extend the supported organisms beyond human, with a particular focus on commonly used experimental models. We also plan to implement prediction of transcriptional strength and assessment of how well the selected proximal promoter region preserves tissue-specific expression relative to the endogenous gene. In addition, the platform will be extended with an automated pipeline for vector design and primer generation, enabling direct cloning into standardized or user-defined expression vectors.

Keywords: promoter design, sequence analysis

Acknowledgement: The authors would like to acknowledge Oleg Mozhey for his contributions to the development of the PromoterLab platform, including the design and implementation of the core software architecture, cloud resource provisioning, data integration, and the development of multiple analytical modules.