Metaplasia as a transitional cell-of-origin state revealed by epigenome-informed cancer prediction

Mohamad D. Bairakdar1, Wooseung Lee1, Bruno Giotti1, Akhil Kumar1, Paula Stancl2*, Elvin Wagenblast3, Dolores Hambardzumyan3, Paz Polak4, Rosa Karlic2 and Alexander M. Tsankov1

1Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai (ISMMS), New York, NY, USA; Lipschultz Precision Immunology Institute, ISMMS, New York, NY, USA; Tisch Cancer Institute, ISMMS, New York, NY, USA

2Bioinformatics Group, Division of Molecular Biology, Department of Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia

3Tisch Cancer Institute, ISMMS, New York, NY, USA

4Haystack Oncology, Quest Diagnostics, Baltimore, MD, USA

*Correspondence: pstancl [at] bioinfo.hr

Abstract

Understanding the cellular origin of cancer is essential for elucidating tumorigenesis and improving early detection and therapeutic strategies. Here, we leverage large-scale whole-genome sequencing and single-cell chromatin accessibility data to develop a machine learning framework for predicting the cell of origin (COO) across multiple cancer types. By integrating mutational landscapes with cell-type-specific epigenomic profiles, our approach accurately assigns tumors to their originating cellular populations at high resolution, demonstrating that somatic mutation patterns retain a record of the pre-malignant chromatin state.
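Regions of accessible chromatin generally accumulate fewer somatic mutations, so a tumor's regional mutation density should track the closed chromatin of the cell type it arose from. The sketch below illustrates that idea with a deliberately simplified score. It assumes that mutations and single-cell-derived accessibility have already been aggregated into the same genomic bins. The function names, the `log1p` transform, and the negated Pearson correlation score are all hypothetical illustrations. They are not the machine learning framework described here, whose details are not given in this section.

```python
import numpy as np


def score_cell_types(mutation_density, accessibility, cell_types):
    """Hypothetical, simplified cell-of-origin score (not the authors' model).

    mutation_density: shape (n_bins,), somatic mutations per genomic bin.
    accessibility: shape (n_bins, n_cell_types), accessibility per bin and cell type.
    Returns {cell_type: score}, where score = -Pearson r between log-transformed
    mutation density and that cell type's accessibility (higher = better match).
    """
    # log1p dampens the skew typical of per-bin mutation counts (an assumption).
    y = np.log1p(np.asarray(mutation_density, dtype=float))
    X = np.asarray(accessibility, dtype=float)
    if y.ndim != 1 or X.ndim != 2 or X.shape != (y.shape[0], len(cell_types)):
        raise ValueError("expected (n_bins,) density and (n_bins, n_cell_types) accessibility")

    # Center both signals, then compute one Pearson correlation per column.
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    if np.any(denom == 0):
        raise ValueError("constant input: correlation is undefined")
    corr = (Xc * yc[:, None]).sum(axis=0) / denom

    # Negate so that "more mutations where this cell type is closed" scores high.
    return {name: float(-r) for name, r in zip(cell_types, corr)}


def predict_cell_of_origin(scores):
    """Return the highest-scoring candidate cell type."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Synthetic toy data: mutation density falls as "basal" accessibility rises.
    rng = np.random.default_rng(0)
    acc = rng.uniform(0.0, 1.0, size=(500, 3))
    mut = 20.0 - 10.0 * acc[:, 1]
    scores = score_cell_types(mut, acc, ["alveolar", "basal", "neuroendocrine"])
    print(predict_cell_of_origin(scores), scores)
```

A single correlation per cell type is used only to keep the illustration short. A realistic model would likely learn jointly from many cell-type features and use held-out genomic regions to assess predictive accuracy.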

Our model reveals both expected and previously unrecognized cellular origins, including a predominant basal cell origin for small cell lung cancer, challenging traditional assumptions of a neuroendocrine origin. Importantly, beyond static COO prediction, we identify dynamic cellular trajectories during tumorigenesis, highlighting the presence of intermediate metaplastic states in several gastrointestinal cancers. These findings support a model in which metaplasia represents a transitional and potentially permissive state linking normal tissue and malignant transformation, rather than a terminal differentiation endpoint.

This work demonstrates that integrative, epigenome-informed machine learning enables robust prediction of cancer cell of origin while uncovering metaplastic intermediates as critical components of tumor evolution. These insights have important implications for understanding cancer initiation, refining early detection strategies, and targeting pre-malignant cellular states.

Keywords: cell-of-origin, single-cell, cancer, genomics, metaplasia

Acknowledgement: This project was funded by the Chan Zuckerberg Initiative (CZI) Data Insights Grant 2022-249299 to P.P., R.K., and A.M.T. This work was also supported in part by the Croatian National Science Foundation Project PREDI-COO (project number HRZZ IP-2019-04-9308) to P.S. and R.K., and by the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai, supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences.