STIM: Multipurpose method for spatial transcriptomics data integration across different technologies

Milos M. Radonjic1*, Aleksandra Stanojevic1, Tamara Banovac1, Fang Shuangsang2,3 and Junhua Li1,3

1 BGI Research, Belgrade 11000, Serbia

2 BGI Research, Beijing 102601, China

3 BGI Research, Shenzhen 518083, China

milosradonjic1 [at] genomics.cn

Abstract

We developed a statistically based data integration method tailored to spatial transcriptomics data. The method performed all of the data integration tasks considered while removing batch effects by correcting the entire gene expression matrix, which improves the preservation of biological information. This preservation is further enhanced by using piece-wise affine transformations to align gene expression distributions across samples. Across the experimental technologies, datasets, and integration tasks examined, the method showed robust batch-effect correction performance and outperformed existing methods, especially in the preservation of biological information.

As an integral part of the method, we developed a new spatially aware clustering method capable of accurately identifying tissues and spatial domains. Together with a novel cross-sample cluster-mapping methodology, it provides robust cross-sample clustering applicable to spatial-domain clusters as well as cell-type clusters. These features make the method versatile: it enables batch-effect-free integration of multiple samples, 3D clustering, and the integration of healthy and diseased samples.

Moreover, our method is the only one that directly corrects the gene expression matrix by applying transformations that preserve gene regulatory information, so that gene regulation and gene co-expression can be studied after data integration. This makes our method uniquely suited to any downstream analysis task that requires the full gene expression matrix as input.

Keywords: bioinformatics, spatial transcriptomics, data integration, batch effects
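
The abstract refers to piece-wise affine transformations that align gene expression distributions across samples. The sketch below illustrates that general idea for a single gene using quantile matching with linear interpolation. It is an illustrative assumption and not the STIM implementation: the function name piecewise_affine_align, the parameter n_knots, and the choice of quantiles as knots are all hypothetical.

```python
import numpy as np


def piecewise_affine_align(source, reference, n_knots=11):
    """Map `source` values onto the distribution of `reference`.

    Hypothetical helper: knots are placed at matching quantiles of both
    samples, and the map is affine between neighbouring knots.
    """
    source = np.asarray(source, dtype=float)
    reference = np.asarray(reference, dtype=float)

    probs = np.linspace(0.0, 1.0, n_knots)
    src_knots = np.quantile(source, probs)
    ref_knots = np.quantile(reference, probs)

    # np.interp expects increasing x-coordinates, so repeated source knots
    # (common for zero-inflated expression values) are dropped.
    src_knots, keep = np.unique(src_knots, return_index=True)
    ref_knots = ref_knots[keep]

    if src_knots.size < 2:
        # A constant source has no distribution shape to align.
        return source.copy()

    # Linear interpolation between knots is affine on each segment.
    return np.interp(source, src_knots, ref_knots)


# Toy data (assumed): the "reference" sample is a shifted and scaled copy
# of the "source" sample, mimicking a simple batch effect.
rng = np.random.default_rng(0)
source = rng.gamma(shape=2.0, scale=1.0, size=500)
reference = 2.0 * source + 3.0

aligned = piecewise_affine_align(source, reference)
```

Because every segment of the map is non-decreasing, the ranking of cells by expression of the gene is preserved within a sample. That is one plausible reading of how a transformation of this kind could retain biological signal while removing a sample-level shift.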