Andrey Przhibelskiy1*, Lieke Michielsen2, Careen Foord2, Iman Hajirasouliha3, Hagen U. Tilgner2 and Alexandru I. Tomescu4
1University of Helsinki
2Feil Family Brain and Mind Research Institute, Weill Cornell Medicine
3Institute for Computational Biomedicine, Department of Physiology and Biophysics, Weill Cornell Medicine of Cornell University
4Department of Computer Science, University of Helsinki
andrey.przhibelskiy [at] helsinki.fi
Abstract
Recent advances in single-cell and spatial transcriptomics combined with long-read sequencing have opened new possibilities for isoform-level profiling and studying alternative splicing. However, upstream processing of such data poses distinct algorithmic challenges: detecting barcodes in error-prone reads against extremely large whitelists, resolving molecule concatenation artifacts and PCR duplicates, and accurately quantifying and discovering complex isoforms. In addition, methods for detecting spatially variable isoforms are lacking.
Here we present Spl-IsoQuant, a universal toolkit for processing long-read RNA-seq data obtained with various single-cell and spatial protocols. Our software extends the IsoQuant framework with high-precision barcode calling that also handles the cDNA concatenation typical of some protocols and of ONT sequencing. Spl-IsoQuant performs UMI-based PCR deduplication and accurately quantifies genes and isoforms in individual cells, cell types, and user-defined conditions. It supports multiple protocols, including 10x single-cell, 10x Visium and Visium HD, Stereo-seq, and Curio. Moreover, it can process long-read data from any custom protocol via a user-specified molecule description format. We complement the toolkit with Spl-IsoFind, which discovers spatially variable isoforms using Moran’s I spatial autocorrelation. Combined with region-wise isoform tests, this framework detects both known patterns and signals not aligned with predefined anatomical regions.
Applying our toolkit to several mouse brain spatial long-read datasets generated with Stereo-seq and 10x Visium HD (500 nm and 2 μm resolution, respectively), we demonstrate spatially variable isoform discovery at the single-cell level and reproducibility across protocols, establishing a general-purpose pipeline for single-cell and spatial long-read RNA-seq analysis.
Keywords: transcriptomics, long reads, single-cell, spatial

