IsoChecker: multisample analysis and interactive browsing of long-read transcript isoforms

Alla Mikheenko1* and Andrey Przhibelskiy2

1Department of Computational Biology, Mohamed bin Zayed University of Artificial Intelligence

2University of Helsinki

alla.mikheenko [at] mbzuai.ac.ae

Abstract

Long-read RNA sequencing routinely generates full-length transcript catalogues for each sample of an experiment. Combining these catalogues across samples and conditions presents two major challenges: constructing a non-redundant consensus annotation and identifying which novel transcripts are reliable enough for downstream analysis. Expression alone is not an optimal filter, as experimentally validated isoforms can show both low abundance and low cross-sample reproducibility. Therefore, robust isoform prioritization requires integrating multiple sources of evidence.

We developed IsoChecker, a pipeline combined with an interactive browser for multisample long-read transcriptome analysis. IsoChecker takes per-sample GTF files and abundance estimates, merges them into a consensus annotation, and matches novel transcripts across samples using a configurable splice-junction tolerance. Each consensus transcript is assigned a structural category and a composite reliability score that integrates transcript category, detection frequency, abundance, junction support, and canonical splice-site usage. When available, IsoChecker can additionally incorporate orthogonal evidence such as short-read splice-junction support and CAGE/QuantSeq end peaks, enabling more confident prioritization of biologically relevant isoforms.
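As an illustration of matching transcripts across samples under a splice-junction tolerance, the following is a minimal sketch and not IsoChecker's actual implementation. It assumes each transcript has been reduced to its chromosome, strand, and ordered intron chain. The names `introns_match` and `group_transcripts`, the greedy grouping strategy, and the input layout are hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Intron = Tuple[int, int]                 # (donor, acceptor) genomic coordinates
Chain = Tuple[str, str, List[Intron]]    # (chromosome, strand, intron chain)


def introns_match(a: List[Intron], b: List[Intron], tolerance: int = 0) -> bool:
    """True if both chains have the same number of introns and every
    splice site differs by at most `tolerance` bases."""
    if len(a) != len(b):
        return False
    return all(
        abs(s1 - s2) <= tolerance and abs(e1 - e2) <= tolerance
        for (s1, e1), (s2, e2) in zip(a, b)
    )


def group_transcripts(chains: Dict[str, Chain], tolerance: int = 0) -> List[List[str]]:
    """Greedy grouping (an assumption of this sketch): each transcript joins
    the first group whose representative chain it matches, otherwise it
    starts a new group."""
    groups: List[Tuple[Chain, List[str]]] = []
    for tid, (chrom, strand, introns) in chains.items():
        for (rep_chrom, rep_strand, rep_introns), members in groups:
            if (rep_chrom == chrom and rep_strand == strand
                    and introns_match(rep_introns, introns, tolerance)):
                members.append(tid)
                break
        else:
            # No existing group matched: this transcript becomes a representative.
            groups.append(((chrom, strand, introns), [tid]))
    return [members for _, members in groups]
```

With a tolerance of a few bases, chains whose splice sites differ only by small alignment shifts collapse into one consensus group, while a tolerance of 0 requires exact junction agreement.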

Beyond transcript scoring, IsoChecker provides a broad overview of reproducibility across samples and condition-specific transcriptomic changes. The framework evaluates detection consistency across replicates, TSS and TTS support relative to the reference annotation, transcript- and gene-level structural complexity, and sample similarity through heatmaps. Differential isoform usage and differential splicing are assessed per gene using Mann-Whitney statistics.
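To make the per-gene test concrete, the sketch below compares the usage fraction of one isoform between two conditions with a Mann-Whitney U statistic. It is an assumption-laden illustration rather than IsoChecker's code: defining usage as isoform abundance divided by gene abundance, the function names, and the exact p-value obtained by enumerating all group labelings are choices made for this example. The enumeration is only practical for small replicate counts.

```python
from itertools import combinations


def usage_fractions(isoform_counts, gene_counts):
    """Per-sample isoform usage: isoform abundance / total gene abundance."""
    return [i / g if g > 0 else 0.0 for i, g in zip(isoform_counts, gene_counts)]


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = mean_rank
        pos = end + 1
    return ranks


def mann_whitney_exact(x, y):
    """Return (U statistic of x, exact two-sided p-value).

    The p-value is the fraction of all ways to choose len(x) samples from
    the pooled data whose U is at least as far from its null mean as the
    observed U."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    u_obs = sum(ranks[:n1]) - offset
    extreme = total = 0
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[k] for k in idx) - offset
        total += 1
        if abs(u - mean_u) >= abs(u_obs - mean_u) - 1e-9:
            extreme += 1
    return u_obs, extreme / total
```

Working on usage fractions rather than raw abundances keeps the comparison focused on shifts in isoform composition, independent of changes in overall gene expression.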

Results are presented through an interactive isoform-focused gene browser designed to simplify interpretation of genes with complex isoform structure. The browser visualizes all isoforms of a gene together with per-condition read coverage, supports within-condition normalization for comparisons across libraries of different depth, and highlights shifts in isoform composition between conditions. IsoChecker also evaluates transcript stability across replicates, measures concordance with alternative long-read isoform discovery tools, and generates ranked candidate lists of novel isoforms for downstream validation. Overall, IsoChecker provides both a practical reliability-scoring framework and an interactive environment for exploring and prioritizing isoforms in multisample long-read transcriptome studies.

Keywords: transcriptomics, long reads, genome browser