PCR-based nanopore sequencing method for characterizing short tandem repeat expansions

Lana Radenković1*, Jovan Pešović1, Vladimir Tomić2, Igor Davidović1, Nemanja Radovanović1, Ana Popic2 and Dusanka Savić-Pavićević1

1 University of Belgrade – Faculty of Biology, Belgrade, Serbia

2 Centogene, Belgrade, Serbia

lana.radenkovic [at] bio.bg.ac.rs

Abstract

Short tandem repeat (STR) expansions cause more than 60 rare neurological diseases and pose a methodological challenge because of their length (from 0.15 kb to more than 3 kb), complex sequence structure, and continuous lengthening in somatic cells throughout a patient’s life. These genetic features are the main source of the extreme variability in disease presentation and progression. To make genetic diagnosis more accessible and improve prognosis for individual patients, we are developing a nanopore sequencing method for in-depth characterization of STR expansions.

The method is based on PCR enrichment and a bioinformatics pipeline developed in-house. Libraries containing up to six patient samples were prepared with the Native Barcoding Kit 24 V14. Sequencing was performed on R10.4.1 Flongle flow cells on a portable Mk1C device (Oxford Nanopore Technologies). Raw data were basecalled with Guppy. Each read was treated as a string in which the STR lies between predefined, locus-specific flanking sequences. Regular expressions were used to determine the length and structure of the STRs. Reads were categorized by STR length and aligned without a reference. Polishing steps for overcoming per-read errors were based on results from our wet lab experiments.
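The flank-anchored, regular-expression step described above can be sketched in Python as follows. This is a minimal illustration and not the in-house pipeline: the flanking sequences, the motif list, and the function names `extract_repeat`, `run_length_encode` and `format_structure` are assumptions made for the example. Real reads would also need tolerance for sequencing errors in the flanks and handling of reverse-complement reads.

```python
import re

# Hypothetical flanking sequences; real ones are locus-specific.
LEFT_FLANK = "GATTACA"
RIGHT_FLANK = "TTGACC"


def extract_repeat(read, left=LEFT_FLANK, right=RIGHT_FLANK):
    """Return the sequence between the two flanks, or None if a flank is missing."""
    pattern = re.escape(left) + r"([ACGT]*?)" + re.escape(right)
    match = re.search(pattern, read)
    return match.group(1) if match else None


def run_length_encode(repeat, motifs=("CTA", "CTG", "CCG")):
    """Collapse the repeat into consecutive (unit, count) runs.

    Known motifs are tried first; any base that fits no motif becomes
    its own single-base unit, so interruptions and errors stay visible.
    """
    token = re.compile("|".join(motifs) + "|[ACGT]")
    runs = []
    for m in token.finditer(repeat):
        unit = m.group(0)
        if runs and runs[-1][0] == unit:
            runs[-1] = (unit, runs[-1][1] + 1)
        else:
            runs.append((unit, 1))
    return runs


def format_structure(runs):
    """Render runs in the notation used in this abstract, e.g. (CTG)5CCG(CTG)4."""
    return "".join(f"({unit}){n}" if n > 1 else unit for unit, n in runs)


# Toy read: junk + left flank + interrupted repeat + right flank + junk.
read = ("AAAC" + LEFT_FLANK + "CTA" * 6 + "CTG" * 5 + "CCG" + "CTG" * 4
        + RIGHT_FLANK + "GGT")
repeat = extract_repeat(read)
print(len(repeat), format_structure(run_length_encode(repeat)))
# prints: 48 (CTA)6(CTG)5CCG(CTG)4
```

The non-greedy capture group stops at the first occurrence of the right flank, which is why the flanks must be chosen so that they cannot occur inside the repeat itself.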

The length and structure of STR expansions in patients with spinocerebellar ataxia type 8 (e.g., (CTA)6(CTG)56-60CCG(CTG)53-56) and myotonic dystrophy type 1 (DM1) (e.g., (CTG)350-700(CCGCTG)3(CTG)4(CCGCTG)2CTGCCG(CTG)18) were determined accurately, in agreement with orthogonal methods. For the somatically highly unstable DM1 mutation, a reliable distribution of allele lengths was obtained with at least 200× coverage, allowing accurate estimation of the modal allele size and the degree of somatic instability.
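As an illustration of how a per-read length distribution can be summarized, the sketch below computes the modal allele size and one simple spread measure from a list of repeat counts. The abstract does not specify how the degree of somatic instability is calculated, so the 10th-to-90th percentile spread used here is an assumption, as are the function name `summarize_alleles` and the toy data. Only the minimum of 200 reads comes from the text.

```python
from collections import Counter


def summarize_alleles(repeat_counts, min_reads=200):
    """Summarize per-read repeat counts for one sample.

    Returns the modal allele size and a percentile-based spread.
    The spread (p90 - p10) is a hypothetical stand-in for the
    instability measure, which the abstract does not define.
    """
    if len(repeat_counts) < min_reads:
        raise ValueError(
            f"need at least {min_reads} reads, got {len(repeat_counts)}"
        )
    ordered = sorted(repeat_counts)
    # Most frequent repeat count; ties resolve to the smallest value
    # because the input is sorted before counting.
    modal, _ = Counter(ordered).most_common(1)[0]

    def percentile(p):
        # Nearest-rank style percentile on the sorted counts.
        return ordered[round(p / 100 * (len(ordered) - 1))]

    p10, p90 = percentile(10), percentile(90)
    return {
        "n_reads": len(ordered),
        "modal": modal,
        "p10": p10,
        "p90": p90,
        "spread": p90 - p10,
    }


# Toy distribution of CTG repeat counts from 210 reads (invented data).
counts = [400] * 50 + [410] * 120 + [450] * 30 + [500] * 10
summary = summarize_alleles(counts)
print(summary["modal"], summary["spread"])
# prints: 410 50
```

Raising an error below the read threshold keeps under-covered samples from producing a modal size that looks precise but is not.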

Our nanopore sequencing method can reliably characterize the clinically relevant features of STR expansions: length, sequence structure, and degree of somatic instability. The method outperforms the gold standard methods (repeat-primed PCR, Southern blot), captures somatic variability more reliably than Cas9-based enrichment, and is more accessible in clinical settings than PacBio and Illumina (where applicable) sequencing methods.

Keywords: STR expansions, nanopore sequencing, regular expressions, long-read sequencing

Acknowledgement: This research was supported by the Science Fund of the Republic of Serbia, Grant number 7754217, Understanding repeat expansion dynamics and phenotype variability in myotonic dystrophy type 1 through human studies, nanopore sequencing and cell models – READ-DM1