EffRank: a bioinformatics scoring pipeline for prioritizing novel type III secreted effectors in the Pseudomonas syringae species complex

Iva Rosić1,2*, Marina Sokić1,2, Tamara Ranković2,3, Olja Medić2,3, Tanja Berić1,2,3, Slaviša Stanković2,3 and Ivan Nikolić2,3

1Institute of Physics Belgrade

2Center for Pathogen Biocontrol and Plant Growth Promotion – Faculty of Biology, University of Belgrade

3Faculty of Biology, University of Belgrade

riva [at] ipb.ac.rs

Abstract

The identification of novel type III secreted effectors (T3SEs) remains challenging because these proteins exhibit high sequence diversity and frequently lack the canonical sequence signatures used by existing prediction algorithms. Consequently, individual machine-learning or motif-based predictors often generate large sets of candidate effectors that include numerous false positives, making downstream experimental validation inefficient. To address this limitation, we developed an integrated bioinformatics workflow designed to improve confidence in the in silico discovery of novel T3SEs and applied it to the Pseudomonas syringae species complex.

The workflow combined multiple complementary computational approaches, including predictions from Effectidor2, Bastion3, and EffectiveT3, similarity searches against curated effector databases, all-vs-all BLAST analyses, genomic context evaluation, plant subcellular localization prediction, and structural modeling. These outputs were integrated within a newly developed scoring pipeline, EffRank, which assigns quantitative scores to each effector candidate using structural, genomic, and functional criteria.
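
As an illustration of how a scoring step of this kind could combine criteria, the following minimal sketch ranks candidates by a weighted sum over the three criterion groups named above. The weights, the score range, and the names `Candidate`, `total_score`, and `rank_candidates` are hypothetical and are not taken from EffRank itself; the abstract does not specify the actual scoring scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical equal weights; the real EffRank weighting is not given in the abstract.
WEIGHTS: Dict[str, float] = {"structural": 1.0, "genomic": 1.0, "functional": 1.0}


@dataclass
class Candidate:
    name: str
    # Per-criterion scores, assumed here to lie in [0, 1].
    scores: Dict[str, float] = field(default_factory=dict)


def total_score(candidate: Candidate, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum over criteria; a missing criterion contributes 0."""
    return sum(w * candidate.scores.get(k, 0.0) for k, w in weights.items())


def rank_candidates(candidates: List[Candidate],
                    weights: Dict[str, float] = WEIGHTS) -> List[Candidate]:
    """Return candidates sorted from highest to lowest total score."""
    return sorted(candidates, key=lambda c: total_score(c, weights), reverse=True)
```

Keeping the weights in a single visible mapping is one way to make such a scoring step transparent: changing a weight changes the ranking in a way that can be inspected and reported.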

Application of this workflow to 57 complete P. syringae genomes initially produced 283 putative novel T3SE candidates. Sequential filtering by predictor consensus, homology exclusion, and localization analysis reduced this dataset to 15 high-confidence candidates lacking similarity to previously described effectors. EffRank analysis then prioritized three of these 15 candidates as the most promising novel T3SEs. Transcriptomic data obtained under T3SS-inducing conditions supported the expression of orthologues corresponding to the highest-scoring candidates. In addition, the workflow identified several systematic false positives, including proteins related to YenB-like toxins, highlighting biases inherent to current prediction tools.
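
The sequential filtering described above can be pictured as a funnel in which each step keeps only the candidates that pass it. The sketch below is an assumption-laden illustration, not the authors' implementation: the record fields (`predictors`, `known_effector_hit`, `localization_ok`), the consensus threshold of two tools, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, object]


def has_consensus(c: Candidate, min_tools: int = 2) -> bool:
    # 'predictors' maps a tool name to True/False; the threshold is an assumption.
    return sum(bool(v) for v in c["predictors"].values()) >= min_tools


def lacks_known_homolog(c: Candidate) -> bool:
    # Keep only candidates with no hit against known effectors.
    return not c["known_effector_hit"]


def passes_localization(c: Candidate) -> bool:
    # Hypothetical flag summarizing the localization analysis.
    return bool(c["localization_ok"])


STEPS: List[Tuple[str, Callable[[Candidate], bool]]] = [
    ("predictor consensus", has_consensus),
    ("homology exclusion", lacks_known_homolog),
    ("localization analysis", passes_localization),
]


def run_funnel(candidates: List[Candidate], steps=STEPS):
    """Apply each filter in order and record how many candidates remain."""
    remaining = list(candidates)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [c for c in remaining if keep(c)]
        counts.append((label, len(remaining)))
    return remaining, counts
```

Recording the count after every step makes it possible to report where candidates were lost, which is the kind of bookkeeping behind a reduction such as 283 to 15.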

In conclusion, this study demonstrated that integrating multiple prediction tools with a transparent scoring framework and comparative genomics improved the reliability of novel T3SE discovery. The proposed workflow provides a reproducible bioinformatics strategy for prioritizing candidate T3SEs and guiding targeted experimental validation.

Keywords: T3SE, Pseudomonas, novel effector prediction