Nextflow and nf-core: a virtual infrastructure for the next generation of bioinformatics pipelines

Cedric Notredame^*

Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), Abu Dhabi, United Arab Emirates

cedric.notredame [at] mbzuai.ac.ae

Abstract

Biology generates unprecedented volumes of data, pushing computational infrastructures beyond present capacities. Turning this raw material into knowledge requires workflows that are reproducible, portable, and community-owned. Nextflow, together with the nf-core initiative, has become the de facto framework for this purpose, providing a virtual infrastructure on which thousands of groups now build, share, and run their analyses.

In this talk, I will first retrace how Nextflow and nf-core evolved into a cooperation platform for bioinformatics, as described in our recent Genome Biology paper (Langer et al., 2025). The strength of the framework lies less in its workflow engine than in the community it has brokered: a distributed network of scientists coordinating their efforts through shared tooling, common standards, and joint governance. A central development has been the move to DSL2 and the emergence of an extensive library of modules and subworkflows. This granular layer has fundamentally changed how cooperation takes place: contributors no longer exchange full workflows but reusable building blocks, which are independently tested, versioned, and maintained across institutions. The result is a progressive convergence toward FAIR workflows, adopted by consortia as diverse as EuroFAANG, Darwin Tree of Life, and Genomics England.

This modular foundation naturally called for the next step: a hub. Pipeline hubs turn workflows into versioned, peer-reviewed scientific objects, but their defining contribution is to make alternative ways of doing the same thing directly comparable by running them side by side, on the same data, under the same conditions. I will illustrate the concept with nf-core/multiplesequencealign (Santus et al., 2025), which systematically compares alignment strategies against structural and evolutionary references.

Pipeline hubs are also, in my view, the natural substrate on which AI will operate in biology. Because these workflows are standardized, ontology-ready, and continuously benchmarked, they can be coordinated, extended, and eventually co-developed with AI agents. This opens the way to pipelines that adapt to shifting scientific questions and are self-aware enough to follow targeted objectives semi-autonomously.

Keywords: Nextflow, nf-core, MSA, FAIR

Acknowledgement: The author thanks the nf-core community for its collective contribution to the infrastructure described in this talk, and MBZUAI for supporting his participation in BelBi.

2026, Belgrade
