Nextflow is a workflow system for creating scalable, portable, and reproducible data pipelines. Although originally developed for the bioinformatics community, Nextflow is a general-purpose workflow system that can be used in any application domain. The Nextflow language uses a dataflow programming model, in which self-contained "processes", written in any programming language, are connected to each other by "channels" that define the flow of data between tasks. The Nextflow runtime is founded on four pillars: 1) compute and storage agnosticism, so that the same pipeline can run on any compute infrastructure; 2) automatic and manual recovery from error conditions, which are common at scale and in cloud environments; 3) transparent support for software containers and package managers to ease deployment across different environments; and 4) Git as the source of truth for tracking all pipeline assets and dependencies (code, configuration, software packages, containers). Altogether, a Nextflow pipeline is a truly complete description of a computational pipeline that can be run nearly anywhere by nearly anyone. In this chapter, we discuss the design and implementation of Nextflow, both as a workflow language and as a workflow runtime, how it differs from other workflow systems, and the value it provides for domain scientists. Finally, we describe the ecosystem of tools and communities around Nextflow, including nf-core, Seqera Platform, Wave, and Fusion, which further enhance the portability and scalability of Nextflow pipelines.
Tommaso et al. studied this question.
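The dataflow model described above, in which self-contained processes are connected by channels, can be illustrated with a small analogy. The sketch below is plain Python, not Nextflow syntax or its runtime API: channels are modeled as FIFO queues, each "process" is a thread that consumes values from an input channel and emits results to an output channel, and a hypothetical `DONE` sentinel (an assumption of this sketch) marks the end of a stream. The helper names `process` and `run_pipeline` are likewise hypothetical.

```python
import queue
import threading

# Hypothetical end-of-stream marker; an assumption of this sketch, loosely
# analogous to a channel being closed once all of its values are emitted.
DONE = object()


def process(func, inp, out):
    """Start a worker that applies func to each value on inp and emits to out."""
    def worker():
        while True:
            item = inp.get()
            if item is DONE:
                out.put(DONE)  # propagate end-of-stream downstream
                break
            out.put(func(item))

    t = threading.Thread(target=worker)
    t.start()
    return t


def run_pipeline(values):
    # Channels are modeled as unbounded FIFO queues connecting the processes.
    source, mid, sink = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        process(str.upper, source, mid),        # first hypothetical step
        process(lambda s: s + "!", mid, sink),  # second hypothetical step
    ]
    # Feed the input channel, then signal that no more values will arrive.
    for v in values:
        source.put(v)
    source.put(DONE)
    for t in threads:
        t.join()
    # Drain the output channel until the end-of-stream marker.
    results = []
    while True:
        item = sink.get()
        if item is DONE:
            break
        results.append(item)
    return results


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c"]))
```

Each step runs as soon as data is available on its input, and the connections between steps, rather than an explicit call sequence, determine the order of execution. Because every process here is a single sequential worker, output order matches input order; a real workflow engine may run many tasks of the same process concurrently, so this sketch should not be read as describing Nextflow's ordering or scheduling behavior.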