THIS REPOSITORY IS UNDER ACTIVE DEVELOPMENT. SYNTAX, ORGANISATION AND LAYOUT MAY CHANGE WITHOUT NOTICE!
A repository for hosting Nextflow DSL2 module files containing tool-specific process definitions and their associated documentation.
Using existing modules
The module files hosted in this repository define a set of processes for software tools such as fastqc, bwa, samtools etc. This allows you to share and add common functionality across multiple pipelines in a modular fashion.
We have written a helper command in the nf-core/tools package that uses the GitHub API to obtain the relevant information for the module files present in the software/ directory of this repository. This includes using git commit hashes to track changes for reproducibility purposes, and to download and install all of the relevant module files.
- Install the latest version of nf-core/tools (>=1.10.2).
- List the available modules:

    $ nf-core modules list

                                              ,--./,-.
              ___     __   __   __   ___     /,-._.--~\
        |\ | |__  __ /  ` /  \ |__) |__         }  {
        | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                              `._,._,'

        nf-core/tools version 1.10.2

    INFO     Modules available from nf-core/modules (master):       modules.py:51
    bwa/index
    bwa/mem
    deeptools/computematrix
    deeptools/plotfingerprint
    deeptools/plotheatmap
    deeptools/plotprofile
    fastqc
    ..truncated..

- Install the module in your pipeline directory:

    $ nf-core modules install . fastqc

                                              ,--./,-.
              ___     __   __   __   ___     /,-._.--~\
        |\ | |__  __ /  ` /  \ |__) |__         }  {
        | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                              `._,._,'

        nf-core/tools version 1.10.2

    INFO     Installing fastqc                                       modules.py:62
    INFO     Downloaded 3 files to ./modules/nf-core/software/fastqc modules.py:97

- Import the module in your Nextflow script:

    #!/usr/bin/env nextflow

    nextflow.enable.dsl = 2

    include { FASTQC } from './modules/nf-core/software/fastqc/main' addParams( options: [:] )

- We have plans to add other utility commands to help developers install and maintain modules downloaded from this repository, so watch this space!

    $ nf-core modules --help

    ...truncated...

    Commands:
      list     List available software modules.
      install  Add a DSL2 software wrapper module to a pipeline.
      update   Update one or all software wrapper modules. (NOT YET IMPLEMENTED)
      remove   Remove a software wrapper from a pipeline. (NOT YET IMPLEMENTED)
      check    Check that imported module code has not been modified. (NOT YET IMPLEMENTED)
Adding a new module file
NB: The definition and standards for module files are still under discussion but we are now gladly accepting submissions :)
If you upload a module to nf-core/modules, it becomes available to all nf-core pipelines and to everyone within the Nextflow community! See software/ for examples.
Module template
We have added a directory called software/SOFTWARE/TOOL/ that serves as a template with which to create your own module submission. Where applicable, we have added extensive TODO statements to the files in this directory for general information, to help guide you as to where to make the appropriate changes, and how to make them. If in doubt, have a look at how we have done things for other modules.
.
├── software
│   ├── SOFTWARE
│   │   └── TOOL
│   │       ├── functions.nf        ## Utility functions imported in main module script
│   │       ├── main.nf             ## Main module script
│   │       ├── meta.yml            ## Documentation for module, input, output, params, author
│   │       └── test
│   │           ├── input           ## Soft-link input test data from "tests/"
│   │           ├── main.nf         ## Minimal workflow to test module
│   │           ├── nextflow.config ## Minimal config to test module
│   │           └── output          ## Upload output files from test for unit testing
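One way to start a submission is to copy the template directory to the location of the new module. The following is a minimal shell sketch, assuming it is run against a local clone of this repository. The helper name new_module_from_template and the example module bwa/mem are illustrative and are not part of nf-core/tools.

```shell
# Sketch: copy software/SOFTWARE/TOOL/ to software/<software>/<tool>/.
# new_module_from_template is a hypothetical helper, not an nf-core command.
new_module_from_template() {
    repo_root="$1"   # path to a local clone of nf-core/modules
    software="$2"    # lowercase software name, e.g. bwa
    tool="$3"        # lowercase tool name, e.g. mem

    template="$repo_root/software/SOFTWARE/TOOL"
    target="$repo_root/software/$software/$tool"

    if [ ! -d "$template" ]; then
        echo "Template directory not found: $template" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        echo "Module directory already exists: $target" >&2
        return 1
    fi

    # Create software/<software>/ if needed, then copy the whole template tree.
    mkdir -p "$(dirname "$target")" && cp -R "$template" "$target"
}

# Usage (from the repository root):
#   new_module_from_template . bwa mem
```

After copying, work through the TODO statements in the copied files.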
Guidelines
The key words "MUST", "MUST NOT", "SHOULD", etc. are to be interpreted as described in RFC 2119.
General
- Software that can be piped together SHOULD be added to separate module files unless there is a run-time or storage advantage in combining them. For example, using a combination of bwa and samtools to output a BAM file instead of a SAM file:

    bwa mem | samtools view -b -T ref.fasta

- Where applicable, the usage and generation of compressed files SHOULD be enforced as input and output, respectively:
    - *.fastq.gz and NOT *.fastq
    - *.bam and NOT *.sam
- Where applicable, each module command MUST emit a file <SOFTWARE>.version.txt containing a single line with the software's version in the format <VERSION_NUMBER> (for example 0.7.17), e.g.

    echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//' > ${software}.version.txt

  If the software is unable to output a version number on the command-line then a variable called VERSION can be manually specified to create this file e.g. homer/annotatepeaks module.
- The process definition MUST NOT contain a when statement.
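To make the version extraction concrete, here is a small shell sketch of the same sed expressions applied to a captured banner. The banner text is an assumed sample of what bwa prints, and the function names are illustrative. Outside a Nextflow script block the dollar signs do not need the backslash escapes used in the module command.

```shell
# Sketch: reduce a tool's usage banner to a bare version string.
# The banner below is an assumed sample of what `bwa 2>&1` prints.
sample_bwa_banner() {
    cat <<'EOF'

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   bwa <command> [options]
EOF
}

# Join all lines into one (as `echo $(...)` does in the module command),
# then strip everything before and after the version number.
extract_bwa_version() {
    printf '%s\n' "$(tr '\n' ' ' | sed 's/^.*Version: //; s/ *Contact:.*$//')"
}

sample_bwa_banner | extract_bwa_version

# Inside a module the real tool output would be piped in instead, e.g.
#   bwa 2>&1 | extract_bwa_version > bwa.version.txt
```

Checking the expression against a saved banner like this is a cheap way to confirm that the resulting file holds a single line before wiring it into the process.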
Naming conventions
- The directory structure for the module name MUST be all lowercase e.g. software/bwa/mem/. The name of the software (i.e. bwa) and tool (i.e. mem) MUST be all one word.
- The process name in the module file MUST be all uppercase e.g. process BWA_MEM {. The name of the software (i.e. BWA) and tool (i.e. MEM) MUST be all one word separated by an underscore.
- All parameter names MUST follow the snake_case convention.
- All function names MUST follow the camelCase convention.
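The relationship between the directory name and the process name can be sketched in shell as follows. This is only an illustration of the two rules above: module_process_name is a hypothetical helper, and allowing digits in names is our assumption.

```shell
# Sketch: derive the process name from a module path, e.g. bwa/mem -> BWA_MEM.
# module_process_name is a hypothetical helper; allowing digits is an assumption.
module_process_name() {
    module_path="$1"   # path relative to software/, e.g. bwa/mem or fastqc

    # Accept one or two all-lowercase, single-word path components.
    if ! printf '%s\n' "$module_path" | LC_ALL=C grep -Eq '^[a-z0-9]+(/[a-z0-9]+)?$'; then
        echo "Invalid module path: $module_path" >&2
        return 1
    fi

    # Uppercase the letters and turn the path separator into an underscore.
    printf '%s\n' "$module_path" | tr 'a-z/' 'A-Z_'
}

module_process_name bwa/mem   # prints BWA_MEM
module_process_name fastqc    # prints FASTQC
```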
Module parameters
- A module file SHOULD only define input and output files as command-line parameters to be executed within the process.
- All other parameters MUST be provided as a string i.e. options.args, where options is a Groovy Map that MUST be provided via the Nextflow addParams option when including the module via include in the parent workflow.
- If the tool supports multi-threading then you MUST provide the appropriate parameter using the Nextflow task variable e.g. --threads $task.cpus.
- Any parameters that need to be evaluated in the context of a particular sample e.g. single-end/paired-end data MUST also be defined within the process.
Input/output options
- Named file extensions MUST be emitted for ALL output channels e.g. path "*.txt", emit: txt.
- Optional inputs are not currently supported by Nextflow. However, "fake files" MAY be used to work around this issue.
Resource requirements
- An appropriate resource label MUST be provided for the module as listed in the nf-core pipeline template e.g. process_low, process_medium or process_high.
- If the tool supports multi-threading then you MUST provide the appropriate parameter using the Nextflow task variable e.g. --threads $task.cpus.
Software requirements
BioContainers is a registry of Docker and Singularity containers automatically created from all of the software packages on Bioconda. Where possible we will use BioContainers to fetch pre-built software containers and Bioconda to install software using Conda.
- Software requirements SHOULD be declared within the module file using the Nextflow container directive. To find the container, go to e.g. the BWA BioContainers webpage, click on the Packages and Containers tab, sort by Version and get the portion of the link after the docker pull command where Type is Docker. You may need to double-check that you are using the latest version of the software because you may find that containers for older versions have been rebuilt more recently.
- If the software is available on Conda it MUST also be defined using the Nextflow conda directive. Software MUST be pinned to the channel (i.e. bioconda) and version (i.e. 0.7.17) e.g. bioconda::bwa=0.7.17. Pinning the build too is not currently a requirement e.g. bioconda::bwa=0.7.17=h9402c20_2.
- If required, multi-tool containers may also be available on BioContainers e.g. bwa and samtools. It is also possible for a multi-tool container to be built and added to BioContainers by submitting a pull request on their multi-package-containers repository.
- If the software is not available on Bioconda a Dockerfile MUST be provided within the module directory. We will use GitHub Actions to auto-build the containers on the GitHub Packages registry.
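As a rough illustration of the pinning rule, the shell sketch below checks whether a Conda requirement string names both a channel and a version. It is a simplification under stated assumptions: is_pinned_conda_spec is a hypothetical helper, and the pattern assumes that version strings start with a digit.

```shell
# Sketch: check that a Conda requirement is pinned to a channel and a version.
# is_pinned_conda_spec is a hypothetical helper, not part of nf-core/tools.
is_pinned_conda_spec() {
    # <channel>::<package>=<version>, optionally followed by =<build>.
    # Assumes that versions start with a digit.
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '^[A-Za-z0-9_.-]+::[A-Za-z0-9_.-]+=[0-9][A-Za-z0-9_.]*(=[A-Za-z0-9_.]+)?$'
}

for spec in "bioconda::bwa=0.7.17" "bioconda::bwa=0.7.17=h9402c20_2" "bwa=0.7.17" "bioconda::bwa"; do
    if is_pinned_conda_spec "$spec"; then
        echo "pinned:     $spec"
    else
        echo "not pinned: $spec"
    fi
done
```

The first two strings pass because they carry a channel and a version; the last two fail because one lacks the channel and the other lacks the version.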
Publishing results
The Nextflow publishDir definition is currently quite limited in terms of parameter/option evaluation. To overcome this, the publishing logic we have implemented for use with DSL2 modules attempts to minimise changing the publishDir directive (default: params.outdir) in favour of constructing and appending the appropriate output directory paths via the saveAs: statement e.g.
publishDir "${params.outdir}",
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:meta.id) }
The saveFiles function can be found in the functions.nf file of utility functions that will be copied into all module directories. It uses the various publishing options specified as input to the module to construct and append the relevant output path to params.outdir.
We also use a standardised parameter called params.publish_dir_mode that can be used to alter the file publishing method (default: copy).
Testing
- All test data for nf-core/modules MUST be added to tests/data/ and organised by filename extension.
- Test files MUST be kept as tiny as possible.
- Every module MUST be tested by adding a test workflow with a toy dataset in the test/ directory of the module.
- Generic files from tests/data/ MUST be reused by symlinking them into the test/input/ directory of the module.
- Any outputs produced by the test workflow MUST be placed in a folder called test/output/ so that they can be used for unit testing.
- If the appropriate test data doesn't exist for your module then it MUST be added to tests/data/.
- A GitHub Actions workflow file MUST be added to .github/workflows/ e.g. .github/workflows/fastqc.yml.
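The symlinking step can be sketched in shell as below. This is a minimal illustration, assuming the directory layout described in this README; link_test_input and the example file name are hypothetical. The link target is written as a relative path so that it still resolves when the repository is cloned to a different location.

```shell
# Sketch: symlink a shared test file into a module's test/input/ directory.
# link_test_input is a hypothetical helper, not part of nf-core/tools.
link_test_input() {
    repo_root="$1"     # path to a local clone of nf-core/modules
    data_file="$2"     # path relative to tests/data/
    module_path="$3"   # path relative to software/, e.g. fastqc or bwa/mem

    source_file="$repo_root/tests/data/$data_file"
    input_dir="$repo_root/software/$module_path/test/input"

    if [ ! -f "$source_file" ]; then
        echo "Test file not found: $source_file" >&2
        return 1
    fi

    # Build one "../" per directory level between test/input/ and the repository root.
    up=$(printf '%s\n' "software/$module_path/test/input" | sed 's|[^/][^/]*|..|g')

    mkdir -p "$input_dir" &&
        ln -s "$up/tests/data/$data_file" "$input_dir/$(basename "$data_file")"
}

# Usage (from the repository root; the file name is illustrative):
#   link_test_input . fastq/example_R1.fastq.gz fastqc
```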
Documentation
- A module MUST be documented in the meta.yml file. It MUST document params, input and output. input and output MUST be a nested list.
Uploading to nf-core/modules
Fork the nf-core/modules repository to your own GitHub account. Within the local clone of your fork add the module file to the software/ directory.
Commit these changes in your local clone and push them to your fork on GitHub, and then create a pull request on the nf-core/modules GitHub repo with the appropriate information.
We will be notified automatically when you have created your pull request, and providing that everything adheres to nf-core guidelines we will endeavour to approve your pull request as soon as possible.
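The commit and push steps might look like the following shell sketch. It assumes that the current directory is a clone of your fork with a remote named origin. The helper push_module_branch and the branch name are hypothetical, and working on a separate branch is our suggestion rather than a requirement stated here.

```shell
# Sketch: commit a new module on its own branch and push it to your fork.
# Assumes the current directory is a clone of your fork whose remote is "origin".
# push_module_branch and the branch name are hypothetical.
push_module_branch() {
    branch="$1"       # e.g. add-bwa-mem
    module_dir="$2"   # e.g. software/bwa/mem

    git checkout -b "$branch" &&
        git add "$module_dir" &&
        git commit -m "Add ${module_dir#software/} module" &&
        git push -u origin "$branch"
}

# Usage:
#   push_module_branch add-bwa-mem software/bwa/mem
# Then open a pull request against nf-core/modules from that branch on GitHub.
```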
Terminology
The features offered by Nextflow DSL2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL2 components:
- Module: A process that can be used within different pipelines and is as atomic as possible i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as FastQC. At present, this repository has been created to only host atomic module files that should be added to the software/ directory along with the required documentation and tests.
- Sub-workflow: A chain of multiple modules that offers a higher level of functionality within the context of a pipeline. For example, a sub-workflow to run multiple QC tools with FastQ files as input. Sub-workflows should be shipped with the pipeline implementation and if required they should be shared amongst different pipelines directly from there. As it stands, this repository will not host sub-workflows although this may change in the future since well-written sub-workflows will be the most powerful aspect of DSL2.
- Workflow: What DSL1 users would consider an end-to-end pipeline. For example, from one or more inputs to a series of outputs. This can either be implemented using a large monolithic script as with DSL1, or by using a combination of DSL2 individual modules and sub-workflows.
Help
For further information or help, don't hesitate to get in touch on the Slack #modules channel (you can join with this invite).
Citation
If you use the module files in this repository for your analysis please cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.