
nf-core/modules

GitHub Actions | Code Linting | Get help on Slack

THIS REPOSITORY IS UNDER ACTIVE DEVELOPMENT. SYNTAX, ORGANISATION AND LAYOUT MAY CHANGE WITHOUT NOTICE!

A repository for hosting Nextflow DSL2 module files containing tool-specific process definitions and their associated documentation.

Table of contents

  • Using existing modules
  • Adding a new module file
  • Current guidelines
  • Uploading to nf-core/modules
  • Terminology
  • Help
  • Citation

Using existing modules

The module files hosted in this repository define a set of processes for software tools such as fastqc, bwa, samtools etc. This allows you to share and add common functionality across multiple pipelines in a modular fashion.

We have written a helper command in the nf-core/tools package that uses the GitHub API to obtain the relevant information for the module files present in the software/ directory of this repository. This includes using git commit hashes to track changes for reproducibility purposes, and downloading and installing all of the relevant module files.

  1. Install the latest version of nf-core/tools (>=1.10.2)

  2. List the available modules:

    $ nf-core modules list
    
                                              ,--./,-.
              ___     __   __   __   ___     /,-._.--~\
        |\ | |__  __ /  ` /  \ |__) |__         }  {
        | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                              `._,._,'
    
        nf-core/tools version 1.10.2
    
    
    
    INFO      Modules available from nf-core/modules (master):                                                                                                                  modules.py:51
    
    bwa/index
    bwa/mem
    deeptools/computematrix
    deeptools/plotfingerprint
    deeptools/plotheatmap
    deeptools/plotprofile
    fastqc
    ..truncated..
    
  3. Install the module in your pipeline directory:

    $ nf-core modules install . fastqc
    
                                              ,--./,-.
              ___     __   __   __   ___     /,-._.--~\
        |\ | |__  __ /  ` /  \ |__) |__         }  {
        | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                              `._,._,'
    
        nf-core/tools version 1.10.2
    
    
    
    INFO      Installing fastqc                                                                                                                                                 modules.py:62
    INFO      Downloaded 3 files to ./modules/nf-core/software/fastqc                                                                                                           modules.py:97
    
  4. Import the module in your Nextflow script:

    #!/usr/bin/env nextflow
    
    nextflow.enable.dsl = 2
    
    include { FASTQC } from './modules/nf-core/software/fastqc/main'
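
     The installed process can then be called from a workflow block. The channel below is only a placeholder: the exact inputs a module expects are declared in its own main.nf, so check that file before wiring things up. A minimal sketch:

    workflow {
        // Placeholder channel; match its structure to the input: block of the installed module
        ch_reads = Channel.fromPath('data/*.fastq.gz')
        FASTQC(ch_reads)
    }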
    
  5. We have plans to add other utility commands to help developers install and maintain modules downloaded from this repository, so watch this space!

    $ nf-core modules --help
    
    ...truncated...
    
    Commands:
      list     List available software modules.
      install  Add a DSL2 software wrapper module to a pipeline.
      update   Update one or all software wrapper modules.             (NOT YET IMPLEMENTED)
      remove   Remove a software wrapper from a pipeline.              (NOT YET IMPLEMENTED)
      check    Check that imported module code has not been modified.  (NOT YET IMPLEMENTED)
    

Adding a new module file

NB: The definition and standards for module files are still under discussion amongst the nf-core community but your contributions are always more than welcome! :)

If you decide to upload a module to nf-core/modules, it will become available to all nf-core pipelines and to everyone within the Nextflow community! See nf-core/modules/software for examples.

Current guidelines

The key words "MUST", "MUST NOT", "SHOULD", etc. are to be interpreted as described in RFC 2119.

General

  • Software that can be piped together SHOULD be added to separate module files, unless there is a run-time or storage advantage in keeping them together. For example, using a combination of bwa and samtools to output a BAM file instead of a SAM file:

    bwa mem | samtools view -b -T ref.fasta
    
  • Where applicable, the usage/generation of compressed files SHOULD be enforced as input/output e.g. *.fastq.gz and NOT *.fastq, *.bam and NOT *.sam etc.

  • Where applicable, a command MUST be provided to obtain the version number of the software used in the module e.g.

    echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//' > ${software}.version.txt
    

If the software is unable to output a version number on the command line then it can be manually specified, e.g. as in the homer/annotatepeaks module.
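
A minimal, hedged sketch of doing this inside the script block, writing the version string by hand (keep it in the v<VERSION_NUMBER> format and in sync with the pinned conda/container version):

    echo 'v<VERSION_NUMBER>' > ${software}.version.txt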

Naming conventions

  • The directory structure for the module name MUST be all lowercase e.g. software/bwa/mem/. The name of the software (i.e. bwa) and tool (i.e. mem) MUST be all one word (see the sketch after this list).
  • The process name in the module file MUST be all uppercase e.g. process BWA_MEM {. The name of the software (i.e. BWA) and tool (i.e. MEM) MUST be all one word separated by an underscore.
  • All parameter names MUST follow the snake_case convention.
  • All function names MUST follow the camelCase convention.
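
As an illustrative sketch of these conventions (trimmed to naming only; real modules also declare inputs, outputs and so on, as described below):

    // software/bwa/mem/main.nf -- lowercase directory path, uppercase process name
    process BWA_MEM {
        script:
        """
        echo 'bwa mem command goes here'
        """
    }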

Module parameters

  • A module file SHOULD only define input and output files as command-line parameters for the command executed within the process.
  • All other parameters MUST be provided as a string i.e. options.args where options is a Groovy Map that MUST be provided in the input section of the process.
  • If the tool supports multi-threading then you MUST provide the appropriate parameter using the Nextflow task variable e.g. --threads $task.cpus.
  • Any parameters that need to be evaluated in the context of a particular sample e.g. single-end/paired-end data MUST also be defined within the process (a sketch follows this list).
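
A hedged sketch of how these rules look inside a process (the tool and option names are placeholders, not a prescribed interface); sample-context parameters such as single-end/paired-end handling are shown in the fuller sketch under "Defining inputs, outputs and parameters" below:

    process EXAMPLE_TOOL {
        input:
        path reads
        val options                                   // Groovy Map; free-text arguments arrive as options.args

        script:
        """
        example_tool --threads $task.cpus $options.args $reads > output.txt
        """
    }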

Input/output options

  • Named file extensions MUST be emitted for ALL output channels e.g. path "*.txt", emit: txt (see the sketch after this list).
  • Optional inputs are not currently supported by Nextflow. However, "fake files" MAY be used to work around this issue.
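
A short, hedged sketch of naming multiple output channels (the tool and file names are illustrative only):

    process EXAMPLE_SORT {
        input:
        path bam

        output:
        path "*.sorted.bam", emit: bam    // every output channel declares a file extension and an emit name
        path "*.log",        emit: log

        script:
        """
        samtools sort -o ${bam.baseName}.sorted.bam $bam 2> ${bam.baseName}.log
        """
    }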

Module software

  • Fetch the "docker pull" address for the latest Biocontainers image of the software, e.g. https://biocontainers.pro/#/tools/samtools.

  • If required, multi-tool containers may also be available and are usually named to start with "mulled".

  • List required Conda packages. Software MUST be pinned to channel (i.e. "bioconda") and version (i.e. "1.10") as in the example below. Pinning the build too e.g. "bioconda::samtools=1.10=h9402c20_2" is not currently a requirement.
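
A hedged sketch of how the corresponding declarations can look inside a module (the container tag is constructed from the pinned version and build shown above, but check the Biocontainers page for the real address):

    process SAMTOOLS_EXAMPLE {
        conda "bioconda::samtools=1.10"                                // channel and version pinned; build pin optional
        container "quay.io/biocontainers/samtools:1.10--h9402c20_2"    // "docker pull" address for the Biocontainers image

        script:
        """
        samtools --version
        """
    }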

Resource requirements

  • Provide an appropriate resource label for the process, as listed in the nf-core pipeline template.
  • If the tool supports multi-threading then you MUST provide the appropriate parameter using the Nextflow task variable e.g. --threads $task.cpus.

Defining inputs, outputs and parameters

  • A module file SHOULD only define inputs and outputs as parameters. Additionally,
    • it MUST define threads or resources where required for a particular process using task.cpus
    • it MUST be possible to pass additional parameters to the tool as a command line string via the params.<MODULE>_args parameter.
    • it MUST be possible to pass additional parameters as a Nextflow Map through an additional input channel val(options) [Details require discussion].
    • All NGS modules MUST accept a triplet [name, single_end, reads] as input. The single-end boolean value MUST be specified through the input channel and not inferred from the data.
  • Process names MUST be all uppercase.
  • Each process MUST emit a file <TOOL>.version.txt containing a single line with the software's version in the format v<VERSION_NUMBER>.
  • All outputs MUST be named using emit.
  • A process MUST NOT contain a when statement.
  • Optional inputs are not yet supported on the Nextflow side. In the meantime, "fake files" MAY be used to work around this issue (a full module sketch follows this list).
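
Pulling the points above together, a hedged skeleton of what such a module can look like (the tool name, option handling and file names are placeholders rather than a fixed template):

    process EXAMPLE_ALIGNER {
        label 'process_medium'

        // No when: block, as required by the guidelines above

        input:
        tuple val(name), val(single_end), path(reads)    // NGS triplet; single_end comes from the channel, not the data
        val options                                      // additional arguments passed in as options.args

        output:
        path "*.bam",         emit: bam
        path "*.version.txt", emit: version

        script:
        def software   = 'example_aligner'
        def extra_args = single_end ? '' : '--paired'    // sample-context option evaluated within the process
        """
        example_aligner --threads $task.cpus $options.args $extra_args $reads > ${name}.bam
        # Placeholder version string; query the tool itself wherever possible
        echo 'v1.0' > ${software}.version.txt
        """
    }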

Atomicity

  • Software that can be piped together SHOULD be added to separate module files, unless there is a run-time or storage advantage in keeping them together e.g. bwa mem | samtools view -C -T ref.fasta to output CRAM instead of SAM.

Resource requirements

  • Each module MUST define a label process_low, process_medium or process_high to declare resource requirements. (These labels are ignored outside of nf-core, where the pipeline developer is free to define adequate resource requirements; see the configuration sketch below.)
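
For example, outside of nf-core a pipeline developer could attach their own resource values to these labels in their configuration (the values below are arbitrary):

    // nextflow.config
    process {
        withLabel: process_medium {
            cpus   = 6
            memory = 36.GB
            time   = 8.h
        }
    }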

Publishing results

  • The module MUST accept the parameters params.out_dir and params.publish_dir and MUST publish results into ${params.out_dir}/${params.publish_dir}.

  • The publishDir mode MUST be configurable via params.publish_dir_mode.

  • The module MUST accept a parameter params.publish_results accepting at least

    • "none", to publish no files at all,
    • a glob pattern which is initialised to a sensible default value.

    It MAY accept "logs" to publish relevant log files, or other flags, if applicable.

  • To ensure consistent naming, files SHOULD be renamed according to the $name variable before returning them (see the sketch after this list).
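
A hedged sketch of a publishDir directive following these rules (how the glob in params.publish_results is applied is illustrative and still open to discussion):

    process EXAMPLE_PUBLISH {
        publishDir "${params.out_dir}/${params.publish_dir}",
            mode: params.publish_dir_mode,
            saveAs: { filename -> params.publish_results == 'none' ? null : filename }    // returning null skips publishing

        output:
        path "*.txt", emit: txt

        script:
        """
        echo 'placeholder result' > result.txt
        """
    }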

Testing

  • Every module MUST be tested by adding a test workflow with a toy dataset (see the sketch after this list).
  • Test data MUST be stored within this repo. It is RECOMMENDED to re-use generic files from tests/data by symlinking them into the test directory of the module. Specific files MUST be added to the test directory directly. Test files MUST be kept as tiny as possible.
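
A hedged sketch of such a test workflow for a fastqc module (the directory layout, file names and input channel are assumptions to adapt to the real module):

    // tests/software/fastqc/main.nf  (hypothetical layout)
    nextflow.enable.dsl = 2

    include { FASTQC } from '../../../software/fastqc/main'

    workflow {
        // Tiny toy FastQ symlinked from tests/data; the file name and channel structure are placeholders
        ch_reads = Channel.fromPath("${baseDir}/../../data/tiny.fastq.gz")
        FASTQC(ch_reads)
    }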

Software requirements

  • Software requirements SHOULD be declared in a conda environment.yml file, including exact version numbers. Additionally, there MUST be a Dockerfile that containerizes the environment, or packages the software if conda is not available.
  • Docker containers MUST be identified by their sha256(Dockerfile + environment.yml).
  • Each module MUST have its own Dockerfile and environment.yml file.
    • Care should be taken to maintain identical files for subcommands that use the same software. The hash will then be the same and the container will be implicitly re-used across subcommands.

File formats

  • Wherever possible, CRAM files SHOULD be used over BAM files.
  • Wherever possible, FASTQ files SHOULD be compressed using gzip.

Documentation

  • A module MUST be documented in the meta.yml file. It MUST document params, input and output. input and output MUST be a nested list. [Exact details still need to be elaborated.]

Uploading to nf-core/modules

Fork the nf-core/modules repository to your own GitHub account. Within the local clone of your fork, add the module file to the nf-core/modules/software directory.

Commit and push these changes to your fork on GitHub, and then create a pull request on the nf-core/modules GitHub repo with the appropriate information.

We will be notified automatically when you have created your pull request, and provided that everything adheres to the nf-core guidelines we will endeavour to approve your pull request as soon as possible.

Terminology

The features offered by Nextflow DSL2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL2 components:

  • Module: A process that can be used within different pipelines and is as atomic as possible i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as FastQC. At present, this repository has been created to only host atomic module files that should be added to the software/ directory along with the required documentation and tests.
  • Sub-workflow: A chain of multiple modules that offer a higher-level of functionality within the context of a pipeline. For example, a sub-workflow to run multiple QC tools with FastQ files as input. Sub-workflows should be shipped with the pipeline implementation and if required they should be shared amongst different pipelines directly from there. As it stands, this repository will not host sub-workflows although this may change in the future since well-written sub-workflows will be the most powerful aspect of DSL2.
  • Workflow: What DSL1 users would consider an end-to-end pipeline. For example, from one or more inputs to a series of outputs. This can either be implemented using a large monolithic script as with DSL1, or by using a combination of DSL2 individual modules and sub-workflows.

Help

For further information or help, don't hesitate to get in touch on the Slack #modules channel (you can join with this invite).

Citation

If you use the module files in this repository for your analysis, please cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x. ReadCube: Full Access Link