Merge pull request #152 from drpatelh/docs

Update template module and add more docs for testing and finding containers
Harshil Patel 2021-02-08 10:31:00 +00:00 committed by GitHub
commit d29ef5c52c
GPG key ID: 4AEE18F83AFDEB23
7 changed files with 166 additions and 107 deletions

.github/filters.yml

@ -2,24 +2,48 @@ bandage_image:
- software/bandage/image/**
- tests/software/bandage/image/**
bedtools_complement:
- software/bedtools/complement/**
- tests/software/bedtools/complement/**
bedtools_genomecov:
- software/bedtools/genomecov/**
- tests/software/bedtools/genomecov/**
bedtools_intersect:
- software/bedtools/intersect/**
- tests/software/bedtools/intersect/**
bedtools_merge:
- software/bedtools/merge/**
- tests/software/bedtools/merge/**
bedtools_slop:
- software/bedtools/slop/**
- tests/software/bedtools/slop/**
bedtools_sort:
- software/bedtools/sort/**
- tests/software/bedtools/sort/**
bowtie:
- software/bowtie/build/**
- tests/software/bowtie/build/**
bowtie_align:
- software/bowtie/align/**
- software/bowtie/build/**
- tests/software/bowtie/align/**
bowtie2:
- software/bowtie2/build/**
- tests/software/bowtie2/build/**
bowtie2_align:
- software/bowtie2/align/**
- software/bowtie2/build/**
- tests/software/bowtie2/align/**
bwa_index:
- software/bwa/index/**
- tests/software/bwa/index/**
@ -137,6 +161,10 @@ stringtie:
- software/stringtie/**
- tests/software/stringtie/**
tool_subtool:
- software/TOOL/SUBTOOL/**
- tests/software/TOOL/SUBTOOL/**
trimgalore:
- software/trimgalore/**
- tests/software/trimgalore/**
@ -144,27 +172,3 @@ trimgalore:
ucsc_bedgraphtobigwig:
- software/ucsc/bedgraphtobigwig/**
- tests/software/ucsc/bedgraphtobigwig/**

README.md

@ -12,6 +12,7 @@
[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
> THIS REPOSITORY IS UNDER ACTIVE DEVELOPMENT. SYNTAX, ORGANISATION AND LAYOUT MAY CHANGE WITHOUT NOTICE!
> PLEASE BE KIND TO OUR CODE REVIEWERS AND SUBMIT ONE PULL REQUEST PER MODULE :)
A repository for hosting [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) module files containing tool-specific process definitions and their associated documentation.
@ -21,7 +22,7 @@ A repository for hosting [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl
- [Adding a new module file](#adding-a-new-module-file)
- [Module template](#module-template)
- [Guidelines](#guidelines)
- [Testing](#testing)
- [CI tests](#ci-tests)
- [Documentation](#documentation)
- [Uploading to `nf-core/modules`](#uploading-to-nf-coremodules)
- [Terminology](#terminology)
@ -119,21 +120,21 @@ for examples.
### Module template
We have added a directory called [`software/SOFTWARE/TOOL/`](software/SOFTWARE/TOOL/) that serves as a template with which to create your own module submission. Where applicable, we have added extensive `TODO` statements to the files in this directory for general information, to help guide you as to where to make the appropriate changes, and how to make them. If in doubt, have a look at how we have done things for other modules.
We have added a directory called [`software/TOOL/SUBTOOL/`](software/TOOL/SUBTOOL/) that serves as a template with which to create your own module and [`tests/software/TOOL/SUBTOOL/`](tests/software/TOOL/SUBTOOL/) as an example of how to add the required CI tests. Where applicable, we have added extensive `TODO` statements for general information, to help guide you as to where to make the appropriate changes, and how to make them. If in doubt, have a look at how we have done things for other modules.
```console
.
├── software
│   └── SOFTWARE
│      └── TOOL
│      ├── functions.nf ## Utility functions imported in main module script
│      ├── main.nf ## Main module script
│      └── meta.yml ## Documentation for module, input, output, params, author
├── test
│   └── SOFTWARE
│      └── TOOL
│   ├── main.nf ## Minimal workflow to test module
│   └── test.yml ## Pytest-workflow test file
│   └── TOOL
│       └── SUBTOOL
│           ├── functions.nf ## Utility functions imported in main module script
│           ├── main.nf      ## Main module script
│           └── meta.yml     ## Documentation for module, input, output, params, author
├── tests
│   └── software
│       └── TOOL
│           └── SUBTOOL
│               ├── main.nf  ## Minimal workflow to test module
│               └── test.yml ## Pytest-workflow test file
```
### Guidelines
@ -200,11 +201,28 @@ using a combination of `bwa` and `samtools` to output a BAM file instead of a SA
[BioContainers](https://biocontainers.pro/#/) is a registry of Docker and Singularity containers automatically created from all of the software packages on [Bioconda](https://bioconda.github.io/). Where possible we will use BioContainers to fetch pre-built software containers and Bioconda to install software using Conda.
- Software requirements SHOULD be declared within the module file using the Nextflow `container` directive, e.g. go to the [BWA BioContainers webpage](https://biocontainers.pro/#/tools/bwa), click on the `Packages and Containers` tab, sort by `Version` and get the portion of the link after the `docker pull` command where `Type` is Docker. You may need to double-check that you are using the latest version of the software because you may find that containers for older versions have been rebuilt more recently.
- Software requirements SHOULD be declared within the module file using the Nextflow `container` directive. For single-tool BioContainers, the simplest method to obtain the Docker container path is to replace `bwa` with your tool name in this [Quay.io link](https://quay.io/repository/biocontainers/bwa?tab=tags). You will see a list of tags sorted by the most recent. You can then use exactly the same name (e.g. `bwa`), version (e.g. `0.7.17`) and tag (e.g. `hed695b0_7`) to add all of the Conda, Docker and Singularity definitions in the module.
- If the software is available on Conda it MUST also be defined using the Nextflow `conda` directive. Software MUST be pinned to the channel (i.e. `bioconda`) and version (i.e. `0.7.17`) e.g. `bioconda::bwa=0.7.17`. Pinning the build too is not currently a requirement e.g. `bioconda::bwa=0.7.17=h9402c20_2`.
```nextflow
conda (params.enable_conda ? "bioconda::bwa=0.7.17=hed695b0_7" : null) // Conda package
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
    container "https://depot.galaxyproject.org/singularity/bwa:0.7.17--hed695b0_7" // Singularity image
} else {
    container "quay.io/biocontainers/bwa:0.7.17--hed695b0_7" // Docker image
}
```
- If required, multi-tool containers may also be available on BioContainers e.g. [`bwa` and `samtools`](https://biocontainers.pro/#/tools/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40). It is also possible for a multi-tool container to be built and added to BioContainers by submitting a pull request on their [`multi-package-containers`](https://github.com/BioContainers/multi-package-containers) repository.
- If the software is available on Conda it MUST also be defined using the Nextflow `conda` directive. Using `bioconda::bwa=0.7.17=hed695b0_7` as an example, software MUST be pinned to the channel (i.e. `bioconda`), version (i.e. `0.7.17`) and build (i.e. `hed695b0_7`). This allows us to perform file output integrity CI tests on the same input test data with Docker, Singularity and Conda.
- If required, multi-tool containers may also be available on BioContainers e.g. [`bwa` and `samtools`](https://biocontainers.pro/#/tools/mulled-v2-fe8faa35dbf6dc65a0f7f5d4ea12e31a79f73e40). You can install and use the [`galaxy-tool-util`](https://anaconda.org/bioconda/galaxy-tool-util) package to search for both single- and multi-tool containers available in Conda, Docker and Singularity format. For example, to search for Docker (hosted on Quay.io) and Singularity multi-tool containers with both `bowtie` and `samtools` installed, you can use the following command:
```console
mulled-search --destination quay singularity --channel bioconda --search bowtie samtools | grep "mulled"
```
> NB: Build information for all tools within a multi-tool container can be obtained in the `/usr/local/conda-meta/history` file within the container.
- It is also possible for a new multi-tool container to be built and added to BioContainers by submitting a pull request on their [`multi-package-containers`](https://github.com/BioContainers/multi-package-containers) repository.
- If the software is not available on Bioconda, a `Dockerfile` MUST be provided within the module directory. We will use GitHub Actions to auto-build the containers on the [GitHub Packages registry](https://github.com/features/packages).
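Because the Conda, Docker and Singularity definitions share the same name, version and build tag, the three strings can be derived mechanically. The shell function below is a minimal, hypothetical sketch (it is not part of `nf-core/modules`) that assumes a single-tool BioContainer from the `bioconda` channel and reproduces the `bwa` strings used in the example above. It only formats text and does not check that the tag actually exists on Quay.io or the Galaxy depot.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/modules): print the Conda, Docker and
# Singularity strings for a single-tool BioContainer from its name, version and build.
# It only formats strings; it does not check that the package or container exists.
container_strings() {
    local name="$1" version="$2" build="$3"
    if [[ -z "$name" || -z "$version" || -z "$build" ]]; then
        echo "usage: container_strings NAME VERSION BUILD" >&2
        return 1
    fi
    # Conda directive: channel::name=version=build
    echo "bioconda::${name}=${version}=${build}"
    # Docker image on Quay.io: name:version--build
    echo "quay.io/biocontainers/${name}:${version}--${build}"
    # Singularity image on the Galaxy depot uses the same name:version--build tag
    echo "https://depot.galaxyproject.org/singularity/${name}:${version}--${build}"
}

container_strings bwa 0.7.17 hed695b0_7
# bioconda::bwa=0.7.17=hed695b0_7
# quay.io/biocontainers/bwa:0.7.17--hed695b0_7
# https://depot.galaxyproject.org/singularity/bwa:0.7.17--hed695b0_7
```

Multi-tool (`mulled`) containers are not named this way, so for those use `mulled-search` as described above.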
@ -213,51 +231,75 @@ using a combination of `bwa` and `samtools` to output a BAM file instead of a SA
The [Nextflow `publishDir`](https://www.nextflow.io/docs/latest/process.html#publishdir) definition is currently quite limited in terms of parameter/option evaluation. To overcome this, the publishing logic we have implemented for use with DSL2 modules attempts to minimise changing the `publishDir` directive (default: `params.outdir`) in favour of constructing and appending the appropriate output directory paths via the `saveAs:` statement e.g.
```nextflow
publishDir "${params.outdir}",
    mode: params.publish_dir_mode,
    saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:meta.id) }
```
The `saveFiles` function can be found in the [`functions.nf`](software/fastqc/functions.nf) file of utility functions that will be copied into all module directories. It uses the various publishing `options` specified as input to the module to construct and append the relevant output path to `params.outdir`.
We also use a standardised parameter called `params.publish_dir_mode` that can be used to alter the file publishing method (default: `copy`).
### Testing
### CI tests
In order to test that each module added to `nf-core/modules` actually works, and to track any changes to results files between module updates, we have set up a number of GitHub Actions CI tests that run each module on a minimal test dataset using Docker, Singularity and Conda.
#### Test data
- All test data for `nf-core/modules` MUST be added to [`tests/data/`](tests/data/) and organised by filename extension.
- In order to keep the size of this repository as minimal as possible, pre-existing files from [`tests/data/`](tests/data/) MUST be reused if at all possible.
- Test files MUST be kept as tiny as possible.
- Every module MUST be tested by adding a test workflow with a toy dataset in the [`tests/`](tests/software/fastqc/main.nf) directory of the module.
#### Pytest workflow
- Generic files from [`tests/data/`](tests/data/) MUST be reused by importing them as `file("${launchDir}/tests/data/fastq/rna/test_single_end.fastq.gz")`
- Every module MUST have a test workflow utilising test data added to the appropriate directory e.g. [`tests/software/fastqc/main.nf`](tests/software/fastqc/main.nf)
- Any outputs produced by the test workflow MUST be included in the [pytest-workflow](https://pytest-workflow.readthedocs.io/en/stable) for that tool. `md5sum` checks are preferred; however, it is acceptable to omit them for files whose hash changes because of headers or time stamps (e.g. HTML reports). Please do your best to avoid just checking for the file being present.
- Any outputs produced by the test workflow MUST be included in the [pytest-workflow](https://pytest-workflow.readthedocs.io/en/stable) for that tool e.g. [`tests/software/fastqc/test.yml`](tests/software/fastqc/test.yml). `md5sum` checks are the preferable choice of test to determine file changes, however, this may not be possible for all outputs generated by some tools e.g. if they include time stamps or command-related headers. Please do your best to avoid just checking for the file being present e.g. it may still be possible to check that the file contains the appropriate text snippets.
- If the appropriate test data doesn't exist for your module then it MUST be added to [`tests/data/`](tests/data/).
- A filter for the module must be created in [`.github/filters.yml`](.github/filters.yml). Please include any paths specific to that tool or upstream of that tool (for example, `bowtie build` is upstream of `bowtie align`).
- A filter for the module must be created in [`.github/filters.yml`](.github/filters.yml). If the test workflow you have created invokes more than one tool, please include any paths specific to those tools too, e.g. `bowtie build` is upstream of `bowtie align` and they have both been chained together to test the latter.
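The existing entries in `.github/filters.yml` appear to follow a simple convention: the key is the module path below `software/` with `/` replaced by `_` (e.g. `software/bowtie/align/` becomes `bowtie_align`). The sketch below assumes that convention, which is inferred from the file rather than documented, and uses two hypothetical helper functions to derive the key and check whether a filter already exists before you open a PR.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of nf-core/modules).
# filter_key: derive the filters.yml key from a module path, ASSUMING keys are the
# lower-cased path below "software/" with "/" replaced by "_",
# e.g. software/bowtie/align -> bowtie_align.
filter_key() {
    local path="${1%/}"            # drop a trailing slash
    path="${path#software/}"       # drop the leading "software/" if present
    echo "${path//\//_}" | tr '[:upper:]' '[:lower:]'   # "/" -> "_", then lower-case
}

# has_filter: succeed if the filters file (default .github/filters.yml)
# has a top-level key for the given module path.
has_filter() {
    local key filters_file="${2:-.github/filters.yml}"
    key="$(filter_key "$1")"
    grep -q "^${key}:" "$filters_file"
}
```

For example, `has_filter software/bowtie/align || echo "add a bowtie_align entry"`. This only checks that the key exists; it does not verify that the entry lists the test directory or any upstream modules such as `software/bowtie/build/**`.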
#### Running Tests Locally
0. Have either `docker`, `singularity` or `conda` installed
1. See [pytest-workflow installation](https://pytest-workflow.readthedocs.io/en/stable/#installation) for directions to install
2. Run
1. Install [`nextflow`](https://nf-co.re/usage/installation)
``` bash
PROFILE=docker pytest --tag new_module --symlink --wt 2 --kwdof
PROFILE=conda pytest --tag new_module --symlink --wt 2 --kwdof
TMPDIR=~ PROFILE=singularity pytest --tag new_module --symlink --wt 2 --kwdof
alias nf-test="PROFILE=docker pytest --tag new_module --symlink --wt 2 --kwdof"
```
2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`Conda`](https://conda.io/miniconda.html)
3. Install [`pytest-workflow`](https://pytest-workflow.readthedocs.io/en/stable/#installation)
4. Start running your own tests!
- Typical command with Docker:
```console
cd /path/to/git/clone/of/nf-core/modules/
PROFILE=docker pytest --tag tests/software/bowtie/build --symlink --wt 2 --keep-workflow-wd
```
- Typical command with Singularity:
```console
cd /path/to/git/clone/of/nf-core/modules/
TMPDIR=~ PROFILE=singularity pytest --tag tests/software/bowtie/build --symlink --wt 2 --keep-workflow-wd
```
- Typical command with Conda:
```console
cd /path/to/git/clone/of/nf-core/modules/
PROFILE=conda pytest --tag tests/software/bowtie/build --symlink --wt 2 --keep-workflow-wd
```
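If you want to run the same module tests under several profiles in one go, the commands above can be wrapped in a small loop. This is a hypothetical convenience function, not something shipped with the repository. It assumes it is run from the root of your clone with `pytest-workflow` installed, takes a tag defined in the module's `test.yml` (e.g. `tool_subtool`), and mirrors the Singularity example above by setting `TMPDIR` to your home directory. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of nf-core/modules): run the pytest-workflow tests
# for one tag under each profile given. Assumes the current directory is the root of
# a clone of nf-core/modules and that pytest-workflow is installed.
# Set DRY_RUN=1 to print the commands instead of running them.
run_module_tests() {
    local tag="$1"
    shift
    local profile
    for profile in "$@"; do
        local -a env_vars=("PROFILE=${profile}")
        # Mirror the Singularity example above, which sets TMPDIR to the home directory
        if [[ "$profile" == "singularity" ]]; then
            env_vars+=("TMPDIR=${HOME}")
        fi
        local -a cmd=(pytest --tag "$tag" --symlink --wt 2 --keep-workflow-wd)
        if [[ "${DRY_RUN:-0}" == "1" ]]; then
            echo "${env_vars[*]} ${cmd[*]}"
        else
            # Stop at the first profile whose tests fail
            env "${env_vars[@]}" "${cmd[@]}" || return 1
        fi
    done
}
```

For example, `run_module_tests tool_subtool docker conda` runs the template module's tests with Docker and then Conda, and stops at the first failing profile so that its kept work directory is easy to find.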
### Documentation
- A module MUST be documented in the [`meta.yml`](software/fastqc/meta.yml) file. It MUST document `params`, `input` and `output`. `input` and `output` MUST be a nested list.
- A module MUST be documented in the [`meta.yml`](software/TOOL/SUBTOOL/meta.yml) file. It MUST document `params`, `input` and `output`. `input` and `output` MUST be a nested list.
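A quick local check for the required `meta.yml` sections can be sketched in shell. The function below is hypothetical and deliberately shallow: it only looks for unindented `params:`, `input:` and `output:` keys, so it does not validate the YAML syntax or confirm that `input` and `output` are nested lists.

```bash
#!/usr/bin/env bash
# Hypothetical check (not part of nf-core/modules): report which of the top-level
# sections required by the guidelines (params, input, output) are missing from a meta.yml.
# It only looks for unindented "key:" lines; it does not parse or validate the YAML.
check_meta_sections() {
    local meta_file="$1" section missing=0
    if [[ ! -f "$meta_file" ]]; then
        echo "no such file: $meta_file" >&2
        return 2
    fi
    for section in params input output; do
        if ! grep -q "^${section}:" "$meta_file"; then
            echo "missing section: ${section}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_meta_sections software/TOOL/SUBTOOL/meta.yml` prints nothing and returns 0 when all three sections are present.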
We are aware that there is very little documentation in this [`Documentation`](#documentation) section. Writing more code and tests is so much cooooler! Please bear with us, we will get here eventually...
### Uploading to `nf-core/modules`
[Fork](https://help.github.com/articles/fork-a-repo/) the `nf-core/modules` repository to your own GitHub account. Within the local clone of your fork add the module file to the [`software/`](software) directory.
[Fork](https://help.github.com/articles/fork-a-repo/) the `nf-core/modules` repository to your own GitHub account. Within the local clone of your fork add the module file to the [`software/`](software) directory. Please try to keep PRs as atomic as possible to aid the reviewing process, ideally one module addition/update per PR.
Commit and push these changes to your fork on GitHub, and then [create a pull request](https://help.github.com/articles/creating-a-pull-request-from-a-fork/) on the `nf-core/modules` GitHub repo with the appropriate information.
@ -286,8 +328,6 @@ If you use the module files in this repository for your analysis please you can
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ)
<!---

View file

@ -22,8 +22,8 @@ params.options = [:]
def options = initOptions(params.options)
// TODO nf-core: Process name MUST be all uppercase,
// "SOFTWARE" and (ideally) "TOOL" MUST be all one word separated by an "_".
process SOFTWARE_TOOL {
// "TOOL" and (ideally) "SUBTOOL" MUST be all one word separated by an "_".
process TOOL_SUBTOOL {
// TODO nf-core: If a meta map of sample information is NOT provided in "input:" section
// change tag value to another appropriate input value e.g. tag "$fasta"
tag "$meta.id"
@ -40,10 +40,7 @@ process SOFTWARE_TOOL {
// Software MUST be pinned to channel (i.e. "bioconda"), version (i.e. "1.10") and build (i.e. "h9402c20_2") as in the example below.
conda (params.enable_conda ? "bioconda::samtools=1.10=h9402c20_2" : null)
// TODO nf-core: Fetch "docker pull" address for latest BioContainer image of software: e.g. https://biocontainers.pro/#/tools/samtools
// Click on the Packages and Containers tab, sort by Version and get the portion of the link after the docker pull command where Type is Docker.
// You may need to double-check that you are using the latest version of the software because you may find that containers for older versions have been rebuilt more recently.
// If required, multi-tool containers may also be available and are usually named to start with "mulled".
// TODO nf-core: See section in main README for further information regarding finding and adding container addresses to the section below.
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
container "https://depot.galaxyproject.org/singularity/samtools:1.10--h9402c20_2"
} else {
@ -57,7 +54,7 @@ process SOFTWARE_TOOL {
// https://github.com/nf-core/modules/blob/master/software/bwa/index/main.nf
// TODO nf-core: Where applicable please provide/convert compressed files as input/output
// e.g. "*.fastq.gz" and NOT "*.fastq", "*.bam" and NOT "*.sam" etc.
tuple val(meta), path(reads)
tuple val(meta), path(bam)
output:
// TODO nf-core: Named file extensions MUST be emitted for ALL output channels
@ -66,6 +63,7 @@ process SOFTWARE_TOOL {
// TODO nf-core: List additional required output channels/values here
path "*.version.txt" , emit: version
script:
def software = getSoftwareName(task.process)
// TODO nf-core: If a meta map of sample information is NOT provided in "input:" section delete the line below
@ -78,11 +76,13 @@ process SOFTWARE_TOOL {
// using the Nextflow "task" variable e.g. "--threads $task.cpus"
// TODO nf-core: Please indent the command appropriately (4 spaces!!) to help with readability ;)
"""
software tool \\
samtools \\
sort \\
$options.args \\
--threads $task.cpus \\
$reads \\
> ${prefix}.bam
-@ $task.cpus \\
-o ${prefix}.bam \\
-T $prefix \\
$bam
echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//' > ${software}.version.txt
"""

View file

@ -1,25 +1,24 @@
## TODO nf-core: Please delete all of these TODO statements once the file has been curated
## TODO nf-core: Change the name of "software_tool" below
name: software_tool
## TODO nf-core: Change the name of "tool_subtool" below
name: tool_subtool
## TODO nf-core: Add a description and keywords
description: Run FastQC on sequenced reads
description: Sort SAM/BAM/CRAM file
keywords:
- quality control
- qc
- adapters
- fastq
- sort
- bam
- sam
- cram
tools:
## TODO nf-core: Change the name of "software" below
- software:
## TODO nf-core: Change the name of the tool below
- samtools:
## TODO nf-core: Add a description and other details for the software below
description: |
FastQC gives general quality metrics about your reads.
It provides information about the quality score distribution
across your reads, the per base sequence content (%A/C/G/T).
You get information about adapter contamination and other
overrepresented sequences.
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
SAMtools is a set of utilities for interacting with and post-processing
short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li.
These files are generated as output by short read aligners like BWA.
homepage: http://www.htslib.org/
documentation: http://www.htslib.org/doc/samtools.html
doi: 10.1093/bioinformatics/btp352
## TODO nf-core: If you are using any additional "params" in the main.nf script of the module add them below
params:
- outdir:
@ -49,11 +48,10 @@ input:
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- reads:
- bam:
type: file
description: |
List of input FastQ files of size 1 and 2 for single-end and paired-end data,
respectively.
description: BAM/CRAM/SAM file
pattern: "*.{bam,cram,sam}"
## TODO nf-core: Add a description of all of the variables used as output
output:
- meta:
@ -61,14 +59,10 @@ output:
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- html:
- bam:
type: file
description: FastQC report
pattern: "*_{fastqc.html}"
- zip:
type: file
description: FastQC report archive
pattern: "*_{fastqc.zip}"
description: Sorted BAM/CRAM/SAM file
pattern: "*.{bam,cram,sam}"
- version:
type: file
description: File containing software version

View file

@ -0,0 +1,13 @@
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2
include { TOOL_SUBTOOL } from '../../../../software/TOOL/SUBTOOL/main.nf' addParams( options: [:] )
workflow test_tool_subtool {
    def input = []
    input = [ [ id:'test', single_end:false ], // meta map
              file("${launchDir}/tests/data/bam/test.paired_end.sorted.bam", checkIfExists: true) ]
    TOOL_SUBTOOL ( input )
}

View file

@ -0,0 +1,8 @@
- name: tool subtool
  command: nextflow run ./tests/software/TOOL/SUBTOOL -entry test_tool_subtool -c tests/config/nextflow.config
  tags:
    - tool
    - tool_subtool
  files:
    - path: output/tool/test.bam
      md5sum: a41bfadacd2eeef1d31e05c135cc4f4e
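The `md5sum` values in a `test.yml` like this one can be generated from the output directory of the test workflow after running its `command:` directly with Nextflow. The following is a minimal, hypothetical sketch that assumes GNU coreutils `md5sum` and an `output/` directory as in the paths above. The indentation it prints assumes the entries sit under a `files:` key, so adjust it to match your file, and drop or replace the `md5sum` for outputs whose contents change between runs, e.g. because of time stamps.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/modules): print pytest-workflow "files:"
# entries with md5sum values for every file under an output directory.
# Uses GNU coreutils md5sum; review the result by hand before pasting it into test.yml.
md5_entries() {
    local outdir="${1:-output}"
    # Sort paths so the generated entries are stable between runs
    find "$outdir" -type f | LC_ALL=C sort | while IFS= read -r f; do
        printf '    - path: %s\n      md5sum: %s\n' "$f" "$(md5sum "$f" | cut -d' ' -f1)"
    done
}
```

For example, running `md5_entries output` from the repository root after the test workflow has finished prints one `path`/`md5sum` pair per output file, with paths such as `output/tool/test.bam`.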