Merge branch 'nf-core:master' into master

commit 06a8ac9b72
Svyatoslav Sidorov, 2021-07-21 10:37:14 +03:00, committed by GitHub
GPG key ID: 4AEE18F83AFDEB23 (no known key found for this signature in database)
1069 changed files with 2378 additions and 1790 deletions

View file

@@ -16,7 +16,7 @@ jobs:
       - uses: dorny/paths-filter@v2
         id: filter
         with:
-          filters: "tests/config/pytest_software.yml"
+          filters: "tests/config/pytest_modules.yml"

   lint:
     runs-on: ubuntu-20.04

View file

@@ -14,7 +14,7 @@ jobs:
       - uses: dorny/paths-filter@v2
         id: filter
        with:
-          filters: 'tests/config/pytest_software.yml'
+          filters: "tests/config/pytest_modules.yml"

   test:
     runs-on: ubuntu-20.04
@@ -25,9 +25,9 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        nxf_version: ['21.04.0']
-        tags: ['${{ fromJson(needs.changes.outputs.modules) }}']
-        profile: ['docker', 'singularity', 'conda']
+        nxf_version: ["21.04.0"]
+        tags: ["${{ fromJson(needs.changes.outputs.modules) }}"]
+        profile: ["docker", "singularity", "conda"]
     env:
       NXF_ANSI_LOG: false
     steps:

View file

@@ -34,9 +34,9 @@ A repository for hosting [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl
 The module files hosted in this repository define a set of processes for software tools such as `fastqc`, `bwa`, `samtools` etc. This allows you to share and add common functionality across multiple pipelines in a modular fashion.
-We have written a helper command in the `nf-core/tools` package that uses the GitHub API to obtain the relevant information for the module files present in the [`software/`](software/) directory of this repository. This includes using `git` commit hashes to track changes for reproducibility purposes, and to download and install all of the relevant module files.
+We have written a helper command in the `nf-core/tools` package that uses the GitHub API to obtain the relevant information for the module files present in the [`modules/`](modules/) directory of this repository. This includes using `git` commit hashes to track changes for reproducibility purposes, and to download and install all of the relevant module files.
-1. Install the latest version of [`nf-core/tools`](https://github.com/nf-core/tools#installation) (`>=1.13`)
+1. Install the latest version of [`nf-core/tools`](https://github.com/nf-core/tools#installation) (`>=2.0`)
 2. List the available modules:
 ```console
@@ -48,7 +48,7 @@ We have written a helper command in the `nf-core/tools` package that uses the Gi
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Modules available from nf-core/modules (master): pipeline_modules.py:164
@@ -73,10 +73,10 @@ We have written a helper command in the `nf-core/tools` package that uses the Gi
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Installing fastqc pipeline_modules.py:213
-INFO Downloaded 3 files to ./modules/nf-core/software/fastqc pipeline_modules.py:236
+INFO Downloaded 3 files to ./modules/nf-core/modules/fastqc pipeline_modules.py:236
 ```
 4. Import the module in your Nextflow script:
@@ -86,7 +86,7 @@ We have written a helper command in the `nf-core/tools` package that uses the Gi
 nextflow.enable.dsl = 2
-include { FASTQC } from './modules/nf-core/software/fastqc/main' addParams( options: [:] )
+include { FASTQC } from './modules/nf-core/modules/fastqc/main' addParams( options: [:] )
 ```
 5. Remove the module from the pipeline repository if required:
@@ -100,7 +100,7 @@ We have written a helper command in the `nf-core/tools` package that uses the Gi
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Removing fastqc pipeline_modules.py:271
 INFO Successfully removed fastqc pipeline_modules.py:285
@@ -117,7 +117,7 @@ We have written a helper command in the `nf-core/tools` package that uses the Gi
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Linting pipeline: . lint.py:104
 INFO Linting module: fastqc lint.py:106
@@ -128,7 +128,7 @@ We have written a helper command in the `nf-core/tools` package that uses the Gi
 ╭──────────────┬───────────────────────────────┬──────────────────────────────────╮
 │ Module name │ Test message │ File path │
 ├──────────────┼───────────────────────────────┼──────────────────────────────────┤
-│ fastqc │ Local copy of module outdated │ modules/nf-core/software/fastqc/ │
+│ fastqc │ Local copy of module outdated │ modules/nf-core/modules/fastqc/ │
 ╰──────────────┴───────────────────────────────┴──────────────────────────────────╯
 ╭──────────────────────╮
 │ LINT RESULTS SUMMARY │
@@ -146,12 +146,12 @@ We have plans to add other utility commands to help developers install and maint
 If you decide to upload a module to `nf-core/modules` then this will
 ensure that it will become available to all nf-core pipelines,
 and to everyone within the Nextflow community! See
-[`software/`](software)
+[`modules/`](modules)
 for examples.
 ### Checklist
-Please check that the module you wish to add isn't already on [`nf-core/modules`](https://github.com/nf-core/modules/tree/master/software):
+Please check that the module you wish to add isn't already on [`nf-core/modules`](https://github.com/nf-core/modules/tree/master/modules):
 - Use the [`nf-core modules list`](https://github.com/nf-core/tools#list-modules) command
 - Check [open pull requests](https://github.com/nf-core/modules/pulls)
 - Search [open issues](https://github.com/nf-core/modules/issues)
@@ -165,7 +165,7 @@ If the module doesn't exist on `nf-core/modules`:
 We have implemented a number of commands in the `nf-core/tools` package to make it incredibly easy for you to create and contribute your own modules to nf-core/modules.
-1. Install the latest version of [`nf-core/tools`](https://github.com/nf-core/tools#installation) (`>=1.13`)
+1. Install the latest version of [`nf-core/tools`](https://github.com/nf-core/tools#installation) (`>=2.0`)
 2. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`)
 3. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`Conda`](https://conda.io/miniconda.html)
 4. [Fork and clone this repo locally](#uploading-to-nf-coremodules)
@@ -181,7 +181,7 @@ We have implemented a number of commands in the `nf-core/tools` package to make
 git checkout -b fastqc
 ```
-6. Create a module using the [nf-core DSL2 module template](https://github.com/nf-core/tools/blob/master/nf_core/module-template/software/main.nf):
+6. Create a module using the [nf-core DSL2 module template](https://github.com/nf-core/tools/blob/master/nf_core/module-template/modules/main.nf):
 ```console
 $ nf-core modules create . --tool fastqc --author @joebloggs --label process_low --meta
@@ -192,36 +192,36 @@ We have implemented a number of commands in the `nf-core/tools` package to make
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Using Bioconda package: 'bioconda::fastqc=0.11.9' create.py:130
 INFO Using Docker / Singularity container with tag: 'fastqc:0.11.9--0' create.py:140
 INFO Created / edited following files: create.py:218
-./software/fastqc/functions.nf
-./software/fastqc/main.nf
-./software/fastqc/meta.yml
-./tests/software/fastqc/main.nf
-./tests/software/fastqc/test.yml
-./tests/config/pytest_software.yml
+./modules/fastqc/functions.nf
+./modules/fastqc/main.nf
+./modules/fastqc/meta.yml
+./tests/modules/fastqc/main.nf
+./tests/modules/fastqc/test.yml
+./tests/config/pytest_modules.yml
 ```
 All of the files required to add the module to `nf-core/modules` will be created/edited in the appropriate places. The 4 files you will need to change are:
-1. [`./software/fastqc/main.nf`](https://github.com/nf-core/modules/blob/master/software/fastqc/main.nf)
+1. [`./modules/fastqc/main.nf`](https://github.com/nf-core/modules/blob/master/modules/fastqc/main.nf)
 This is the main script containing the `process` definition for the module. You will see an extensive number of `TODO` statements to help guide you to fill in the appropriate sections and to ensure that you adhere to the guidelines we have set for module submissions.
-2. [`./software/fastqc/meta.yml`](https://github.com/nf-core/modules/blob/master/software/fastqc/meta.yml)
+2. [`./modules/fastqc/meta.yml`](https://github.com/nf-core/modules/blob/master/modules/fastqc/meta.yml)
 This file will be used to store general information about the module and author details - the majority of which will already be auto-filled. However, you will need to add a brief description of the files defined in the `input` and `output` section of the main script since these will be unique to each module.
-3. [`./tests/software/fastqc/main.nf`](https://github.com/nf-core/modules/blob/master/tests/software/fastqc/main.nf)
+3. [`./tests/modules/fastqc/main.nf`](https://github.com/nf-core/modules/blob/master/tests/modules/fastqc/main.nf)
 Every module MUST have a test workflow. This file will define one or more Nextflow `workflow` definitions that will be used to unit test the output files created by the module. By default, one `workflow` definition will be added but please feel free to add as many as possible so we can ensure that the module works on different data types / parameters e.g. separate `workflow` for single-end and paired-end data.
 Minimal test data required for your module may already exist within this repository, in which case you may just have to change a couple of paths in this file - see the [Test data](#test-data) section for more info and guidelines for adding new standardised data if required.
-4. [`./tests/software/fastqc/test.yml`](https://github.com/nf-core/modules/blob/master/tests/software/fastqc/test.yml)
+4. [`./tests/modules/fastqc/test.yml`](https://github.com/nf-core/modules/blob/master/tests/modules/fastqc/test.yml)
 This file will contain all of the details required to unit test the main script in the point above using [pytest-workflow](https://pytest-workflow.readthedocs.io/). If possible, any outputs produced by the test workflow(s) MUST be included and listed in this file along with an appropriate check e.g. md5sum. The different test options are listed in the [pytest-workflow docs](https://pytest-workflow.readthedocs.io/en/stable/#test-options).
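For orientation, a minimal sketch of what such a test workflow in `tests/modules/fastqc/main.nf` typically looks like (illustrative only, not taken from this commit; the `params.test_data` key shown is a hypothetical example and assumes the repository's test-data config is loaded):

```groovy
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

// Include the module under test, passing an empty options map via addParams
include { FASTQC } from '../../../modules/fastqc/main.nf' addParams( options: [:] )

workflow test_fastqc_single_end {
    // meta map plus a single-end FastQ file; the test_data key is a hypothetical example
    input = [ [ id: 'test', single_end: true ],
              file(params.test_data['sarscov2']['illumina']['test_1_fastq_gz'], checkIfExists: true) ]

    FASTQC ( input )
}
```

In practice the workflow is run through pytest-workflow with `tests/config/nextflow.config`, which supplies the shared test-data paths and publishing parameters.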
@@ -240,24 +240,24 @@ We have implemented a number of commands in the `nf-core/tools` package to make
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Press enter to use default values (shown in brackets) or type your own responses test_yml_builder.py:51
 ? Tool name: fastqc
-Test YAML output path (- for stdout) (tests/software/fastqc/test.yml):
-INFO Looking for test workflow entry points: 'tests/software/fastqc/main.nf' test_yml_builder.py:116
+Test YAML output path (- for stdout) (tests/modules/fastqc/test.yml):
+INFO Looking for test workflow entry points: 'tests/modules/fastqc/main.nf' test_yml_builder.py:116
 INFO Building test meta for entry point 'test_fastqc_single_end' test_yml_builder.py:150
 Test name (fastqc test_fastqc_single_end):
-Test command (nextflow run tests/software/fastqc -entry test_fastqc_single_end -c tests/config/nextflow.config):
+Test command (nextflow run tests/modules/fastqc -entry test_fastqc_single_end -c tests/config/nextflow.config):
 Test tags (comma separated) (fastqc,fastqc_single_end):
 Test output folder with results (leave blank to run test):
 ? Choose software profile Singularity
 INFO Setting env var '$PROFILE' to 'singularity' test_yml_builder.py:258
 INFO Running 'fastqc' test with command: test_yml_builder.py:263
-nextflow run tests/software/fastqc -entry test_fastqc_single_end -c tests/config/nextflow.config --outdir /tmp/tmpgbneftf5
+nextflow run tests/modules/fastqc -entry test_fastqc_single_end -c tests/config/nextflow.config --outdir /tmp/tmpgbneftf5
 INFO Test workflow finished! test_yml_builder.py:276
-INFO Writing to 'tests/software/fastqc/test.yml' test_yml_builder.py:293
+INFO Writing to 'tests/modules/fastqc/test.yml' test_yml_builder.py:293
 ```
 > NB: See docs for [running tests manually](#running-tests-manually) if you would like to run the tests manually.
@@ -273,7 +273,7 @@ We have implemented a number of commands in the `nf-core/tools` package to make
 | \| | \__, \__/ | \ |___ \`-._,-`-,
 `._,._,'
-nf-core/tools version 1.13
+nf-core/tools version 2.0
 INFO Linting modules repo: . lint.py:102
 INFO Linting module: fastqc lint.py:106
@@ -284,9 +284,9 @@ We have implemented a number of commands in the `nf-core/tools` package to make
 ╭──────────────┬──────────────────────────────────────────────────────────────┬──────────────────────────────────╮
 │ Module name │ Test message │ File path │
 ├──────────────┼──────────────────────────────────────────────────────────────┼──────────────────────────────────┤
-│ fastqc │ TODO string in meta.yml: #Add a description of the module... │ modules/nf-core/software/fastqc/ │
-│ fastqc │ TODO string in meta.yml: #Add a description and other det... │ modules/nf-core/software/fastqc/ │
-│ fastqc │ TODO string in meta.yml: #Add a description of all of the... │ modules/nf-core/software/fastqc/ │
+│ fastqc │ TODO string in meta.yml: #Add a description of the module... │ modules/nf-core/modules/fastqc/ │
+│ fastqc │ TODO string in meta.yml: #Add a description and other det... │ modules/nf-core/modules/fastqc/ │
+│ fastqc │ TODO string in meta.yml: #Add a description of all of the... │ modules/nf-core/modules/fastqc/ │
 ╰──────────────┴──────────────────────────────────────────────────────────────┴──────────────────────────────────╯
 ╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ [!] 1 Test Failed │
@@ -294,7 +294,7 @@ We have implemented a number of commands in the `nf-core/tools` package to make
 ╭──────────────┬──────────────────────────────────────────────────────────────┬──────────────────────────────────╮
 │ Module name │ Test message │ File path │
 ├──────────────┼──────────────────────────────────────────────────────────────┼──────────────────────────────────┤
-│ fastqc │ 'meta' map not emitted in output channel(s) │ modules/nf-core/software/fastqc/ │
+│ fastqc │ 'meta' map not emitted in output channel(s) │ modules/nf-core/modules/fastqc/ │
 ╰──────────────┴──────────────────────────────────────────────────────────────┴──────────────────────────────────╯
 ╭──────────────────────╮
 │ LINT RESULTS SUMMARY │
@@ -356,7 +356,7 @@ Please follow the steps below to run the tests locally:
 3. Install [`pytest-workflow`](https://pytest-workflow.readthedocs.io/en/stable/#installation)
-4. Start running your own tests using the appropriate [`tag`](https://github.com/nf-core/modules/blob/3d720a24fd3c766ba56edf3d4e108a1c45d353b2/tests/software/fastqc/test.yml#L3-L5) defined in the `test.yml`:
+4. Start running your own tests using the appropriate [`tag`](https://github.com/nf-core/modules/blob/3d720a24fd3c766ba56edf3d4e108a1c45d353b2/tests/modules/fastqc/test.yml#L3-L5) defined in the `test.yml`:
 - Typical command with Docker:
@@ -383,7 +383,7 @@ Please follow the steps below to run the tests locally:
 ### Uploading to `nf-core/modules`
-[Fork](https://help.github.com/articles/fork-a-repo/) the `nf-core/modules` repository to your own GitHub account. Within the local clone of your fork add the module file to the [`software/`](software) directory. Please try and keep PRs as atomic as possible to aid the reviewing process - ideally, one module addition/update per PR.
+[Fork](https://help.github.com/articles/fork-a-repo/) the `nf-core/modules` repository to your own GitHub account. Within the local clone of your fork add the module file to the [`modules/`](modules) directory. Please try and keep PRs as atomic as possible to aid the reviewing process - ideally, one module addition/update per PR.
 Commit and push these changes to your local clone on GitHub, and then [create a pull request](https://help.github.com/articles/creating-a-pull-request-from-a-fork/) on the `nf-core/modules` GitHub repo with the appropriate information.
@@ -395,6 +395,8 @@ The key words "MUST", "MUST NOT", "SHOULD", etc. are to be interpreted as descri
 #### General
+- All non-mandatory command-line tool options MUST be provided as a string i.e. `options.args` where `options` is a Groovy Map that MUST be provided via the Nextflow `addParams` option when including the module via `include` in the parent workflow.
 - Software that can be piped together SHOULD be added to separate module files
 unless there is a run-time, storage advantage in implementing in this way. For example,
 using a combination of `bwa` and `samtools` to output a BAM file instead of a SAM file:
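The command block the guideline refers to sits outside this hunk; as an illustrative sketch (assumed, not quoted from the README), piping the two tools inside a single `script` block avoids writing an intermediate SAM file. The process name, index layout and file names here are hypothetical:

```groovy
process BWA_MEM_SAMTOOLS {
    // Illustrative only: in the real repository this logic lives in the bwa/mem module
    input:
    tuple val(meta), path(reads)
    path index                     // assumed: directory containing a 'genome.fa' BWA index

    output:
    tuple val(meta), path("*.bam"), emit: bam

    script:
    def prefix = "${meta.id}"
    """
    bwa mem -t $task.cpus ${index}/genome.fa $reads \\
        | samtools view -@ $task.cpus -bS -o ${prefix}.bam -
    """
}
```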
@@ -413,13 +415,13 @@ using a combination of `bwa` and `samtools` to output a BAM file instead of a SA
 echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//' > ${software}.version.txt
 ```
-If the software is unable to output a version number on the command-line then a variable called `VERSION` can be manually specified to create this file e.g. [homer/annotatepeaks module](https://github.com/nf-core/modules/blob/master/software/homer/annotatepeaks/main.nf).
+If the software is unable to output a version number on the command-line then a variable called `VERSION` can be manually specified to create this file e.g. [homer/annotatepeaks module](https://github.com/nf-core/modules/blob/master/modules/homer/annotatepeaks/main.nf).
 - The process definition MUST NOT contain a `when` statement.
 #### Naming conventions
-- The directory structure for the module name must be all lowercase e.g. [`software/bwa/mem/`](software/bwa/mem/). The name of the software (i.e. `bwa`) and tool (i.e. `mem`) MUST be all one word.
+- The directory structure for the module name must be all lowercase e.g. [`modules/bwa/mem/`](modules/bwa/mem/). The name of the software (i.e. `bwa`) and tool (i.e. `mem`) MUST be all one word.
 - The process name in the module file MUST be all uppercase e.g. `process BWA_MEM {`. The name of the software (i.e. `BWA`) and tool (i.e. `MEM`) MUST be all one word separated by an underscore.
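As an illustrative sketch of the naming conventions together with the manual `VERSION` case above (this is not the real homer/annotatepeaks module, and the pinned version string is only an assumption):

```groovy
// Directory would be modules/homer/annotatepeaks/  ->  process name HOMER_ANNOTATEPEAKS
def VERSION = '4.11' // assumed: the tool prints no usable version string, so it is pinned by hand

process HOMER_ANNOTATEPEAKS {
    tag "$meta.id"

    input:
    tuple val(meta), path(peaks)
    path fasta
    path gtf

    output:
    tuple val(meta), path("*.annotatePeaks.txt"), emit: txt
    path "*.version.txt"                        , emit: version

    script:
    """
    annotatePeaks.pl $peaks $fasta -gtf $gtf > ${meta.id}.annotatePeaks.txt
    echo $VERSION > homer.version.txt
    """
}
```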
@@ -431,7 +433,7 @@ using a combination of `bwa` and `samtools` to output a BAM file instead of a SA
 - A module file SHOULD only define input and output files as command-line parameters to be executed within the process.
-- All other parameters MUST be provided as a string i.e. `options.args` where `options` is a Groovy Map that MUST be provided via the Nextflow `addParams` option when including the module via `include` in the parent workflow.
+- All `params` within the module MUST be initialised and used in the local context of the module. In other words, named `params` defined in the parent workflow MUST NOT be assumed to be passed to the module to allow developers to call their parameters whatever they want. In general, it may be more suitable to use additional `input` value channels to cater for such scenarios.
 - If the tool supports multi-threading then you MUST provide the appropriate parameter using the Nextflow `task` variable e.g. `--threads $task.cpus`.
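A hedged sketch of how these rules look inside a module: tool options arrive through `options.args` (supplied with `addParams` by the caller) and the thread count always comes from `task.cpus`, never from a pipeline-level parameter. The `pigz` module shown here is hypothetical:

```groovy
params.options = [:]

process PIGZ_COMPRESS {
    cpus 4   // in practice set by the pipeline's resource configuration

    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.gz"), emit: gz

    script:
    def args = params.options.args ?: ''   // non-mandatory tool options, locally initialised
    """
    pigz -p $task.cpus $args -c $reads > ${meta.id}.gz
    """
}
```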
@@ -514,7 +516,7 @@ publishDir "${params.outdir}",
 saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:meta.id) }
 ```
-The `saveFiles` function can be found in the [`functions.nf`](software/fastqc/functions.nf) file of utility functions that will be copied into all module directories. It uses the various publishing `options` specified as input to the module to construct and append the relevant output path to `params.outdir`.
+The `saveFiles` function can be found in the [`functions.nf`](modules/fastqc/functions.nf) file of utility functions that will be copied into all module directories. It uses the various publishing `options` specified as input to the module to construct and append the relevant output path to `params.outdir`.
 We also use a standardised parameter called `params.publish_dir_mode` that can be used to alter the file publishing method (default: `copy`).
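For reference, a pipeline typically exposes this parameter in its `nextflow.config` so users can switch the publishing behaviour; a minimal sketch (the values shown are common defaults, not mandated by this repository):

```groovy
// nextflow.config (sketch)
params {
    outdir           = './results'
    publish_dir_mode = 'copy'   // alternatives include 'symlink', 'link' and 'move'
}
```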
@@ -522,7 +524,7 @@ We also use a standardised parameter called `params.publish_dir_mode` that can b
 The features offered by Nextflow DSL2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL2 components:
-- *Module*: A `process` that can be used within different pipelines and is as atomic as possible i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as `FastQC`. At present, this repository has been created to only host atomic module files that should be added to the [`software/`](software/) directory along with the required documentation and tests.
+- *Module*: A `process` that can be used within different pipelines and is as atomic as possible i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as `FastQC`. At present, this repository has been created to only host atomic module files that should be added to the [`modules/`](modules/) directory along with the required documentation and tests.
 - *Sub-workflow*: A chain of multiple modules that offer a higher-level of functionality within the context of a pipeline. For example, a sub-workflow to run multiple QC tools with FastQ files as input. Sub-workflows should be shipped with the pipeline implementation and if required they should be shared amongst different pipelines directly from there. As it stands, this repository will not host sub-workflows although this may change in the future since well-written sub-workflows will be the most powerful aspect of DSL2.
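To make the sub-workflow level concrete, a minimal sketch of a two-module chain as it might live inside a pipeline (the module paths and output channel names are assumptions, not defined by this repository):

```groovy
// Hypothetical sub-workflow: would be shipped with a pipeline, not hosted here
include { FASTQC  } from './modules/nf-core/modules/fastqc/main'  addParams( options: [:] )
include { MULTIQC } from './modules/nf-core/modules/multiqc/main' addParams( options: [:] )

workflow FASTQ_QC {
    take:
    reads   // channel: [ val(meta), path(reads) ]

    main:
    FASTQC ( reads )
    MULTIQC ( FASTQC.out.zip.map { meta, zip -> zip }.collect() )

    emit:
    fastqc_html  = FASTQC.out.html
    multiqc_html = MULTIQC.out.report
}
```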

View file

@@ -1,38 +0,0 @@
nextflow.preview.dsl=2
process FASTQ_SCREEN {
publishDir "$outputdir",
mode: "link", overwrite: true
// depending on the number of genomes and the type of genome (e.g. plants!), memory needs to be ample!
// label 'bigMem'
// label 'multiCore'
input:
tuple val(name), path(reads)
val outputdir
// fastq_screen_args are best passed in to the workflow in the following manner:
// --fastq_screen_args="--subset 200000 --force"
val fastq_screen_args
val verbose
output:
path "*png", emit: png
path "*html", emit: html
path "*txt", emit: report
script:
println(name)
println(reads)
println(outputdir)
if (verbose){
println ("[MODULE] FASTQ SCREEN ARGS: "+ fastq_screen_args)
}
"""
module load fastq_screen
fastq_screen $fastq_screen_args $reads
"""
}

View file

@@ -1,31 +0,0 @@
name: FastQ Screen
description: Run FastQ Screen on sequenced reads for Species Identification
keywords:
- Quality Control
- Species Screen
- Contamination
tools:
- fastqc:
description: |
FastQ Screen allows you to screen a library of sequences in
FastQ format against a set of sequence databases so you can
see if the composition of the library matches with what you expect.
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/
documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/_build/html/index.html
input:
-
- sample_id:
type: string
description: Sample identifier
- reads:
type: file
description: Input FastQ file
output:
-
- report:
type: file
description: FastQ Screen report
pattern: "*_screen.{txt,html,png}"
optional_pattern: "*_screen.bisulfite_orientation.png"
authors:
- "@FelixKrueger"

View file

@@ -1 +0,0 @@
../../../../tests/data/fastq/rna/test_R1.fastq.gz

View file

@@ -1 +0,0 @@
../../../../tests/data/fastq/rna/test_R1_val_1.fq.gz

View file

@@ -1 +0,0 @@
../../../../tests/data/fastq/rna/test_R2.fastq.gz

View file

@@ -1 +0,0 @@
../../../../tests/data/fastq/rna/test_R2_val_2.fq.gz

View file

@@ -1 +0,0 @@
../../../../tests/data/fastq/rna/test_single_end.fastq.gz

View file

@@ -1,30 +0,0 @@
#!/usr/bin/env nextflow
nextflow.preview.dsl = 2
params.outdir = "."
params.fastq_screen_args = ''
// fastq_screen_args are best passed in to the workflow in the following manner:
// --fastq_screen_args="--subset 200000 --force"
params.verbose = false
if (params.verbose){
println ("[WORKFLOW] FASTQ SCREEN ARGS ARE: " + params.fastq_screen_args)
}
// TODO: include '../../../tests/functions/check_process_outputs.nf'
include '../main.nf'
// Define input channels
ch_read_files = Channel
.fromFilePairs('../../../test-datasets/Ecoli*{1,2}.fastq.gz',size:-1)
// .view() // to check whether the input channel works
// Run the workflow
workflow {
main:
FASTQ_SCREEN(ch_read_files, params.outdir, params.fastq_screen_args, params.verbose)
// TODO .check_output()
}

View file

@@ -1,2 +0,0 @@
// docker.enabled = true
params.outdir = './results'

View file

@@ -1,31 +0,0 @@
#Fastq_screen version: 0.14.0 #Aligner: bowtie2 #Reads in subset: 100000
Genome #Reads_processed #Unmapped %Unmapped #One_hit_one_genome %One_hit_one_genome #Multiple_hits_one_genome %Multiple_hits_one_genome #One_hit_multiple_genomes %One_hit_multiple_genomes Multiple_hits_multiple_genomes %Multiple_hits_multiple_genomes
Cat 10000 9171 91.71 0 0.00 0 0.00 421 4.21 408 4.08
Chicken 10000 8932 89.32 0 0.00 0 0.00 64 0.64 1004 10.04
Cow 10000 8484 84.84 0 0.00 0 0.00 294 2.94 1222 12.22
Drosophila 10000 9469 94.69 0 0.00 0 0.00 19 0.19 512 5.12
Human 10000 8367 83.67 2 0.02 3 0.03 354 3.54 1274 12.74
Mouse 10000 122 1.22 3265 32.65 869 8.69 2066 20.66 3678 36.78
Pig 10000 8459 84.59 0 0.00 0 0.00 334 3.34 1207 12.07
Rat 10000 6432 64.32 1 0.01 3 0.03 1334 13.34 2230 22.30
Zebrafish 10000 9125 91.25 0 0.00 0 0.00 41 0.41 834 8.34
Arabidopsis 10000 9497 94.97 0 0.00 0 0.00 5 0.05 498 4.98
Grape 10000 9600 96.00 0 0.00 1 0.01 82 0.82 317 3.17
Potato 10000 9460 94.60 0 0.00 0 0.00 12 0.12 528 5.28
Tomato 10000 9521 95.21 0 0.00 0 0.00 45 0.45 434 4.34
Adapters 10000 10000 100.00 0 0.00 0 0.00 0 0.00 0 0.00
Brachybacterium 10000 10000 100.00 0 0.00 0 0.00 0 0.00 0 0.00
Pseudomonas 10000 10000 100.00 0 0.00 0 0.00 0 0.00 0 0.00
Massilia_oculi 10000 9999 99.99 0 0.00 1 0.01 0 0.00 0 0.00
Ecoli 10000 9998 99.98 1 0.01 1 0.01 0 0.00 0 0.00
Lambda 10000 10000 100.00 0 0.00 0 0.00 0 0.00 0 0.00
MT 10000 7856 78.56 0 0.00 0 0.00 2034 20.34 110 1.10
PhiX 10000 10000 100.00 0 0.00 0 0.00 0 0.00 0 0.00
rRNA 10000 9157 91.57 0 0.00 0 0.00 111 1.11 732 7.32
Wasp 10000 9473 94.73 0 0.00 0 0.00 211 2.11 316 3.16
Vectors 10000 9713 97.13 0 0.00 0 0.00 52 0.52 235 2.35
Worm 10000 9645 96.45 0 0.00 0 0.00 13 0.13 342 3.42
Yeast 10000 9507 95.07 0 0.00 0 0.00 4 0.04 489 4.89
Mycoplasma 10000 9998 99.98 0 0.00 0 0.00 0 0.00 2 0.02
%Hit_no_genomes: 0.88

View file

@@ -1,34 +0,0 @@
import java.security.MessageDigest
private static String getMD5(File file) throws IOException
{
// https://howtodoinjava.com/java/io/how-to-generate-sha-or-md5-file-checksum-hash-in-java/
//Get file input stream for reading the file content
FileInputStream fis = new FileInputStream(file);
//Create byte array to read data in chunks
byte[] byteArray = new byte[1024];
int bytesCount = 0;
//Read file data and update in message digest
def digest = MessageDigest.getInstance("MD5")
while ((bytesCount = fis.read(byteArray)) != -1) {
digest.update(byteArray, 0, bytesCount);
};
//close the stream; We don't need it now.
fis.close();
//Get the hash's bytes
byte[] bytes = digest.digest();
//This bytes[] has bytes in decimal format;
//Convert it to hexadecimal format
StringBuilder sb = new StringBuilder();
for(int i=0; i< bytes.length ;i++)
{
sb.append(Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
}
//return complete hash
return sb.toString();
}

View file

@@ -1,16 +0,0 @@
process tcoffee {
tag "$fasta"
publishDir "${params.outdir}/tcoffee"
container 'quay.io/biocontainers/t_coffee:11.0.8--py27pl5.22.0_5'
input:
path "$fasta"
output:
path "${fasta}.aln"
script:
"""
t_coffee -seq $fasta -outfile ${fasta}.aln
"""
}

View file

@@ -1,28 +0,0 @@
name: t-coffee
description: Run tcofee multiple sequence alignment
keywords:
- MSA
- sequence aligment
tools:
- t-coffee:
description: |
T-Coffee is a multiple sequence alignment package.
It uses a progressive approach and a consistency objective
function for alignment evaluation.
homepage: http://www.tcoffee.org/
documentation: http://www.tcoffee.org/Projects/tcoffee/index.html#DOCUMENTATION
input:
-
- fasta:
type: path
description: Input fasta file
pattern: "*.{fasta,fa,tfa}"
output:
-
- alignment:
type: file
description: tcoffee alignment file
pattern: "*.aln"
authors:
- "@JoseEspinosa"

View file

@@ -1,15 +0,0 @@
#!/usr/bin/env nextflow
nextflow.preview.dsl = 2
include check_output from '../../../tests/functions/check_process_outputs.nf'
include tcoffee from '../main.nf'
// Define input channels
fasta = Channel.fromPath('../../../test-datasets/tools/tcoffee/input/BBA0001.tfa')
// Run the workflow
workflow {
tcoffee(fasta)
// .check_output()
}

View file

@@ -1,2 +0,0 @@
docker.enabled = true
params.outdir = './results'

View file

@@ -0,0 +1,55 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'
params.options = [:]
options = initOptions(params.options)
process BEDTOOLS_GENOMECOV {
tag "$meta.id"
label 'process_medium'
publishDir "${params.outdir}",
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }
conda (params.enable_conda ? "bioconda::bedtools=2.30.0" : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
container "https://depot.galaxyproject.org/singularity/bedtools:2.30.0--hc088bd4_0"
} else {
container "quay.io/biocontainers/bedtools:2.30.0--hc088bd4_0"
}
input:
tuple val(meta), path(intervals)
path sizes
val extension
output:
tuple val(meta), path("*.${extension}"), emit: genomecov
path "*.version.txt" , emit: version
script:
def software = getSoftwareName(task.process)
def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
if (intervals.name =~ /\.bam/) {
"""
bedtools \\
genomecov \\
-ibam $intervals \\
$options.args \\
> ${prefix}.${extension}
bedtools --version | sed -e "s/bedtools v//g" > ${software}.version.txt
"""
} else {
"""
bedtools \\
genomecov \\
-i $intervals \\
-g $sizes \\
$options.args \\
> ${prefix}.${extension}
bedtools --version | sed -e "s/bedtools v//g" > ${software}.version.txt
"""
}
}
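For illustration only (not part of this commit), this is roughly how a pipeline could call the updated module now that it takes an intervals file, a chromosome-sizes file and a user-chosen output extension; the file names are hypothetical, and in practice the surrounding config must define `params.outdir` and the other publishing/container parameters the module expects:

```groovy
nextflow.enable.dsl = 2

include { BEDTOOLS_GENOMECOV } from './modules/nf-core/modules/bedtools/genomecov/main' addParams( options: [:] )

workflow {
    // BAM input triggers the -ibam branch; for BED/GFF/VCF input the sizes file is passed to -g
    ch_input = Channel.of( [ [ id: 'sample1' ], file('sample1.sorted.bam') ] )

    BEDTOOLS_GENOMECOV (
        ch_input,
        file('genome.sizes'),   // tab-delimited chromosome name/size table (hypothetical path)
        'bedgraph'              // extension of the output file, giving sample1.bedgraph
    )
}
```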

View file

@@ -15,20 +15,26 @@ input:
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', single_end:false ]
-  - bam:
+  - intervals:
       type: file
-      description: Input BAM file
-      pattern: "*.{bam}"
+      description: BAM/BED/GFF/VCF
+      pattern: "*.{bam|bed|gff|vcf}"
+  - sizes:
+      type: file
+      description: Tab-delimited table of chromosome names in the first column and chromosome sizes in the second column
+  - extension:
+      type: string
+      description: Extension of the output file (e. g., ".bg", ".bedgraph", ".txt", ".tab", etc.) It is set arbitrarily by the user and corresponds to the file format which depends on arguments.
 output:
   - meta:
       type: map
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', single_end:false ]
-  - bed:
+  - genomecov:
       type: file
-      description: Computed genomecov bed file
-      pattern: "*.{bed}"
+      description: Computed genome coverage file
+      pattern: "*.${extension}"
   - version:
       type: file
       description: File containing software version
@@ -37,3 +43,4 @@ authors:
   - "@Emiller88"
   - "@sruthipsuresh"
   - "@drpatelh"
+  - "@sidorov-si"

View file

@@ -19,10 +19,11 @@ process BEDTOOLS_INTERSECT {
     }

     input:
-    tuple val(meta), path(bed1), path(bed2)
+    tuple val(meta), path(intervals1), path(intervals2)
+    val extension

     output:
-    tuple val(meta), path('*.bed'), emit: bed
+    tuple val(meta), path("*.${extension}"), emit: intersect
     path '*.version.txt' , emit: version

     script:
@@ -31,10 +32,10 @@ process BEDTOOLS_INTERSECT {
     """
     bedtools \\
         intersect \\
-        -a $bed1 \\
-        -b $bed2 \\
+        -a $intervals1 \\
+        -b $intervals2 \\
         $options.args \\
-        > ${prefix}.bed
+        > ${prefix}.${extension}

     bedtools --version | sed -e "s/bedtools v//g" > ${software}.version.txt
     """

View file

@@ -1,5 +1,5 @@
 name: bedtools_intersect
-description: allows one to screen for overlaps between two sets of genomic features.
+description: Allows one to screen for overlaps between two sets of genomic features.
 keywords:
   - bed
   - intersect
@@ -14,24 +14,27 @@ input:
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', single_end:false ]
-  - bed1:
+  - intervals1:
       type: file
-      description: BED file, each feature in 1 is compared to 2 in search of overlaps
-      pattern: "*.{bed}"
-  - bed2:
+      description: BAM/BED/GFF/VCF
+      pattern: "*.{bam|bed|gff|vcf}"
+  - intervals2:
       type: file
-      description: Second bed file, used to compare to first BED file
-      pattern: "*.{bed}"
+      description: BAM/BED/GFF/VCF
+      pattern: "*.{bam|bed|gff|vcf}"
+  - extension:
+      type: value
+      description: Extension of the output file. It is set by the user and corresponds to the file format which depends on arguments (e. g., ".bed", ".bam", ".txt", etc.).
 output:
   - meta:
       type: map
       description: |
         Groovy Map containing sample information
         e.g. [ id:'test', single_end:false ]
-  - bed:
+  - intersect:
       type: file
-      description: BED file with intersected intervals
-      pattern: "*.{bed}"
+      description: File containing the description of overlaps found between the two features
+      pattern: "*.${extension}"
   - version:
       type: file
       description: File containing software version
@@ -40,3 +43,4 @@ authors:
   - "@Emiller88"
   - "@sruthipsuresh"
   - "@drpatelh"
+  - "@sidorov-si"

View file

@@ -4,7 +4,7 @@ include { initOptions; saveFiles; getSoftwareName } from './functions'
 params.options = [:]
 options = initOptions(params.options)

-process BEDTOOLS_GENOMECOV {
+process BEDTOOLS_SUBTRACT {
     tag "$meta.id"
     label 'process_medium'
     publishDir "${params.outdir}",
@@ -19,7 +19,7 @@ process BEDTOOLS_GENOMECOV {
     }

     input:
-    tuple val(meta), path(bam)
+    tuple val(meta), path(intervals1), path(intervals2)

     output:
     tuple val(meta), path("*.bed"), emit: bed
@@ -30,8 +30,9 @@ process BEDTOOLS_GENOMECOV {
     def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
     """
     bedtools \\
-        genomecov \\
-        -ibam $bam \\
+        subtract \\
+        -a $intervals1 \\
+        -b $intervals2 \\
         $options.args \\
         > ${prefix}.bed
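Analogously to `bedtools intersect` above, a hedged usage sketch for the reworked subtract module: it takes the same paired-intervals tuple but always writes a `.bed` result (file names are hypothetical):

```groovy
include { BEDTOOLS_SUBTRACT } from './modules/nf-core/modules/bedtools/subtract/main' addParams( options: [:] )

workflow {
    // Keeps the portions of regions_a.bed that do not overlap regions_b.bed
    ch_intervals = Channel.of( [ [ id: 'test' ], file('regions_a.bed'), file('regions_b.bed') ] )

    BEDTOOLS_SUBTRACT ( ch_intervals )
}
```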

View file

@@ -0,0 +1,45 @@
name: bedtools_subtract
description: Finds overlaps between two sets of regions (A and B), removes the overlaps from A and reports the remaining portion of A.
keywords:
- bed
- gff
- vcf
- subtract
tools:
- bedtools:
description: |
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
documentation: https://bedtools.readthedocs.io/en/latest/content/tools/subtract.html
input:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- intervals1:
type: file
description: BED/GFF/VCF
pattern: "*.{bed|gff|vcf}"
- intervals2:
type: file
description: BED/GFF/VCF
pattern: "*.{bed|gff|vcf}"
output:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- bed:
type: file
description: File containing the difference between the two sets of features
patters: "*.bed"
- version:
type: file
description: File containing software version
pattern: "*.{version.txt}"
authors:
- "@sidorov-si"

Some files were not shown because too many files have changed in this diff.