initial template build from nf-core/tools, version 2.2

Branch: master | Author: James Fellows Yates | Commit: 12e7b428f2

.editorconfig
@@ -0,0 +1,27 @@
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_size = 4
indent_style = space
[*.{yml,yaml}]
indent_size = 2
[*.json]
insert_final_newline = unset
# These files are edited and tested upstream in nf-core/modules
[/modules/nf-core/**]
charset = unset
end_of_line = unset
insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset
indent_size = unset
[/assets/email*]
indent_size = unset

.gitattributes
@@ -0,0 +1,3 @@
*.config linguist-language=nextflow
modules/nf-core/** linguist-generated
subworkflows/nf-core/** linguist-generated

.github/.dockstore.yml
@@ -0,0 +1,6 @@
# Dockstore config version, not pipeline version
version: 1.2
workflows:
- subclass: nfl
primaryDescriptorPath: /nextflow.config
publish: True

.github/CONTRIBUTING.md
@@ -0,0 +1,104 @@
# nf-core/taxprofiler: Contributing Guidelines
Hi there!
Many thanks for taking an interest in improving nf-core/taxprofiler.
We try to manage the required tasks for nf-core/taxprofiler using GitHub issues; you probably came to this page when creating one.
Please use the pre-filled template to save time.
However, don't be put off by this template - other more general issues and suggestions are welcome!
Contributions to the code are even more welcome ;)
> If you need help using or modifying nf-core/taxprofiler then the best place to ask is on the nf-core Slack [#taxprofiler](https://nfcore.slack.com/channels/taxprofiler) channel ([join our Slack here](https://nf-co.re/join/slack)).
## Contribution workflow
If you'd like to write some code for nf-core/taxprofiler, the standard workflow is as follows:
1. Check that there isn't already an issue about your idea in the [nf-core/taxprofiler issues](https://github.com/nf-core/taxprofiler/issues) to avoid duplicating work
* If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/taxprofiler repository](https://github.com/nf-core/taxprofiler) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
## Tests
When you create a pull request with changes, [GitHub Actions](https://github.com/features/actions) will run automatic tests.
Typically, pull-requests are only fully reviewed when these tests are passing, though of course we can help out before then.
There are typically two types of tests that run:
### Lint tests
`nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to.
To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint <pipeline-directory>` command.
If any failures or warnings are encountered, please follow the listed URL for more documentation.
### Pipeline tests
Each `nf-core` pipeline should be set up with a minimal set of test-data.
`GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully.
If there are any failures then the automated tests fail.
These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.
## Patch
:warning: Only in the unlikely and regretful event of a release happening with a bug.
* On your own fork, make a new branch `patch` based on `upstream/master`.
* Fix the bug, and bump version (X.Y.Z+1).
* A PR should be made on `master` from `patch` to directly fix this particular bug.
## Getting help
For further information/help, please consult the [nf-core/taxprofiler documentation](https://nf-co.re/taxprofiler/usage) and don't hesitate to get in touch on the nf-core Slack [#taxprofiler](https://nfcore.slack.com/channels/taxprofiler) channel ([join our Slack here](https://nf-co.re/join/slack)).
## Pipeline contribution conventions
To make the nf-core/taxprofiler code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written.
### Adding a new step
If you wish to contribute a new step, please use the following coding standards:
1. Define the corresponding input channel into your new process, fed from the output channel of the expected previous process.
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new parameters to `nextflow.config` with a default (see below).
5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool).
6. Add sanity checks and validation for all relevant parameters.
7. Perform local tests to validate that the new code works as expected.
8. If applicable, add a new test command in `.github/workflows/ci.yml`.
9. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, file name clean-up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://multiqc.info/) module.
10. Add a description of the output files and, if relevant, any appropriate images from the MultiQC report to `docs/output.md`.
### Default values
Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
Once there, use `nf-core schema build` to add to `nextflow_schema.json`.
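A minimal sketch (the parameter name below is purely hypothetical):

```groovy
// nextflow.config -- declare the new parameter with a sensible default
params {
    min_read_length = 50 // hypothetical example parameter
}
```

`nf-core schema build` will then detect the new parameter so it can be documented in `nextflow_schema.json`.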
### Default processes resource requirements
Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generically, using `withLabel:` selectors, so they can be shared across multiple processes/steps of the pipeline. An nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which defines the default process as a single-core process, and then different levels of multi-core configuration for increasingly large memory requirements, each with a standardised label.
The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
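As a minimal sketch of how this fits together (the tool name and its command-line flags are hypothetical), a process inherits label-based defaults from `conf/base.config` and forwards them to the tool:

```groovy
process EXAMPLE_TOOL {
    label 'process_medium' // resource defaults come from conf/base.config

    input:
    path reads

    output:
    path 'out.txt'

    // ${task.cpus} and ${task.memory} resolve to the values assigned at run time
    script:
    """
    example_tool --threads ${task.cpus} --max-memory ${task.memory.toGiga()}G $reads > out.txt
    """
}
```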
### Naming schemes
Please use the following naming schemes to make it easy to understand what is going where.
* initial process channel: `ch_output_from_<process>`
* intermediate and terminal channels: `ch_<previousprocess>_for_<nextprocess>`
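For example, with hypothetical FASTQC and MULTIQC processes, the convention reads:

```groovy
// initial channel: raw output of the FASTQC process
ch_output_from_fastqc = FASTQC.out.zip

// intermediate channel: FastQC results gathered for MULTIQC
ch_fastqc_for_multiqc = ch_output_from_fastqc.collect()
MULTIQC( ch_fastqc_for_multiqc )
```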
### Nextflow version bumping
If you are using a new feature from core Nextflow, you may bump the minimum required version of Nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]`.
### Images and figures
For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines).

.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,52 @@
name: Bug report
description: Report something that is broken or incorrect
labels: bug
body:
- type: markdown
attributes:
value: |
Before you post this issue, please check the documentation:
- [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting)
- [nf-core/taxprofiler pipeline documentation](https://nf-co.re/taxprofiler/usage)
- type: textarea
id: description
attributes:
label: Description of the bug
description: A clear and concise description of what the bug is.
validations:
required: true
- type: textarea
id: command_used
attributes:
label: Command used and terminal output
description: Steps to reproduce the behaviour. Please paste the command you used to launch the pipeline and the output from your terminal.
render: console
placeholder: |
$ nextflow run ...
Some output where something broke
- type: textarea
id: files
attributes:
label: Relevant files
description: |
Please drag and drop the relevant files here. Create a `.zip` archive if the extension is not allowed.
Your verbose log file `.nextflow.log` is often useful _(this is a hidden file in the directory where you launched the pipeline)_ as well as custom Nextflow configuration files.
- type: textarea
id: system
attributes:
label: System information
description: |
* Nextflow version _(eg. 21.10.3)_
* Hardware _(eg. HPC, Desktop, Cloud)_
* Executor _(eg. slurm, local, awsbatch)_
* Container engine: _(e.g. Docker, Singularity, Conda, Podman, Shifter or Charliecloud)_
* OS _(eg. CentOS Linux, macOS, Linux Mint)_
* Version of nf-core/taxprofiler _(eg. 1.1, 1.5, 1.8.2)_

.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,7 @@
contact_links:
- name: Join nf-core
url: https://nf-co.re/join
about: Please join the nf-core community here
- name: "Slack #taxprofiler channel"
url: https://nfcore.slack.com/channels/taxprofiler
about: Discussion about the nf-core/taxprofiler pipeline

.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,11 @@
name: Feature request
description: Suggest an idea for the nf-core/taxprofiler pipeline
labels: enhancement
body:
- type: textarea
id: description
attributes:
label: Description of feature
description: Please describe your suggestion for a new feature. It might help to describe a problem or use case, plus any alternatives that you have considered.
validations:
required: true

.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,26 @@
<!--
# nf-core/taxprofiler pull request
Many thanks for contributing to nf-core/taxprofiler!
Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).
Remember that PRs should be made against the dev branch, unless you're preparing a pipeline release.
Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/taxprofiler/tree/master/.github/CONTRIBUTING.md)
-->
<!-- markdownlint-disable ul-indent -->
## PR checklist
- [ ] This comment contains a description of changes (with reason).
- [ ] If you've fixed a bug or added code that should be tested, add tests!
- [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/taxprofiler/tree/master/.github/CONTRIBUTING.md)?
- [ ] If necessary, also make a PR on the nf-core/taxprofiler _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
- [ ] Make sure your code lints (`nf-core lint`).
- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
- [ ] Usage Documentation in `docs/usage.md` is updated.
- [ ] Output Documentation in `docs/output.md` is updated.
- [ ] `CHANGELOG.md` is updated.
- [ ] `README.md` is updated (including new tool citations and authors/contributors).

.github/workflows/awsfulltest.yml
@@ -0,0 +1,34 @@
name: nf-core AWS full size tests
# This workflow is triggered on published releases.
# It can be additionally triggered manually with GitHub actions workflow dispatch button.
# It runs the -profile 'test_full' on AWS batch
on:
release:
types: [published]
workflow_dispatch:
jobs:
run-tower:
name: Run AWS full tests
if: github.repository == 'nf-core/taxprofiler'
runs-on: ubuntu-latest
steps:
- name: Launch workflow via tower
uses: nf-core/tower-action@v2
# TODO nf-core: You can customise AWS full pipeline tests as required
# Add full size test data (but still relatively small datasets for few samples)
# on the `test_full.config` test runs with only one set of parameters
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
pipeline: ${{ github.repository }}
revision: ${{ github.sha }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/taxprofiler/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/taxprofiler/results-${{ github.sha }}"
}
profiles: test_full,aws_tower
pre_run_script: 'export NXF_VER=21.10.3'

.github/workflows/awstest.yml
@@ -0,0 +1,28 @@
name: nf-core AWS test
# This workflow can be triggered manually with the GitHub actions workflow dispatch button.
# It runs the -profile 'test' on AWS batch
on:
workflow_dispatch:
jobs:
run-tower:
name: Run AWS tests
if: github.repository == 'nf-core/taxprofiler'
runs-on: ubuntu-latest
steps:
- name: Launch workflow via tower
uses: nf-core/tower-action@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
pipeline: ${{ github.repository }}
revision: ${{ github.sha }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/taxprofiler/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/taxprofiler/results-test-${{ github.sha }}"
}
profiles: test,aws_tower
pre_run_script: 'export NXF_VER=21.10.3'

.github/workflows/branch.yml
@@ -0,0 +1,46 @@
name: nf-core branch protection
# This workflow is triggered on PRs to master branch on the repository
# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
on:
pull_request_target:
branches: [master]
jobs:
test:
runs-on: ubuntu-latest
steps:
# PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
- name: Check PRs
if: github.repository == 'nf-core/taxprofiler'
run: |
{ [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/taxprofiler ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
# If the above check failed, post a comment on the PR explaining the failure
# NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets
- name: Post PR comment
if: failure()
uses: mshick/add-pr-comment@v1
with:
message: |
## This PR is against the `master` branch :x:
* Do not close this PR
* Click _Edit_ and change the `base` to `dev`
* This CI test will remain failed until you push a new commit
---
Hi @${{ github.event.pull_request.user.login }},
It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
The `master` branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.
Thanks again for your contribution!
repo-token: ${{ secrets.GITHUB_TOKEN }}
allow-repeats: false

.github/workflows/ci.yml
@@ -0,0 +1,50 @@
name: nf-core CI
# This workflow runs the pipeline with the minimal test dataset to check that it completes without any syntax errors
on:
push:
branches:
- dev
pull_request:
release:
types: [published]
env:
NXF_ANSI_LOG: false
CAPSULE_LOG: none
jobs:
test:
name: Run workflow tests
# Only run on push if this is the nf-core dev branch (merged PRs)
if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/taxprofiler') }}
runs-on: ubuntu-latest
strategy:
matrix:
# Nextflow versions
include:
# Test pipeline minimum Nextflow version
- NXF_VER: '21.10.3'
NXF_EDGE: ''
# Test latest edge release of Nextflow
- NXF_VER: ''
NXF_EDGE: '1'
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
- name: Install Nextflow
env:
NXF_VER: ${{ matrix.NXF_VER }}
# Uncomment only if the edge release is more recent than the latest stable release
# See https://github.com/nextflow-io/nextflow/issues/2467
# NXF_EDGE: ${{ matrix.NXF_EDGE }}
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Run pipeline with test data
# TODO nf-core: You can customise CI pipeline run tests as required
# For example: adding multiple test runs with different parameters
# Remember that you can parallelise this by using strategy.matrix
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test,docker

.github/workflows/linting.yml
@@ -0,0 +1,145 @@
name: nf-core linting
# This workflow is triggered on pushes and PRs to the repository.
# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines
on:
push:
pull_request:
release:
types: [published]
jobs:
Markdown:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: '10'
- name: Install markdownlint
run: npm install -g markdownlint-cli
- name: Run Markdownlint
run: markdownlint .
# If the above check failed, post a comment on the PR explaining the failure
- name: Post PR comment
if: failure()
uses: mshick/add-pr-comment@v1
with:
message: |
## Markdown linting is failing
To keep the code consistent across our many contributors, we run automated code consistency checks.
To fix this CI test, please run:
* Install `markdownlint-cli`
* On Mac: `brew install markdownlint-cli`
* Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`)
* Fix the markdown errors
* Automatically: `markdownlint . --fix`
* Manually resolve anything left from `markdownlint .`
Once you push these changes the test should pass, and you can hide this comment :+1:
We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
Thanks again for your contribution!
repo-token: ${{ secrets.GITHUB_TOKEN }}
allow-repeats: false
EditorConfig:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: '10'
- name: Install editorconfig-checker
run: npm install -g editorconfig-checker
- name: Run ECLint check
run: editorconfig-checker -exclude README.md $(git ls-files | grep -v test)
YAML:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- uses: actions/setup-node@v1
with:
node-version: '10'
- name: Install yaml-lint
run: npm install -g yaml-lint
- name: Run yaml-lint
run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml")
# If the above check failed, post a comment on the PR explaining the failure
- name: Post PR comment
if: failure()
uses: mshick/add-pr-comment@v1
with:
message: |
## YAML linting is failing
To keep the code consistent across our many contributors, we run automated code consistency checks.
To fix this CI test, please run:
* Install `yaml-lint`
* [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`)
* Fix the YAML errors
* Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")`
* Fix any reported errors in your YAML files
Once you push these changes the test should pass, and you can hide this comment :+1:
We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
Thanks again for your contribution!
repo-token: ${{ secrets.GITHUB_TOKEN }}
allow-repeats: false
nf-core:
runs-on: ubuntu-latest
steps:
- name: Check out pipeline code
uses: actions/checkout@v2
- name: Install Nextflow
env:
CAPSULE_LOG: none
run: |
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- uses: actions/setup-python@v1
with:
python-version: '3.6'
architecture: 'x64'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install nf-core
- name: Run nf-core lint
env:
GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }}
run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md
- name: Save PR number
if: ${{ always() }}
run: echo ${{ github.event.pull_request.number }} > PR_number.txt
- name: Upload linting log file artifact
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: linting-logs
path: |
lint_log.txt
lint_results.md
PR_number.txt

.github/workflows/linting_comment.yml
@@ -0,0 +1,30 @@
name: nf-core linting comment
# This workflow is triggered after the linting action is complete
# It posts an automated comment to the PR, even if the PR is coming from a fork
on:
workflow_run:
workflows: ["nf-core linting"]
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Download lint results
uses: dawidd6/action-download-artifact@v2
with:
workflow: linting.yml
workflow_conclusion: completed
- name: Get PR number
id: pr_number
run: echo "::set-output name=pr_number::$(cat linting-logs/PR_number.txt)"
- name: Post PR comment
uses: marocchino/sticky-pull-request-comment@v2
with:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
number: ${{ steps.pr_number.outputs.pr_number }}
path: linting-logs/lint_results.md

.gitignore
@@ -0,0 +1,8 @@
.nextflow*
work/
data/
results/
.DS_Store
testing/
testing*
*.pyc

.markdownlint.yml
@@ -0,0 +1,14 @@
# Markdownlint configuration file
default: true
line-length: false
ul-indent:
indent: 4
no-duplicate-header:
siblings_only: true
no-inline-html:
allowed_elements:
- img
- p
- kbd
- details
- summary

CHANGELOG.md
@@ -0,0 +1,16 @@
# nf-core/taxprofiler: Changelog
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## v1.0dev - [date]
Initial release of nf-core/taxprofiler, created with the [nf-core](https://nf-co.re/) template.
### `Added`
### `Fixed`
### `Dependencies`
### `Deprecated`

CITATIONS.md
@@ -0,0 +1,32 @@
# nf-core/taxprofiler: Citations
## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)
> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)
> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
## Pipeline tools
* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
* [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
## Software packaging/containerisation tools
* [Anaconda](https://anaconda.com)
> Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)
> Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)
> da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)
* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
> Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

CODE_OF_CONDUCT.md
@@ -0,0 +1,111 @@
# Code of Conduct at nf-core (v1.0)
## Our Pledge
In the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of:
- Age
- Body size
- Familial status
- Gender identity and expression
- Geographical location
- Level of experience
- Nationality and national origins
- Native language
- Physical and neurological ability
- Race or ethnicity
- Religion
- Sexual identity and orientation
- Socioeconomic status
Please note that the list above is alphabetised and is therefore not ranked in any order of preference or importance.
## Preamble
> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and been edited after input from members of the nf-core team and others. "We", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will be amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply.
An up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva.
nf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals.
We have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc.
Our CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities.
We ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC.
Questions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re
## Our Responsibilities
The safety officer is responsible for clarifying the standards of acceptable behaviour and is expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour.
The safety officer, in consultation with the nf-core core team, has the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned with this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviours that they deem inappropriate, threatening, offensive, or harmful.
Members of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and will be subject to the same actions as others in violation of the CoC.
## When and where does this Code of Conduct apply?
Participation in the nf-core community is contingent on following these guidelines in all our workspaces and events. This includes, but is not limited to, the following, listed alphabetically and therefore in no order of preference:
- Communicating with an official project email address.
- Communicating with community members within the nf-core Slack channel.
- Participating in hackathons organised by nf-core (both online and in-person events).
- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence.
- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc.
- Representing nf-core on social media. This includes both official and personal accounts.
## nf-core cares 😊
nf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order):
- Ask for consent before sharing another community member's personal information (including photographs) on social media.
- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity.
- Celebrate your accomplishments at events! (Get creative with your use of emojis 🎉 🥳 💯 🙌 !)
- Demonstrate empathy towards other community members. (We don't all have the same amount of time to dedicate to nf-core. If tasks are pending, don't hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.)
- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so let's do this the best we can)
- Focus on what is best for the team and the community. (When in doubt, ask)
- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn.
- Introduce yourself to members of the community. (We've all been outsiders and we know that talking to strangers can be hard for some, but remember we're interested in getting to know you and your visions for open science!)
- Show appreciation and **provide clear feedback**. (This is especially important because we don't see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**)
- Take breaks when you feel like you need them.
- Use welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.)
## nf-core frowns on 😕
The following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging or advocating for any of the following could result in expulsion from nf-core workspaces.
- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom.
- “Doxing”, i.e. posting (or threatening to post) another person's personal identifying information online.
- Spamming or trolling of individuals on social media.
- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention.
- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience.
### Online Trolling
The majority of nf-core interactions and events are held online. Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable, reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately.
All community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls.
## Procedures for Reporting CoC violations
If someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible.
You can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s).
Issues directly concerning members of the core team will be dealt with by other members of the core team and the safety manager, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course.
All reports will be handled with the utmost discretion and confidentiality.
## Attribution and Acknowledgements
- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4)
- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition)
- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/)
- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla)
## Changelog
### v1.0 - March 12th, 2021
- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC.

LICENSE
@@ -0,0 +1,21 @@
MIT License
Copyright (c) nf-core community
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md
@@ -0,0 +1,93 @@
# ![nf-core/taxprofiler](docs/images/nf-core-taxprofiler_logo_light.png#gh-light-mode-only) ![nf-core/taxprofiler](docs/images/nf-core-taxprofiler_logo_dark.png#gh-dark-mode-only)
[![GitHub Actions CI Status](https://github.com/nf-core/taxprofiler/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/taxprofiler/actions?query=workflow%3A%22nf-core+CI%22)
[![GitHub Actions Linting Status](https://github.com/nf-core/taxprofiler/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/taxprofiler/actions?query=workflow%3A%22nf-core+linting%22)
[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/taxprofiler/results)
[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX)
[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.10.3-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23taxprofiler-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/taxprofiler)
[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core)
[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
## Introduction
<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->
**nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic profiling of shotgun metagenomic data.
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
<!-- TODO nf-core: Add full-sized test dataset and amend the paragraph below if applicable -->
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/taxprofiler/results).
## Pipeline summary
<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->
1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))
## Quick Start
1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.10.3`)
2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_
3. Download the pipeline and test it on a minimal dataset with a single command:
```console
nextflow run nf-core/taxprofiler -profile test,YOURPROFILE
```
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string.
> * The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`.
> * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
> * If you are using `singularity` and are persistently observing issues downloading Singularity images directly due to timeout or network issues, then you can use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, you can use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
> * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.
4. Start running your own analysis!
<!-- TODO nf-core: Update the example "typical command" below used to run the pipeline -->
```console
nextflow run nf-core/taxprofiler -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --genome GRCh37
```
## Documentation
The nf-core/taxprofiler pipeline comes with documentation about the pipeline [usage](https://nf-co.re/taxprofiler/usage), [parameters](https://nf-co.re/taxprofiler/parameters) and [output](https://nf-co.re/taxprofiler/output).
## Credits
nf-core/taxprofiler was originally written by the nf-core community.
We thank the following people for their extensive assistance in the development of this pipeline:
<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
## Contributions and Support
If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
For further information or help, don't hesitate to get in touch on the [Slack `#taxprofiler` channel](https://nfcore.slack.com/channels/taxprofiler) (you can join with [this invite](https://nf-co.re/join/slack)).
## Citations
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/taxprofiler for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
You can cite the `nf-core` publication as follows:
> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).

assets/email_template.html
@@ -0,0 +1,53 @@
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="nf-core/taxprofiler: Taxonomic profiling of shotgun metagenomic data">
<title>nf-core/taxprofiler Pipeline Report</title>
</head>
<body>
<div style="font-family: Helvetica, Arial, sans-serif; padding: 30px; max-width: 800px; margin: 0 auto;">
<img src="cid:nfcorepipelinelogo">
<h1>nf-core/taxprofiler v${version}</h1>
<h2>Run Name: $runName</h2>
<% if (!success){
out << """
<div style="color: #a94442; background-color: #f2dede; border-color: #ebccd1; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
<h4 style="margin-top:0; color: inherit;">nf-core/taxprofiler execution completed unsuccessfully!</h4>
<p>The exit status of the task that caused the workflow execution to fail was: <code>$exitStatus</code>.</p>
<p>The full error message was:</p>
<pre style="white-space: pre-wrap; overflow: visible; margin-bottom: 0;">${errorReport}</pre>
</div>
"""
} else {
out << """
<div style="color: #3c763d; background-color: #dff0d8; border-color: #d6e9c6; padding: 15px; margin-bottom: 20px; border: 1px solid transparent; border-radius: 4px;">
nf-core/taxprofiler execution completed successfully!
</div>
"""
}
%>
<p>The workflow was completed at <strong>$dateComplete</strong> (duration: <strong>$duration</strong>)</p>
<p>The command used to launch the workflow was as follows:</p>
<pre style="white-space: pre-wrap; overflow: visible; background-color: #ededed; padding: 15px; border-radius: 4px; margin-bottom:30px;">$commandLine</pre>
<h3>Pipeline Configuration:</h3>
<table style="width:100%; max-width:100%; border-spacing: 0; border-collapse: collapse; border:0; margin-bottom: 30px;">
<tbody style="border-bottom: 1px solid #ddd;">
<% out << summary.collect{ k,v -> "<tr><th style='text-align:left; padding: 8px 0; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'>$k</th><td style='text-align:left; padding: 8px; line-height: 1.42857143; vertical-align: top; border-top: 1px solid #ddd;'><pre style='white-space: pre-wrap; overflow: visible;'>$v</pre></td></tr>" }.join("\n") %>
</tbody>
</table>
<p>nf-core/taxprofiler</p>
<p><a href="https://github.com/nf-core/taxprofiler">https://github.com/nf-core/taxprofiler</a></p>
</div>
</body>
</html>

assets/email_template.txt
@@ -0,0 +1,40 @@
----------------------------------------------------
,--./,-.
___ __ __ __ ___ /,-._.--~\\
|\\ | |__ __ / ` / \\ |__) |__ } {
| \\| | \\__, \\__/ | \\ |___ \\`-._,-`-,
`._,._,'
nf-core/taxprofiler v${version}
----------------------------------------------------
Run Name: $runName
<% if (success){
out << "## nf-core/taxprofiler execution completed successfully! ##"
} else {
out << """####################################################
## nf-core/taxprofiler execution completed unsuccessfully! ##
####################################################
The exit status of the task that caused the workflow execution to fail was: $exitStatus.
The full error message was:
${errorReport}
"""
} %>
The workflow was completed at $dateComplete (duration: $duration)
The command used to launch the workflow was as follows:
$commandLine
Pipeline Configuration:
-----------------------
<% out << summary.collect{ k,v -> " - $k: $v" }.join("\n") %>
--
nf-core/taxprofiler
https://github.com/nf-core/taxprofiler

assets/multiqc_config.yaml
@@ -0,0 +1,11 @@
report_comment: >
This report has been generated by the <a href="https://github.com/nf-core/taxprofiler" target="_blank">nf-core/taxprofiler</a>
analysis pipeline. For information about how to interpret these results, please see the
<a href="https://nf-co.re/taxprofiler" target="_blank">documentation</a>.
report_section_order:
software_versions:
order: -1000
nf-core-taxprofiler-summary:
order: -1001
export_plots: true

assets/nf-core-taxprofiler_logo_light.png
Binary image (12 KiB), not shown.

assets/samplesheet.csv
@@ -0,0 +1,3 @@
sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,

assets/schema_input.json
@@ -0,0 +1,39 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/nf-core/taxprofiler/master/assets/schema_input.json",
"title": "nf-core/taxprofiler pipeline - params.input schema",
"description": "Schema for the file provided with params.input",
"type": "array",
"items": {
"type": "object",
"properties": {
"sample": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Sample name must be provided and cannot contain spaces"
},
"fastq_1": {
"type": "string",
"pattern": "^\\S+\\.f(ast)?q\\.gz$",
"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
},
"fastq_2": {
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
"anyOf": [
{
"type": "string",
"pattern": "^\\S+\\.f(ast)?q\\.gz$"
},
{
"type": "string",
"maxLength": 0
}
]
}
},
"required": [
"sample",
"fastq_1"
]
}
}

assets/sendmail_template.txt
@@ -0,0 +1,53 @@
To: $email
Subject: $subject
Mime-Version: 1.0
Content-Type: multipart/related;boundary="nfcoremimeboundary"
--nfcoremimeboundary
Content-Type: text/html; charset=utf-8
$email_html
--nfcoremimeboundary
Content-Type: image/png;name="nf-core-taxprofiler_logo.png"
Content-Transfer-Encoding: base64
Content-ID: <nfcorepipelinelogo>
Content-Disposition: inline; filename="nf-core-taxprofiler_logo_light.png"
<% out << new File("$projectDir/assets/nf-core-taxprofiler_logo_light.png").
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' ) %>
<%
if (mqcFile){
def mqcFileObj = new File("$mqcFile")
if (mqcFileObj.length() < mqcMaxSize){
out << """
--nfcoremimeboundary
Content-Type: text/html; name=\"multiqc_report\"
Content-Transfer-Encoding: base64
Content-ID: <mqcreport>
Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\"
${mqcFileObj.
bytes.
encodeBase64().
toString().
tokenize( '\n' )*.
toList()*.
collate( 76 )*.
collect { it.join() }.
flatten().
join( '\n' )}
"""
}}
%>
--nfcoremimeboundary--

bin/check_samplesheet.py
@@ -0,0 +1,146 @@
#!/usr/bin/env python
# TODO nf-core: Update the script to check the samplesheet
# This script is based on the example at: https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv
import os
import sys
import errno
import argparse
def parse_args(args=None):
Description = "Reformat nf-core/taxprofiler samplesheet file and check its contents."
Epilog = "Example usage: python check_samplesheet.py <FILE_IN> <FILE_OUT>"
parser = argparse.ArgumentParser(description=Description, epilog=Epilog)
parser.add_argument("FILE_IN", help="Input samplesheet file.")
parser.add_argument("FILE_OUT", help="Output file.")
return parser.parse_args(args)
def make_dir(path):
if len(path) > 0:
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise exception
def print_error(error, context="Line", context_str=""):
error_str = "ERROR: Please check samplesheet -> {}".format(error)
if context != "" and context_str != "":
error_str = "ERROR: Please check samplesheet -> {}\n{}: '{}'".format(
error, context.strip(), context_str.strip()
)
print(error_str)
sys.exit(1)
# TODO nf-core: Update the check_samplesheet function
def check_samplesheet(file_in, file_out):
"""
This function checks that the samplesheet follows the following structure:
sample,fastq_1,fastq_2
SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz
SAMPLE_PE,SAMPLE_PE_RUN2_1.fastq.gz,SAMPLE_PE_RUN2_2.fastq.gz
SAMPLE_SE,SAMPLE_SE_RUN1_1.fastq.gz,
For an example see:
https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv
"""
sample_mapping_dict = {}
with open(file_in, "r") as fin:
## Check header
MIN_COLS = 2
# TODO nf-core: Update the column names for the input samplesheet
HEADER = ["sample", "fastq_1", "fastq_2"]
header = [x.strip('"') for x in fin.readline().strip().split(",")]
if header[: len(HEADER)] != HEADER:
print("ERROR: Please check samplesheet header -> {} != {}".format(",".join(header), ",".join(HEADER)))
sys.exit(1)
## Check sample entries
for line in fin:
lspl = [x.strip().strip('"') for x in line.strip().split(",")]
# Check valid number of columns per row
if len(lspl) < len(HEADER):
print_error(
"Invalid number of columns (minimum = {})!".format(len(HEADER)),
"Line",
line,
)
num_cols = len([x for x in lspl if x])
if num_cols < MIN_COLS:
print_error(
"Invalid number of populated columns (minimum = {})!".format(MIN_COLS),
"Line",
line,
)
## Check sample name entries
sample, fastq_1, fastq_2 = lspl[: len(HEADER)]
sample = sample.replace(" ", "_")
if not sample:
print_error("Sample entry has not been specified!", "Line", line)
## Check FastQ file extension
for fastq in [fastq_1, fastq_2]:
if fastq:
if fastq.find(" ") != -1:
print_error("FastQ file contains spaces!", "Line", line)
if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"):
print_error(
"FastQ file does not have extension '.fastq.gz' or '.fq.gz'!",
"Line",
line,
)
## Auto-detect paired-end/single-end
sample_info = [] ## [single_end, fastq_1, fastq_2]
if sample and fastq_1 and fastq_2: ## Paired-end short reads
sample_info = ["0", fastq_1, fastq_2]
elif sample and fastq_1 and not fastq_2: ## Single-end short reads
sample_info = ["1", fastq_1, fastq_2]
else:
print_error("Invalid combination of columns provided!", "Line", line)
## Create sample mapping dictionary = { sample: [ single_end, fastq_1, fastq_2 ] }
if sample not in sample_mapping_dict:
sample_mapping_dict[sample] = [sample_info]
else:
if sample_info in sample_mapping_dict[sample]:
print_error("Samplesheet contains duplicate rows!", "Line", line)
else:
sample_mapping_dict[sample].append(sample_info)
## Write validated samplesheet with appropriate columns
if len(sample_mapping_dict) > 0:
out_dir = os.path.dirname(file_out)
make_dir(out_dir)
with open(file_out, "w") as fout:
fout.write(",".join(["sample", "single_end", "fastq_1", "fastq_2"]) + "\n")
for sample in sorted(sample_mapping_dict.keys()):
## Check that multiple runs of the same sample are of the same datatype
if not all(x[0] == sample_mapping_dict[sample][0][0] for x in sample_mapping_dict[sample]):
print_error("Multiple runs of a sample must be of the same datatype!", "Sample: {}".format(sample))
for idx, val in enumerate(sample_mapping_dict[sample]):
fout.write(",".join(["{}_T{}".format(sample, idx + 1)] + val) + "\n")
else:
print_error("No entries to process!", "Samplesheet: {}".format(file_in))
def main(args=None):
args = parse_args(args)
check_samplesheet(args.FILE_IN, args.FILE_OUT)
if __name__ == "__main__":
sys.exit(main())

conf/base.config
@@ -0,0 +1,60 @@
/*
========================================================================================
nf-core/taxprofiler Nextflow base config file
========================================================================================
A 'blank slate' config file, appropriate for general use on most high performance
compute environments. Assumes that all software is installed and available on
the PATH. Runs in `local` mode - all jobs will be run on the logged-in environment.
----------------------------------------------------------------------------------------
*/
process {
// TODO nf-core: Check the defaults for all processes
cpus = { check_max( 1 * task.attempt, 'cpus' ) }
memory = { check_max( 6.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
// Process-specific resource requirements
// NOTE - Please try and re-use the labels below as much as possible.
// These labels are used and recognised by default in DSL2 files hosted on nf-core/modules.
// If possible, it would be nice to keep the same label naming convention when
// adding in your local modules too.
// TODO nf-core: Customise requirements for specific processes.
// See https://www.nextflow.io/docs/latest/config.html#config-process-selectors
withLabel:process_low {
cpus = { check_max( 2 * task.attempt, 'cpus' ) }
memory = { check_max( 12.GB * task.attempt, 'memory' ) }
time = { check_max( 4.h * task.attempt, 'time' ) }
}
withLabel:process_medium {
cpus = { check_max( 6 * task.attempt, 'cpus' ) }
memory = { check_max( 36.GB * task.attempt, 'memory' ) }
time = { check_max( 8.h * task.attempt, 'time' ) }
}
withLabel:process_high {
cpus = { check_max( 12 * task.attempt, 'cpus' ) }
memory = { check_max( 72.GB * task.attempt, 'memory' ) }
time = { check_max( 16.h * task.attempt, 'time' ) }
}
withLabel:process_long {
time = { check_max( 20.h * task.attempt, 'time' ) }
}
withLabel:process_high_memory {
memory = { check_max( 200.GB * task.attempt, 'memory' ) }
}
withLabel:error_ignore {
errorStrategy = 'ignore'
}
withLabel:error_retry {
errorStrategy = 'retry'
maxRetries = 2
}
withName:CUSTOM_DUMPSOFTWAREVERSIONS {
cache = false
}
}
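// Note: the check_max() calls above rely on a helper function that the nf-core
// template defines in nextflow.config (not included in this excerpt). A minimal
// sketch of its 'memory' branch is shown here for reference; the template
// version handles 'time' and 'cpus' analogously.
//
// def check_max(obj, type) {
//     if (type == 'memory') {
//         try {
//             if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
//                 return params.max_memory as nextflow.util.MemoryUnit
//             else
//                 return obj
//         } catch (all) {
//             println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
//             return obj
//         }
//     }
// }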

conf/igenomes.config
@@ -0,0 +1,432 @@
/*
========================================================================================
Nextflow config file for iGenomes paths
========================================================================================
Defines reference genomes using iGenome paths.
Can be used by any config that customises the base path using:
$params.igenomes_base / --igenomes_base
----------------------------------------------------------------------------------------
*/
params {
// illumina iGenomes reference file paths
genomes {
'GRCh37' {
fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt"
mito_name = "MT"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed"
}
'GRCh38' {
fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed"
mito_name = "chrM"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
}
'GRCm38' {
fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt"
mito_name = "MT"
macs_gsize = "1.87e9"
blacklist = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed"
}
'TAIR10' {
fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt"
mito_name = "Mt"
}
'EB2' {
fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt"
}
'UMD3.1' {
fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt"
mito_name = "MT"
}
'WBcel235' {
fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed"
mito_name = "MtDNA"
macs_gsize = "9e7"
}
'CanFam3.1' {
fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt"
mito_name = "MT"
}
'GRCz10' {
fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed"
mito_name = "MT"
}
'BDGP6' {
fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed"
mito_name = "M"
macs_gsize = "1.2e8"
}
'EquCab2' {
fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt"
mito_name = "MT"
}
'EB1' {
fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt"
}
'Galgal4' {
fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed"
mito_name = "MT"
}
'Gm01' {
fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt"
}
'Mmul_1' {
fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt"
mito_name = "MT"
}
'IRGSP-1.0' {
fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed"
mito_name = "Mt"
}
'CHIMP2.1.4' {
fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt"
mito_name = "MT"
}
'Rnor_5.0' {
fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.bed"
mito_name = "MT"
}
'Rnor_6.0' {
fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed"
mito_name = "MT"
}
'R64-1-1' {
fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed"
mito_name = "MT"
macs_gsize = "1.2e7"
}
'EF2' {
fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt"
mito_name = "MT"
macs_gsize = "1.21e7"
}
'Sbi1' {
fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt"
}
'Sscrofa10.2' {
fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt"
mito_name = "MT"
}
'AGPv3' {
fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed"
mito_name = "Mt"
}
'hg38' {
fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed"
mito_name = "chrM"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed"
}
'hg19' {
fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt"
mito_name = "chrM"
macs_gsize = "2.7e9"
blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed"
}
'mm10' {
fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt"
mito_name = "chrM"
macs_gsize = "1.87e9"
blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed"
}
'bosTau8' {
fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed"
mito_name = "chrM"
}
'ce10' {
fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt"
mito_name = "chrM"
macs_gsize = "9e7"
}
'canFam3' {
fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt"
mito_name = "chrM"
}
'danRer10' {
fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed"
mito_name = "chrM"
macs_gsize = "1.37e9"
}
'dm6' {
fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed"
mito_name = "chrM"
macs_gsize = "1.2e8"
}
'equCab2' {
fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt"
mito_name = "chrM"
}
'galGal4' {
fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt"
mito_name = "chrM"
}
'panTro4' {
fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt"
mito_name = "chrM"
}
'rn6' {
fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed"
mito_name = "chrM"
}
'sacCer3' {
fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/"
readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt"
mito_name = "chrM"
macs_gsize = "1.2e7"
}
'susScr3' {
fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa"
bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa"
bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/"
star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/"
bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/"
gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf"
bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed"
readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt"
mito_name = "chrM"
}
}
}

@ -0,0 +1,41 @@
/*
========================================================================================
Config file for defining DSL2 per module options and publishing paths
========================================================================================
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
----------------------------------------------------------------------------------------
*/
process {
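    // Default publishing: a lowercase results subdirectory named after the first token of
    // the process name (e.g. FASTQC -> "fastqc"); versions.yml files are not published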
publishDir = [
path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" },
mode: 'copy',
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
withName: SAMPLESHEET_CHECK {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
mode: 'copy',
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
withName: FASTQC {
ext.args = '--quiet'
}
withName: CUSTOM_DUMPSOFTWAREVERSIONS {
publishDir = [
path: { "${params.outdir}/pipeline_info" },
mode: 'copy',
pattern: '*_versions.yml'
]
}
}

@ -0,0 +1,29 @@
/*
========================================================================================
Nextflow config file for running minimal tests
========================================================================================
Defines input files and everything required to run a fast and simple pipeline test.
Use as follows:
nextflow run nf-core/taxprofiler -profile test,<docker/singularity>
----------------------------------------------------------------------------------------
*/
params {
config_profile_name = 'Test profile'
config_profile_description = 'Minimal test dataset to check pipeline function'
// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = '6.GB'
max_time = '6.h'
// Input data
// TODO nf-core: Specify the paths to your test data on nf-core/test-datasets
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv'
// Genome references
genome = 'R64-1-1'
}

@ -0,0 +1,24 @@
/*
========================================================================================
Nextflow config file for running full-size tests
========================================================================================
Defines input files and everything required to run a full size pipeline test.
Use as follows:
nextflow run nf-core/taxprofiler -profile test_full,<docker/singularity>
----------------------------------------------------------------------------------------
*/
params {
config_profile_name = 'Full test profile'
config_profile_description = 'Full test dataset to check pipeline function'
// Input data for full size test
// TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
// TODO nf-core: Give any required params for the test so that command line flags are not needed
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'
// Genome references
genome = 'R64-1-1'
}

@ -0,0 +1,10 @@
# nf-core/taxprofiler: Documentation
The nf-core/taxprofiler documentation is split into the following pages:
* [Usage](usage.md)
* An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
* [Output](output.md)
* An overview of the different results produced by the pipeline and how to interpret them.
You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re)

Binary image files not shown (five images added: 23 KiB, 33 KiB, 54 KiB, 76 KiB and 76 KiB).

@ -0,0 +1,68 @@
# nf-core/taxprofiler: Output
## Introduction
This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
<!-- TODO nf-core: Write this documentation describing your workflow's output -->
## Pipeline overview
The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
* [FastQC](#fastqc) - Raw read QC
* [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
### FastQC
<details markdown="1">
<summary>Output files</summary>
* `fastqc/`
* `*_fastqc.html`: FastQC report containing quality metrics.
* `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images.
</details>
[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
> **NB:** The FastQC plots displayed in the MultiQC report show _untrimmed_ reads. They may contain adapter sequence and potentially regions with low quality.
### MultiQC
<details markdown="1">
<summary>Output files</summary>
* `multiqc/`
* `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
* `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
* `multiqc_plots/`: directory containing static images from the report in various formats.
</details>
[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>.
### Pipeline information
<details markdown="1">
<summary>Output files</summary>
* `pipeline_info/`
* Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
* Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
* Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
</details>
[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.

@ -0,0 +1,248 @@
# nf-core/taxprofiler: Usage
## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/taxprofiler/usage](https://nf-co.re/taxprofiler/usage)
> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._
## Introduction
<!-- TODO nf-core: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website. -->
## Samplesheet input
You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below.
```console
--input '[path to samplesheet file]'
```
### Multiple runs of the same sample
The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
```console
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
```
### Full samplesheet
The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire; however, there is a strict requirement for the first 3 columns to match those defined in the table below.
A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
```console
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
```
| Column | Description |
|----------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
## Running the pipeline
The typical command for running the pipeline is as follows:
```console
nextflow run nf-core/taxprofiler --input samplesheet.csv --genome GRCh37 -profile docker
```
This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
Note that the pipeline will create the following files in your working directory:
```console
work # Directory containing the nextflow working files
results # Finished results (configurable, see below)
.nextflow.log # Log file from Nextflow
# Other Nextflow hidden files, e.g. history of pipeline runs and old logs.
```
### Updating the pipeline
When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
```console
nextflow pull nf-core/taxprofiler
```
### Reproducibility
It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since.
First, go to the [nf-core/taxprofiler releases page](https://github.com/nf-core/taxprofiler/releases) and find the latest version number - numeric only (e.g. `1.3.1`). Then specify this when running the pipeline with `-r` (one hyphen) - e.g. `-r 1.3.1`.
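For example, using the placeholder version number from above:

```console
nextflow run nf-core/taxprofiler -r 1.3.1 --input samplesheet.csv -profile docker
```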
This version number will be logged in reports when you run the pipeline, so that you'll know what you used when you look back in the future.
## Core Nextflow arguments
> **NB:** These options are part of Nextflow and use a _single_ hyphen (pipeline parameters use a double-hyphen).
### `-profile`
Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments.
Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. When using Biocontainers, most of these software packaging methods pull Docker containers from quay.io e.g. [FastQC](https://quay.io/repository/biocontainers/fastqc), except for Singularity, which directly downloads Singularity images over HTTPS from the [Galaxy project](https://depot.galaxyproject.org/singularity/), and Conda, which downloads and installs software locally from [Bioconda](https://bioconda.github.io/).
> We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility; however, when this is not possible, Conda is also supported.
The pipeline also dynamically loads configurations from [https://github.com/nf-core/configs](https://github.com/nf-core/configs) when it runs, making multiple config profiles for various institutional clusters available at run time. For more information and to see if your system is available in these configs please see the [nf-core/configs documentation](https://github.com/nf-core/configs#documentation).
Note that multiple profiles can be loaded, for example: `-profile test,docker` - the order of arguments is important!
They are loaded in sequence, so later profiles can overwrite earlier profiles.
If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended.
* `docker`
* A generic configuration profile to be used with [Docker](https://docker.com/)
* `singularity`
* A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/)
* `podman`
* A generic configuration profile to be used with [Podman](https://podman.io/)
* `shifter`
* A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/)
* `charliecloud`
* A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/)
* `conda`
* A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud.
* `test`
* A profile with a complete configuration for automated testing
* Includes links to test data so needs no other parameters
### `-resume`
Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously.
You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names.
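For example, to re-launch the typical command from above and continue from cached results:

```console
nextflow run nf-core/taxprofiler --input samplesheet.csv --genome GRCh37 -profile docker -resume
```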
### `-c`
Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information.
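For example (the config file path here is illustrative):

```console
nextflow run nf-core/taxprofiler --input samplesheet.csv --genome GRCh37 -profile docker -c /path/to/custom.config
```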
## Custom configuration
### Resource requests
Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped.
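For reference, this scaling is implemented in this pipeline's `conf/base.config` (shown earlier in this commit) by multiplying the base request by `task.attempt`, e.g. for the `process_medium` label:

```nextflow
process {
    withLabel:process_medium {
        cpus   = { check_max( 6     * task.attempt, 'cpus'   ) }
        memory = { check_max( 36.GB * task.attempt, 'memory' ) }
        time   = { check_max( 8.h   * task.attempt, 'time'   ) }
    }
}
```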
For example, if the nf-core/rnaseq pipeline is failing after multiple re-submissions of the `STAR_ALIGN` process due to an exit code of `137` this would indicate that there is an out of memory issue:
```console
[62/149eb0] NOTE: Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1)
Error executing process > 'RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)'
Caused by:
Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137)
Command executed:
STAR \
--genomeDir star \
--readFilesIn WT_REP1_trimmed.fq.gz \
--runThreadN 2 \
--outFileNamePrefix WT_REP1. \
<TRUNCATED>
Command exit status:
137
Command output:
(empty)
Command error:
.command.sh: line 9: 30 Killed STAR --genomeDir star --readFilesIn WT_REP1_trimmed.fq.gz --runThreadN 2 --outFileNamePrefix WT_REP1. <TRUNCATED>
Work dir:
/home/pipelinetest/work/9d/172ca5881234073e8d76f2a19c88fb
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
```
To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq GitHub repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN). We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory, so based on the search results the file we want is `modules/nf-core/software/star/align/main.nf`. If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure a subset of processes having similar computing requirements. The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37), which in this case defines a memory limit of 72GB. Providing you haven't set any other standard nf-core parameters to __cap__ the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline, we can try to bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections.
```nextflow
process {
withName: STAR_ALIGN {
memory = 100.GB
}
}
```
> **NB:** We specify just the process name i.e. `STAR_ALIGN` in the config file and not the full task name string that is printed to screen in the error message or on the terminal whilst the pipeline is running i.e. `RNASEQ:ALIGN_STAR:STAR_ALIGN`. You may get a warning suggesting that the process selector isn't recognised but you can ignore that if the process name has been specified correctly. This is something that needs to be fixed upstream in core Nextflow.
### Updating containers
The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently, it doesn't make sense to re-release nf-core/viralrecon every time a new version of Pangolin has been released. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`.
1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19)
2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags)
3. Create the custom config accordingly:
* For Docker:
```nextflow
process {
withName: PANGOLIN {
container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0'
}
}
```
* For Singularity:
```nextflow
process {
withName: PANGOLIN {
container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0'
}
}
```
* For Conda:
```nextflow
process {
withName: PANGOLIN {
conda = 'bioconda::pangolin=3.0.5'
}
}
```
> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must keep the `work/` directory, otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch.
### nf-core/configs
In most cases, you will only need to create a custom config as a one-off, but if you and others within your organisation are likely to be running nf-core pipelines regularly with the same settings, it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter. You can then create a pull request to the `nf-core/configs` repository that adds your config file and an associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amends [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile.
See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files.
If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs).
## Running in the background
Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished.
The Nextflow `-bg` flag launches Nextflow in the background, detached from your terminal so that the workflow does not stop if you log out of your session. The logs are saved to a file.
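For example:

```console
nextflow run nf-core/taxprofiler --input samplesheet.csv --genome GRCh37 -profile docker -bg
```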
Alternatively, you can use `screen` / `tmux` or a similar tool to create a detached session which you can log back into at a later time.
Some HPC setups also allow you to run Nextflow within a cluster job submitted to your job scheduler (from where it submits more jobs).
## Nextflow memory requirements
In some cases, the Nextflow Java virtual machines can start to request a large amount of memory.
We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`):
```console
export NXF_OPTS='-Xms1g -Xmx4g'
```

@ -0,0 +1,529 @@
//
// This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template.
//
import org.everit.json.schema.Schema
import org.everit.json.schema.loader.SchemaLoader
import org.everit.json.schema.ValidationException
import org.json.JSONObject
import org.json.JSONTokener
import org.json.JSONArray
import groovy.json.JsonSlurper
import groovy.json.JsonBuilder
class NfcoreSchema {
//
// Resolve Schema path relative to main workflow directory
//
public static String getSchemaPath(workflow, schema_filename='nextflow_schema.json') {
return "${workflow.projectDir}/${schema_filename}"
}
//
// Function to loop over all parameters defined in schema and check
// whether the given parameters adhere to the specifications
//
/* groovylint-disable-next-line UnusedPrivateMethodParameter */
public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') {
def has_error = false
//=====================================================================//
// Check for nextflow core params and unexpected params
def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text
def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions')
def nf_params = [
// Options for base `nextflow` command
'bg',
'c',
'C',
'config',
'd',
'D',
'dockerize',
'h',
'log',
'q',
'quiet',
'syslog',
'v',
'version',
// Options for `nextflow run` command
'ansi',
'ansi-log',
'bg',
'bucket-dir',
'c',
'cache',
'config',
'dsl2',
'dump-channels',
'dump-hashes',
'E',
'entry',
'latest',
'lib',
'main-script',
'N',
'name',
'offline',
'params-file',
'pi',
'plugins',
'poll-interval',
'pool-size',
'profile',
'ps',
'qs',
'queue-size',
'r',
'resume',
'revision',
'stdin',
'stub',
'stub-run',
'test',
'w',
'with-charliecloud',
'with-conda',
'with-dag',
'with-docker',
'with-mpi',
'with-notification',
'with-podman',
'with-report',
'with-singularity',
'with-timeline',
'with-tower',
'with-trace',
'with-weblog',
'without-docker',
'without-podman',
'work-dir'
]
def unexpectedParams = []
// Collect expected parameters from the schema
def expectedParams = []
def enums = [:]
for (group in schemaParams) {
for (p in group.value['properties']) {
expectedParams.push(p.key)
if (group.value['properties'][p.key].containsKey('enum')) {
enums[p.key] = group.value['properties'][p.key]['enum']
}
}
}
for (specifiedParam in params.keySet()) {
// nextflow params
if (nf_params.contains(specifiedParam)) {
log.error "ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. Please resubmit with '-${specifiedParam}'"
has_error = true
}
// unexpected params
def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params'
def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() }
def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase()
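            // A potential "camelCase bug" param matches a schema param once hyphens and case are ignored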
def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase))
if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) {
// Temporarily remove camelCase/camel-case params #1035
def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()}
if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){
unexpectedParams.push(specifiedParam)
}
}
}
//=====================================================================//
// Validate parameters against the schema
InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream()
JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream))
// Remove anything that's in params.schema_ignore_params
raw_schema = removeIgnoredParams(raw_schema, params)
Schema schema = SchemaLoader.load(raw_schema)
// Clean the parameters
def cleanedParams = cleanParameters(params)
// Convert to JSONObject
def jsonParams = new JsonBuilder(cleanedParams)
JSONObject params_json = new JSONObject(jsonParams.toString())
// Validate
try {
schema.validate(params_json)
} catch (ValidationException e) {
println ''
log.error 'ERROR: Validation of pipeline parameters failed!'
JSONObject exceptionJSON = e.toJSON()
printExceptions(exceptionJSON, params_json, log, enums)
println ''
has_error = true
}
// Check for unexpected parameters
if (unexpectedParams.size() > 0) {
Map colors = NfcoreTemplate.logColours(params.monochrome_logs)
println ''
def warn_msg = 'Found unexpected parameters:'
for (unexpectedParam in unexpectedParams) {
warn_msg = warn_msg + "\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}"
}
log.warn warn_msg
log.info "- ${colors.dim}Ignore this warning: params.schema_ignore_params = \"${unexpectedParams.join(',')}\" ${colors.reset}"
println ''
}
if (has_error) {
System.exit(1)
}
}
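//
// Example (illustrative sketch, not part of the template): a typical call from
// a pipeline's initialisation code, assuming a `nextflow_schema.json` at the
// project root:
//
//     NfcoreSchema.validateParameters(workflow, params, log)
//
// A core Nextflow option passed with two hyphens (e.g. `--resume` instead of
// `-resume`) is caught by the nf_params check above and reported as an error.
//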
//
// Beautify parameters for --help
//
public static String paramsHelp(workflow, params, command, schema_filename='nextflow_schema.json') {
Map colors = NfcoreTemplate.logColours(params.monochrome_logs)
Integer num_hidden = 0
String output = ''
output += 'Typical pipeline command:\n\n'
output += " ${colors.cyan}${command}${colors.reset}\n\n"
Map params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename))
Integer max_chars = paramsMaxChars(params_map) + 1
Integer desc_indent = max_chars + 14
Integer dec_linewidth = 160 - desc_indent
for (group in params_map.keySet()) {
Integer num_params = 0
String group_output = colors.underlined + colors.bold + group + colors.reset + '\n'
def group_params = params_map.get(group) // This gets the parameters of that particular group
for (param in group_params.keySet()) {
if (group_params.get(param).hidden && !params.show_hidden_params) {
num_hidden += 1
continue;
}
def type = '[' + group_params.get(param).type + ']'
def description = group_params.get(param).description
def defaultValue = group_params.get(param).default != null ? " [default: " + group_params.get(param).default.toString() + "]" : ''
def description_default = description + colors.dim + defaultValue + colors.reset
// Wrap long description texts
// Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap
if (description_default.length() > dec_linewidth){
List olines = []
String oline = "" // " " * indent
description_default.split(" ").each() { wrd ->
if ((oline.size() + wrd.size()) <= dec_linewidth) {
oline += wrd + " "
} else {
olines += oline
oline = wrd + " "
}
}
olines += oline
description_default = olines.join("\n" + " " * desc_indent)
}
group_output += " --" + param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\n'
num_params += 1
}
group_output += '\n'
if (num_params > 0){
output += group_output
}
}
if (num_hidden > 0){
output += colors.dim + "!! Hiding $num_hidden params, use --show_hidden_params to show them !!\n" + colors.reset
}
output += NfcoreTemplate.dashedLine(params.monochrome_logs)
return output
}
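//
// Example (illustrative sketch): rendering the help text, assuming the pipeline
// defines a typical run command string:
//
//     def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv -profile docker"
//     log.info NfcoreSchema.paramsHelp(workflow, params, command)
//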
//
// Groovy Map summarising parameters/workflow options used by the pipeline
//
public static LinkedHashMap paramsSummaryMap(workflow, params, schema_filename='nextflow_schema.json') {
// Get a selection of core Nextflow workflow options
def Map workflow_summary = [:]
if (workflow.revision) {
workflow_summary['revision'] = workflow.revision
}
workflow_summary['runName'] = workflow.runName
if (workflow.containerEngine) {
workflow_summary['containerEngine'] = workflow.containerEngine
}
if (workflow.container) {
workflow_summary['container'] = workflow.container
}
workflow_summary['launchDir'] = workflow.launchDir
workflow_summary['workDir'] = workflow.workDir
workflow_summary['projectDir'] = workflow.projectDir
workflow_summary['userName'] = workflow.userName
workflow_summary['profile'] = workflow.profile
workflow_summary['configFiles'] = workflow.configFiles.join(', ')
// Get pipeline parameters defined in JSON Schema
def Map params_summary = [:]
def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename))
for (group in params_map.keySet()) {
def sub_params = new LinkedHashMap()
def group_params = params_map.get(group) // This gets the parameters of that particular group
for (param in group_params.keySet()) {
if (params.containsKey(param)) {
def params_value = params.get(param)
def schema_value = group_params.get(param).default
def param_type = group_params.get(param).type
if (schema_value != null) {
if (param_type == 'string') {
if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) {
def sub_string = schema_value.replace('\$projectDir', '')
sub_string = sub_string.replace('\${projectDir}', '')
if (params_value.contains(sub_string)) {
schema_value = params_value
}
}
if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) {
def sub_string = schema_value.replace('\$params.outdir', '')
sub_string = sub_string.replace('\${params.outdir}', '')
if ("${params.outdir}${sub_string}" == params_value) {
schema_value = params_value
}
}
}
}
// We have a default in the schema, and this isn't it
if (schema_value != null && params_value != schema_value) {
sub_params.put(param, params_value)
}
// No default in the schema, and this isn't empty
else if (schema_value == null && params_value != "" && params_value != null && params_value != false) {
sub_params.put(param, params_value)
}
}
}
params_summary.put(group, sub_params)
}
return [ 'Core Nextflow options' : workflow_summary ] << params_summary
}
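//
// Example (illustrative): the returned map is keyed by schema group title, with
// 'Core Nextflow options' prepended, e.g. (values assumed):
//
//     [ 'Core Nextflow options' : [ runName: 'small_euler', profile: 'docker', ... ],
//       'Input/output options'  : [ input: 'samplesheet.csv' ] ]
//
// Only parameters that differ from their schema defaults appear in each group.
//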
//
// Beautify parameters for summary and return as string
//
public static String paramsSummaryLog(workflow, params) {
Map colors = NfcoreTemplate.logColours(params.monochrome_logs)
String output = ''
def params_map = paramsSummaryMap(workflow, params)
def max_chars = paramsMaxChars(params_map)
for (group in params_map.keySet()) {
def group_params = params_map.get(group) // This gets the parameters of that particular group
if (group_params) {
output += colors.bold + group + colors.reset + '\n'
for (param in group_params.keySet()) {
output += " " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n'
}
output += '\n'
}
}
output += "!! Only displaying parameters that differ from the pipeline defaults !!\n"
output += NfcoreTemplate.dashedLine(params.monochrome_logs)
return output
}
//
// Loop over nested exceptions and print the causingException
//
private static void printExceptions(ex_json, params_json, log, enums, limit=5) {
def causingExceptions = ex_json['causingExceptions']
if (causingExceptions.length() == 0) {
def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/
// Missing required param
if (m.matches()) {
log.error "* Missing required parameter: --${m[0][1]}"
}
// Other base-level error
else if (ex_json['pointerToViolation'] == '#') {
log.error "* ${ex_json['message']}"
}
// Error with specific param
else {
def param = ex_json['pointerToViolation'] - ~/^#\//
def param_val = params_json[param].toString()
if (enums.containsKey(param)) {
def error_msg = "* --${param}: '${param_val}' is not a valid choice (Available choices"
if (enums[param].size() > limit) {
log.error "${error_msg} (${limit} of ${enums[param].size()}): ${enums[param][0..limit-1].join(', ')}, ... )"
} else {
log.error "${error_msg}: ${enums[param].join(', ')})"
}
} else {
log.error "* --${param}: ${ex_json['message']} (${param_val})"
}
}
}
for (ex in causingExceptions) {
printExceptions(ex, params_json, log, enums)
}
}
//
// Remove an element from a JSONArray
//
private static JSONArray removeElement(json_array, element) {
def list = []
int len = json_array.length()
for (int i=0;i<len;i++){
list.add(json_array.get(i).toString())
}
list.remove(element)
JSONArray jsArray = new JSONArray(list)
return jsArray
}
//
// Remove ignored parameters
//
private static JSONObject removeIgnoredParams(raw_schema, params) {
// Remove anything that's in params.schema_ignore_params
params.schema_ignore_params.split(',').each{ ignore_param ->
if(raw_schema.keySet().contains('definitions')){
raw_schema.definitions.each { definition ->
for (key in definition.keySet()){
if (definition[key].get("properties").keySet().contains(ignore_param)){
// Remove the param to ignore
definition[key].get("properties").remove(ignore_param)
// If the param was required, change this
if (definition[key].has("required")) {
def cleaned_required = removeElement(definition[key].required, ignore_param)
definition[key].put("required", cleaned_required)
}
}
}
}
}
if(raw_schema.keySet().contains('properties') && raw_schema.get('properties').keySet().contains(ignore_param)) {
raw_schema.get("properties").remove(ignore_param)
}
if(raw_schema.keySet().contains('required') && raw_schema.required.contains(ignore_param)) {
def cleaned_required = removeElement(raw_schema.required, ignore_param)
raw_schema.put("required", cleaned_required)
}
}
return raw_schema
}
//
// Clean and check parameters relative to Nextflow native classes
//
private static Map cleanParameters(params) {
def new_params = params.getClass().newInstance(params)
for (p in params) {
// remove anything evaluating to false
if (!p['value']) {
new_params.remove(p.key)
}
// Cast MemoryUnit to String
if (p['value'].getClass() == nextflow.util.MemoryUnit) {
new_params.replace(p.key, p['value'].toString())
}
// Cast Duration to String
if (p['value'].getClass() == nextflow.util.Duration) {
new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day"))
}
// Cast LinkedHashMap to String
if (p['value'].getClass() == LinkedHashMap) {
new_params.replace(p.key, p['value'].toString())
}
}
return new_params
}
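//
// Example (illustrative): Nextflow native types are stringified so the params
// can later be serialised to JSON for validation, e.g. (values assumed):
//
//     [ max_memory: 128.GB, max_time: 2d ]  ->  [ max_memory: '128 GB', max_time: '2day' ]
//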
//
// This function tries to read the pipeline's JSON schema file
//
private static LinkedHashMap paramsLoad(String json_schema) {
def params_map = new LinkedHashMap()
try {
params_map = paramsRead(json_schema)
} catch (Exception e) {
println "Could not read parameter settings from JSON. $e"
params_map = new LinkedHashMap()
}
return params_map
}
//
// Method to actually read in the JSON schema file using Groovy.
// Returns a map of Group (as Key) to all parameters in that group:
//  - Parameter1 as Key, property map as Value
//  - Parameter2 as Key, property map as Value
//  ...
private static LinkedHashMap paramsRead(String json_schema) throws Exception {
def json = new File(json_schema).text
def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions')
def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties')
/* Tree looks like this in nf-core schema
* definitions <- this is what the first get('definitions') gets us
group 1
title
description
properties
parameter 1
type
description
parameter 2
type
description
group 2
title
description
properties
parameter 1
type
description
* properties <- parameters can also be ungrouped, outside of definitions
parameter 1
type
description
*/
// Grouped params
def params_map = new LinkedHashMap()
schema_definitions.each { key, val ->
def Map group = schema_definitions."$key".properties // Gets the property object of the group
def title = schema_definitions."$key".title
def sub_params = new LinkedHashMap()
group.each { innerkey, value ->
sub_params.put(innerkey, value)
}
params_map.put(title, sub_params)
}
// Ungrouped params
def ungrouped_params = new LinkedHashMap()
schema_properties.each { innerkey, value ->
ungrouped_params.put(innerkey, value)
}
params_map.put("Other parameters", ungrouped_params)
return params_map
}
//
// Get maximum number of characters across all parameter names
//
private static Integer paramsMaxChars(params_map) {
Integer max_chars = 0
for (group in params_map.keySet()) {
def group_params = params_map.get(group) // This gets the parameters of that particular group
for (param in group_params.keySet()) {
if (param.size() > max_chars) {
max_chars = param.size()
}
}
}
return max_chars
}
}

@ -0,0 +1,258 @@
//
// This file holds several functions used within the nf-core pipeline template.
//
import org.yaml.snakeyaml.Yaml
class NfcoreTemplate {
//
// Check AWS Batch related parameters have been specified correctly
//
public static void awsBatch(workflow, params) {
if (workflow.profile.contains('awsbatch')) {
// Check params.awsqueue and params.awsregion have been set if running on AWSBatch
assert (params.awsqueue && params.awsregion) : "Specify correct --awsqueue and --awsregion parameters on AWSBatch!"
// Check outdir paths to be S3 buckets if running on AWSBatch
assert params.outdir.startsWith('s3:') : "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!"
}
}
//
// Warn if a -profile or Nextflow config has not been provided to run the pipeline
//
public static void checkConfigProvided(workflow, log) {
if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) {
log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" +
"This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" +
" (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" +
" (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" +
" (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" +
"Please refer to the quick start section and usage docs for the pipeline.\n "
}
}
//
// Construct and send completion email
//
public static void email(workflow, params, summary_params, projectDir, log, multiqc_report=[]) {
// Set up the e-mail variables
def subject = "[$workflow.manifest.name] Successful: $workflow.runName"
if (!workflow.success) {
subject = "[$workflow.manifest.name] FAILED: $workflow.runName"
}
def summary = [:]
for (group in summary_params.keySet()) {
summary << summary_params[group]
}
def misc_fields = [:]
misc_fields['Date Started'] = workflow.start
misc_fields['Date Completed'] = workflow.complete
misc_fields['Pipeline script file path'] = workflow.scriptFile
misc_fields['Pipeline script hash ID'] = workflow.scriptId
if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository
if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId
if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision
misc_fields['Nextflow Version'] = workflow.nextflow.version
misc_fields['Nextflow Build'] = workflow.nextflow.build
misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp
def email_fields = [:]
email_fields['version'] = workflow.manifest.version
email_fields['runName'] = workflow.runName
email_fields['success'] = workflow.success
email_fields['dateComplete'] = workflow.complete
email_fields['duration'] = workflow.duration
email_fields['exitStatus'] = workflow.exitStatus
email_fields['errorMessage'] = (workflow.errorMessage ?: 'None')
email_fields['errorReport'] = (workflow.errorReport ?: 'None')
email_fields['commandLine'] = workflow.commandLine
email_fields['projectDir'] = workflow.projectDir
email_fields['summary'] = summary << misc_fields
// On success try to attach the multiqc report
def mqc_report = null
try {
if (workflow.success) {
mqc_report = multiqc_report.getVal()
if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) {
if (mqc_report.size() > 1) {
log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one"
}
mqc_report = mqc_report[0]
}
}
} catch (all) {
if (multiqc_report) {
log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email"
}
}
// Check if we are only sending emails on failure
def email_address = params.email
if (!params.email && params.email_on_fail && !workflow.success) {
email_address = params.email_on_fail
}
// Render the TXT template
def engine = new groovy.text.GStringTemplateEngine()
def tf = new File("$projectDir/assets/email_template.txt")
def txt_template = engine.createTemplate(tf).make(email_fields)
def email_txt = txt_template.toString()
// Render the HTML template
def hf = new File("$projectDir/assets/email_template.html")
def html_template = engine.createTemplate(hf).make(email_fields)
def email_html = html_template.toString()
// Render the sendmail template
def max_multiqc_email_size = params.max_multiqc_email_size as nextflow.util.MemoryUnit
def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ]
def sf = new File("$projectDir/assets/sendmail_template.txt")
def sendmail_template = engine.createTemplate(sf).make(smail_fields)
def sendmail_html = sendmail_template.toString()
// Send the HTML e-mail
Map colors = logColours(params.monochrome_logs)
if (email_address) {
try {
if (params.plaintext_email) { throw new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') }
// Try to send HTML e-mail using sendmail
[ 'sendmail', '-t' ].execute() << sendmail_html
log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-"
} catch (all) {
// Catch failures and try with plaintext
def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ]
if ( mqc_report != null && mqc_report.size() <= max_multiqc_email_size.toBytes() ) {
mail_cmd += [ '-A', mqc_report ]
}
mail_cmd.execute() << email_html
log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-"
}
}
// Write summary e-mail HTML to a file
def output_d = new File("${params.outdir}/pipeline_info/")
if (!output_d.exists()) {
output_d.mkdirs()
}
def output_hf = new File(output_d, "pipeline_report.html")
output_hf.withWriter { w -> w << email_html }
def output_tf = new File(output_d, "pipeline_report.txt")
output_tf.withWriter { w -> w << email_txt }
}
//
// Print pipeline summary on completion
//
public static void summary(workflow, params, log) {
Map colors = logColours(params.monochrome_logs)
if (workflow.success) {
if (workflow.stats.ignoredCount == 0) {
log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-"
} else {
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-"
}
} else {
log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-"
}
}
//
// ANSI Colours used for terminal logging
//
public static Map logColours(Boolean monochrome_logs) {
Map colorcodes = [:]
// Reset / Meta
colorcodes['reset'] = monochrome_logs ? '' : "\033[0m"
colorcodes['bold'] = monochrome_logs ? '' : "\033[1m"
colorcodes['dim'] = monochrome_logs ? '' : "\033[2m"
colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m"
colorcodes['blink'] = monochrome_logs ? '' : "\033[5m"
colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m"
colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m"
// Regular Colors
colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m"
colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m"
colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m"
colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m"
colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m"
colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m"
colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m"
colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m"
// Bold
colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m"
colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m"
colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m"
colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m"
colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m"
colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m"
colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m"
colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m"
// Underline
colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m"
colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m"
colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m"
colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m"
colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m"
colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m"
colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m"
colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m"
// High Intensity
colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m"
colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m"
colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m"
colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m"
colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m"
colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m"
colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m"
colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m"
// Bold High Intensity
colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m"
colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m"
colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m"
colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m"
colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m"
colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m"
colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m"
colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m"
return colorcodes
}
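//
// Example (illustrative): colourising a log message, falling back to plain text
// when `--monochrome_logs` is set:
//
//     Map colors = NfcoreTemplate.logColours(params.monochrome_logs)
//     log.info "-${colors.purple}[my-pipeline]${colors.green} Done${colors.reset}-"
//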
//
// Does what it says on the tin
//
public static String dashedLine(monochrome_logs) {
Map colors = logColours(monochrome_logs)
return "-${colors.dim}----------------------------------------------------${colors.reset}-"
}
//
// nf-core logo
//
public static String logo(workflow, monochrome_logs) {
Map colors = logColours(monochrome_logs)
String.format(
"""\n
${dashedLine(monochrome_logs)}
${colors.green},--.${colors.black}/${colors.green},-.${colors.reset}
${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset}
${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset}
${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset}
${colors.green}`._,._,\'${colors.reset}
${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset}
${dashedLine(monochrome_logs)}
""".stripIndent()
)
}
}

@ -0,0 +1,40 @@
//
// This file holds several Groovy functions that could be useful for any Nextflow pipeline
//
import org.yaml.snakeyaml.Yaml
class Utils {
//
// When running with -profile conda, warn if channels have not been set up appropriately
//
public static void checkCondaChannels(log) {
Yaml parser = new Yaml()
def channels = []
try {
def config = parser.load("conda config --show channels".execute().text)
channels = config.channels
} catch(NullPointerException | IOException e) {
log.warn "Could not verify conda channel configuration."
return
}
// Check that all channels are present
def required_channels = ['conda-forge', 'bioconda', 'defaults']
def conda_check_failed = !required_channels.every { ch -> ch in channels }
// Check that they are in the right order
conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda'))
conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults'))
if (conda_check_failed) {
log.warn "=============================================================================\n" +
" There is a problem with your Conda configuration!\n\n" +
" You will need to set up the conda-forge and bioconda channels correctly.\n" +
" Please refer to https://bioconda.github.io/user/install.html#set-up-channels\n" +
" NB: The order of the channels matters!\n" +
"==================================================================================="
}
}
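//
// Example (illustrative): the channel order the check above expects, as printed
// by `conda config --show channels`:
//
//     channels:
//       - conda-forge
//       - bioconda
//       - defaults
//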
}

@ -0,0 +1,94 @@
//
// This file holds several functions specific to the main.nf workflow in the nf-core/taxprofiler pipeline
//
class WorkflowMain {
//
// Citation string for pipeline
//
public static String citation(workflow) {
return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" +
// TODO nf-core: Add Zenodo DOI for pipeline after first release
//"* The pipeline\n" +
//" https://doi.org/10.5281/zenodo.XXXXXXX\n\n" +
"* The nf-core framework\n" +
" https://doi.org/10.1038/s41587-020-0439-x\n\n" +
"* Software dependencies\n" +
" https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md"
}
//
// Print help to screen if required
//
public static String help(workflow, params, log) {
def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh37 -profile docker"
def help_string = ''
help_string += NfcoreTemplate.logo(workflow, params.monochrome_logs)
help_string += NfcoreSchema.paramsHelp(workflow, params, command)
help_string += '\n' + citation(workflow) + '\n'
help_string += NfcoreTemplate.dashedLine(params.monochrome_logs)
return help_string
}
//
// Print parameter summary log to screen
//
public static String paramsSummaryLog(workflow, params, log) {
def summary_log = ''
summary_log += NfcoreTemplate.logo(workflow, params.monochrome_logs)
summary_log += NfcoreSchema.paramsSummaryLog(workflow, params)
summary_log += '\n' + citation(workflow) + '\n'
summary_log += NfcoreTemplate.dashedLine(params.monochrome_logs)
return summary_log
}
//
// Validate parameters and print summary to screen
//
public static void initialise(workflow, params, log) {
// Print help to screen if required
if (params.help) {
log.info help(workflow, params, log)
System.exit(0)
}
// Validate workflow parameters via the JSON schema
if (params.validate_params) {
NfcoreSchema.validateParameters(workflow, params, log)
}
// Print parameter summary log to screen
log.info paramsSummaryLog(workflow, params, log)
// Check that a -profile or Nextflow config has been provided to run the pipeline
NfcoreTemplate.checkConfigProvided(workflow, log)
// Check that conda channels are set up correctly
if (params.enable_conda) {
Utils.checkCondaChannels(log)
}
// Check AWS batch settings
NfcoreTemplate.awsBatch(workflow, params)
// Check input has been provided
if (!params.input) {
log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.csv'"
System.exit(1)
}
}
//
// Get attribute from genome config file e.g. fasta
//
public static String getGenomeAttribute(params, attribute) {
def val = ''
if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) {
if (params.genomes[ params.genome ].containsKey(attribute)) {
val = params.genomes[ params.genome ][ attribute ]
}
}
return val
}
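//
// Example (illustrative): resolving the FASTA path for the genome selected with
// `--genome`, as done near the top of main.nf:
//
//     params.fasta = WorkflowMain.getGenomeAttribute(params, 'fasta')
//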
}

@ -0,0 +1,59 @@
//
// This file holds several functions specific to the workflow/taxprofiler.nf in the nf-core/taxprofiler pipeline
//
class WorkflowTaxprofiler {
//
// Check and validate parameters
//
public static void initialise(params, log) {
genomeExistsError(params, log)
if (!params.fasta) {
log.error "Genome fasta file not specified with e.g. '--fasta genome.fa' or via a detectable config file."
System.exit(1)
}
}
//
// Get workflow summary for MultiQC
//
public static String paramsSummaryMultiqc(workflow, summary) {
String summary_section = ''
for (group in summary.keySet()) {
def group_params = summary.get(group) // This gets the parameters of that particular group
if (group_params) {
summary_section += " <p style=\"font-size:110%\"><b>$group</b></p>\n"
summary_section += " <dl class=\"dl-horizontal\">\n"
for (param in group_params.keySet()) {
summary_section += " <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</span>'}</samp></dd>\n"
}
summary_section += " </dl>\n"
}
}
String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n"
yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n"
yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n"
yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n"
yaml_file_text += "plot_type: 'html'\n"
yaml_file_text += "data: |\n"
yaml_file_text += "${summary_section}"
return yaml_file_text
}
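//
// Example (illustrative): the returned string is MultiQC custom-content YAML,
// roughly of the form:
//
//     id: 'nf-core-taxprofiler-summary'
//     section_name: 'nf-core/taxprofiler Workflow Summary'
//     plot_type: 'html'
//     data: |
//         <p style="font-size:110%"><b>Core Nextflow options</b></p>
//         ...
//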
//
// Exit pipeline if incorrect --genome key provided
//
private static void genomeExistsError(params, log) {
if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {
log.error "=============================================================================\n" +
" Genome '${params.genome}' not found in any config files provided to the pipeline.\n" +
" Currently, the available genome keys are:\n" +
" ${params.genomes.keySet().join(", ")}\n" +
"==================================================================================="
System.exit(1)
}
}
}

Binary file not shown.

@ -0,0 +1,63 @@
#!/usr/bin/env nextflow
/*
========================================================================================
nf-core/taxprofiler
========================================================================================
Github : https://github.com/nf-core/taxprofiler
Website: https://nf-co.re/taxprofiler
Slack : https://nfcore.slack.com/channels/taxprofiler
----------------------------------------------------------------------------------------
*/
nextflow.enable.dsl = 2
/*
========================================================================================
GENOME PARAMETER VALUES
========================================================================================
*/
params.fasta = WorkflowMain.getGenomeAttribute(params, 'fasta')
/*
========================================================================================
VALIDATE & PRINT PARAMETER SUMMARY
========================================================================================
*/
WorkflowMain.initialise(workflow, params, log)
/*
========================================================================================
NAMED WORKFLOW FOR PIPELINE
========================================================================================
*/
include { TAXPROFILER } from './workflows/taxprofiler'
//
// WORKFLOW: Run main nf-core/taxprofiler analysis pipeline
//
workflow NFCORE_TAXPROFILER {
TAXPROFILER ()
}
/*
========================================================================================
RUN ALL WORKFLOWS
========================================================================================
*/
//
// WORKFLOW: Execute a single named workflow for the pipeline
// See: https://github.com/nf-core/rnaseq/issues/619
//
workflow {
NFCORE_TAXPROFILER ()
}
/*
========================================================================================
THE END
========================================================================================
*/

@ -0,0 +1,17 @@
{
"name": "nf-core/taxprofiler",
"homePage": "https://github.com/nf-core/taxprofiler",
"repos": {
"nf-core/modules": {
"custom/dumpsoftwareversions": {
"git_sha": "20d8250d9f39ddb05dfb437603aaf99b5c0b2b41"
},
"fastqc": {
"git_sha": "9d0cad583b9a71a6509b754fdf589cbfbed08961"
},
"multiqc": {
"git_sha": "20d8250d9f39ddb05dfb437603aaf99b5c0b2b41"
}
}
}
}

@ -0,0 +1,27 @@
process SAMPLESHEET_CHECK {
tag "$samplesheet"
conda (params.enable_conda ? "conda-forge::python=3.8.3" : null)
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/python:3.8.3' :
'quay.io/biocontainers/python:3.8.3' }"
input:
path samplesheet
output:
path '*.csv' , emit: csv
path "versions.yml", emit: versions
script: // This script is bundled with the pipeline, in nf-core/taxprofiler/bin/
"""
check_samplesheet.py \\
$samplesheet \\
samplesheet.valid.csv
cat <<-END_VERSIONS > versions.yml
"${task.process}":
python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
}

@ -0,0 +1,21 @@
process CUSTOM_DUMPSOFTWAREVERSIONS {
label 'process_low'
// Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container
conda (params.enable_conda ? "bioconda::multiqc=1.11" : null)
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0' :
'quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0' }"
input:
path versions
output:
path "software_versions.yml" , emit: yml
path "software_versions_mqc.yml", emit: mqc_yml
path "versions.yml" , emit: versions
script:
def args = task.ext.args ?: ''
template 'dumpsoftwareversions.py'
}

@ -0,0 +1,34 @@
name: custom_dumpsoftwareversions
description: Custom module used to dump software versions within the nf-core pipeline template
keywords:
- custom
- version
tools:
- custom:
description: Custom module used to dump software versions within the nf-core pipeline template
homepage: https://github.com/nf-core/tools
documentation: https://github.com/nf-core/tools
licence: ['MIT']
input:
- versions:
type: file
description: YML file containing software versions
pattern: "*.yml"
output:
- yml:
type: file
description: Standard YML file containing software versions
pattern: "software_versions.yml"
- mqc_yml:
type: file
description: MultiQC custom content YML file containing software versions
pattern: "software_versions_mqc.yml"
- versions:
type: file
description: File containing software versions
pattern: "versions.yml"
authors:
- "@drpatelh"
- "@grst"

@ -0,0 +1,89 @@
#!/usr/bin/env python
import yaml
import platform
from textwrap import dedent
def _make_versions_html(versions):
html = [
dedent(
"""\\
<style>
#nf-core-versions tbody:nth-child(even) {
background-color: #f2f2f2;
}
</style>
<table class="table" style="width:100%" id="nf-core-versions">
<thead>
<tr>
<th> Process Name </th>
<th> Software </th>
<th> Version </th>
</tr>
</thead>
"""
)
]
for process, tmp_versions in sorted(versions.items()):
html.append("<tbody>")
for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
html.append(
dedent(
f"""\\
<tr>
<td><samp>{process if (i == 0) else ''}</samp></td>
<td><samp>{tool}</samp></td>
<td><samp>{version}</samp></td>
</tr>
"""
)
)
html.append("</tbody>")
html.append("</table>")
return "\\n".join(html)
versions_this_module = {}
versions_this_module["${task.process}"] = {
"python": platform.python_version(),
"yaml": yaml.__version__,
}
with open("$versions") as f:
versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module
# aggregate versions by the module name (derived from fully-qualified process name)
versions_by_module = {}
for process, process_versions in versions_by_process.items():
module = process.split(":")[-1]
try:
assert versions_by_module[module] == process_versions, (
"We assume that software versions are the same between all modules. "
"If you see this error-message it means you discovered an edge-case "
"and should open an issue in nf-core/tools. "
)
except KeyError:
versions_by_module[module] = process_versions
versions_by_module["Workflow"] = {
"Nextflow": "$workflow.nextflow.version",
"$workflow.manifest.name": "$workflow.manifest.version",
}
versions_mqc = {
"id": "software_versions",
"section_name": "${workflow.manifest.name} Software Versions",
"section_href": "https://github.com/${workflow.manifest.name}",
"plot_type": "html",
"description": "are collected at run time from the software output.",
"data": _make_versions_html(versions_by_module),
}
with open("software_versions.yml", "w") as f:
yaml.dump(versions_by_module, f, default_flow_style=False)
with open("software_versions_mqc.yml", "w") as f:
yaml.dump(versions_mqc, f, default_flow_style=False)
with open("versions.yml", "w") as f:
yaml.dump(versions_this_module, f, default_flow_style=False)

@ -0,0 +1,44 @@
process FASTQC {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? "bioconda::fastqc=0.11.9" : null)
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' :
'quay.io/biocontainers/fastqc:0.11.9--0' }"
input:
tuple val(meta), path(reads)
output:
tuple val(meta), path("*.html"), emit: html
tuple val(meta), path("*.zip") , emit: zip
path "versions.yml" , emit: versions
script:
def args = task.ext.args ?: ''
// Add soft-links to original FastQs for consistent naming in pipeline
def prefix = task.ext.prefix ?: "${meta.id}"
if (meta.single_end) {
"""
[ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
fastqc $args --threads $task.cpus ${prefix}.fastq.gz
cat <<-END_VERSIONS > versions.yml
"${task.process}":
fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
} else {
"""
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc $args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
cat <<-END_VERSIONS > versions.yml
"${task.process}":
fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
}
}

@ -0,0 +1,52 @@
name: fastqc
description: Run FastQC on sequenced reads
keywords:
- quality control
- qc
- adapters
- fastq
tools:
- fastqc:
description: |
FastQC gives general quality metrics about your reads.
It provides information about the quality score distribution
across your reads, the per base sequence content (%A/C/G/T).
You get information about adapter contamination and other
overrepresented sequences.
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
licence: ['GPL-2.0-only']
input:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- reads:
type: file
description: |
List of input FastQ files of size 1 and 2 for single-end and paired-end data,
respectively.
output:
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- html:
type: file
description: FastQC report
pattern: "*_{fastqc.html}"
- zip:
type: file
description: FastQC report archive
pattern: "*_{fastqc.zip}"
- versions:
type: file
description: File containing software versions
pattern: "versions.yml"
authors:
- "@drpatelh"
- "@grst"
- "@ewels"
- "@FelixKrueger"

@ -0,0 +1,28 @@
process MULTIQC {
label 'process_medium'
conda (params.enable_conda ? 'bioconda::multiqc=1.11' : null)
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0' :
'quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0' }"
input:
path multiqc_files
output:
path "*multiqc_report.html", emit: report
path "*_data" , emit: data
path "*_plots" , optional:true, emit: plots
path "versions.yml" , emit: versions
script:
def args = task.ext.args ?: ''
"""
multiqc -f $args .
cat <<-END_VERSIONS > versions.yml
"${task.process}":
multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
}

@ -0,0 +1,40 @@
name: MultiQC
description: Aggregate results from bioinformatics analyses across many samples into a single report
keywords:
- QC
- bioinformatics tools
- Beautiful stand-alone HTML report
tools:
- multiqc:
description: |
MultiQC searches a given directory for analysis logs and compiles an HTML report.
It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
homepage: https://multiqc.info/
documentation: https://multiqc.info/docs/
licence: ['GPL-3.0-or-later']
input:
- multiqc_files:
type: file
description: |
List of reports / files recognised by MultiQC, for example the html and zip output of FastQC
output:
- report:
type: file
description: MultiQC report file
pattern: "multiqc_report.html"
- data:
type: dir
description: MultiQC data dir
pattern: "multiqc_data"
- plots:
type: file
description: Plots created by MultiQC
pattern: "*_plots"
- versions:
type: file
description: File containing software versions
pattern: "versions.yml"
authors:
- "@abhi18av"
- "@bunop"
- "@drpatelh"

@ -0,0 +1,199 @@
/*
========================================================================================
nf-core/taxprofiler Nextflow config file
========================================================================================
Default config options for all compute environments
----------------------------------------------------------------------------------------
*/
// Global default params, used in configs
params {
// TODO nf-core: Specify your pipeline's command line flags
// Input options
input = null
// References
genome = null
igenomes_base = 's3://ngi-igenomes/igenomes'
igenomes_ignore = false
// MultiQC options
multiqc_config = null
multiqc_title = null
max_multiqc_email_size = '25.MB'
// Boilerplate options
outdir = './results'
tracedir = "${params.outdir}/pipeline_info"
email = null
email_on_fail = null
plaintext_email = false
monochrome_logs = false
help = false
validate_params = true
show_hidden_params = false
schema_ignore_params = 'genomes'
enable_conda = false
// Config options
custom_config_version = 'master'
custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
config_profile_description = null
config_profile_contact = null
config_profile_url = null
config_profile_name = null
// Max resource options
// Defaults only, expecting to be overwritten
max_memory = '128.GB'
max_cpus = 16
max_time = '240.h'
}
// Load base.config by default for all pipelines
includeConfig 'conf/base.config'
// Load nf-core custom profiles from different Institutions
try {
includeConfig "${params.custom_config_base}/nfcore_custom.config"
} catch (Exception e) {
System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config")
}
profiles {
debug { process.beforeScript = 'echo $HOSTNAME' }
conda {
params.enable_conda = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
docker {
docker.enabled = true
docker.userEmulation = true
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
singularity {
singularity.enabled = true
singularity.autoMounts = true
docker.enabled = false
podman.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
podman {
podman.enabled = true
docker.enabled = false
singularity.enabled = false
shifter.enabled = false
charliecloud.enabled = false
}
shifter {
shifter.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
charliecloud.enabled = false
}
charliecloud {
charliecloud.enabled = true
docker.enabled = false
singularity.enabled = false
podman.enabled = false
shifter.enabled = false
}
test { includeConfig 'conf/test.config' }
test_full { includeConfig 'conf/test_full.config' }
}
// Load igenomes.config if required
if (!params.igenomes_ignore) {
includeConfig 'conf/igenomes.config'
} else {
params.genomes = [:]
}
// Export these variables to prevent local Python/R libraries from conflicting with those in the container
// The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container.
// See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable.
env {
PYTHONNOUSERSITE = 1
R_PROFILE_USER = "/.Rprofile"
R_ENVIRON_USER = "/.Renviron"
JULIA_DEPOT_PATH = "/usr/local/share/julia"
}
// Capture exit codes from upstream processes when piping
process.shell = ['/bin/bash', '-euo', 'pipefail']
def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
timeline {
enabled = true
file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html"
}
report {
enabled = true
file = "${params.tracedir}/execution_report_${trace_timestamp}.html"
}
trace {
enabled = true
file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt"
}
dag {
enabled = true
file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg"
}
manifest {
name = 'nf-core/taxprofiler'
author = 'nf-core community'
homePage = 'https://github.com/nf-core/taxprofiler'
description = 'Taxonomic profiling of shotgun metagenomic data'
mainScript = 'main.nf'
nextflowVersion = '!>=21.10.3'
version = '1.0dev'
}
// Load modules.config for DSL2 module specific options
includeConfig 'conf/modules.config'
// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
if (type == 'memory') {
try {
if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
return params.max_memory as nextflow.util.MemoryUnit
else
return obj
} catch (all) {
println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj"
return obj
}
} else if (type == 'time') {
try {
if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
return params.max_time as nextflow.util.Duration
else
return obj
} catch (all) {
println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj"
return obj
}
} else if (type == 'cpus') {
try {
return Math.min( obj, params.max_cpus as int )
} catch (all) {
println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
return obj
}
}
}
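// Example (illustrative sketch): conf/base.config typically wraps per-process
// resource requests in check_max so they are capped by --max_cpus, --max_memory
// and --max_time (the label and values here are assumptions):
//
//     process {
//         withLabel: process_medium {
//             cpus   = { check_max( 6     * task.attempt, 'cpus'   ) }
//             memory = { check_max( 36.GB * task.attempt, 'memory' ) }
//             time   = { check_max( 8.h   * task.attempt, 'time'   ) }
//         }
//     }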

@ -0,0 +1,262 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://raw.githubusercontent.com/nf-core/taxprofiler/master/nextflow_schema.json",
"title": "nf-core/taxprofiler pipeline parameters",
"description": "Taxonomic profiling of shotgun metagenomic data",
"type": "object",
"definitions": {
"input_output_options": {
"title": "Input/output options",
"type": "object",
"fa_icon": "fas fa-terminal",
"description": "Define where the pipeline should find input data and save output data.",
"required": [
"input"
],
"properties": {
"input": {
"type": "string",
"format": "file-path",
"mimetype": "text/csv",
"pattern": "^\\S+\\.csv$",
"schema": "assets/schema_input.json",
"description": "Path to comma-separated file containing information about the samples in the experiment.",
"help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/taxprofiler/usage#samplesheet-input).",
"fa_icon": "fas fa-file-csv"
},
"outdir": {
"type": "string",
"description": "Path to the output directory where the results will be saved.",
"default": "./results",
"fa_icon": "fas fa-folder-open"
},
"email": {
"type": "string",
"description": "Email address for completion summary.",
"fa_icon": "fas fa-envelope",
"help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.",
"pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
},
"multiqc_title": {
"type": "string",
"description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.",
"fa_icon": "fas fa-file-signature"
}
}
},
"reference_genome_options": {
"title": "Reference genome options",
"type": "object",
"fa_icon": "fas fa-dna",
"description": "Reference genome related files and options required for the workflow.",
"properties": {
"genome": {
"type": "string",
"description": "Name of iGenomes reference.",
"fa_icon": "fas fa-book",
"help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details."
},
"fasta": {
"type": "string",
"format": "file-path",
"mimetype": "text/plain",
"pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$",
"description": "Path to FASTA genome file.",
"help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with `--save_reference` to save BWA index for future runs.",
"fa_icon": "far fa-file-code"
},
"igenomes_base": {
"type": "string",
"format": "directory-path",
"description": "Directory / URL base for iGenomes references.",
"default": "s3://ngi-igenomes/igenomes",
"fa_icon": "fas fa-cloud-download-alt",
"hidden": true
},
"igenomes_ignore": {
"type": "boolean",
"description": "Do not load the iGenomes reference config.",
"fa_icon": "fas fa-ban",
"hidden": true,
"help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`."
}
}
},
"institutional_config_options": {
"title": "Institutional config options",
"type": "object",
"fa_icon": "fas fa-university",
"description": "Parameters used to describe centralised config profiles. These should not be edited.",
"help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.",
"properties": {
"custom_config_version": {
"type": "string",
"description": "Git commit id for Institutional configs.",
"default": "master",
"hidden": true,
"fa_icon": "fas fa-users-cog"
},
"custom_config_base": {
"type": "string",
"description": "Base directory for Institutional configs.",
"default": "https://raw.githubusercontent.com/nf-core/configs/master",
"hidden": true,
"help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.",
"fa_icon": "fas fa-users-cog"
},
"config_profile_name": {
"type": "string",
"description": "Institutional config name.",
"hidden": true,
"fa_icon": "fas fa-users-cog"
},
"config_profile_description": {
"type": "string",
"description": "Institutional config description.",
"hidden": true,
"fa_icon": "fas fa-users-cog"
},
"config_profile_contact": {
"type": "string",
"description": "Institutional config contact information.",
"hidden": true,
"fa_icon": "fas fa-users-cog"
},
"config_profile_url": {
"type": "string",
"description": "Institutional config URL link.",
"hidden": true,
"fa_icon": "fas fa-users-cog"
}
}
},
"max_job_request_options": {
"title": "Max job request options",
"type": "object",
"fa_icon": "fab fa-acquisitions-incorporated",
"description": "Set the top limit for requested resources for any single job.",
"help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.",
"properties": {
"max_cpus": {
"type": "integer",
"description": "Maximum number of CPUs that can be requested for any single job.",
"default": 16,
"fa_icon": "fas fa-microchip",
"hidden": true,
"help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`"
},
"max_memory": {
"type": "string",
"description": "Maximum amount of memory that can be requested for any single job.",
"default": "128.GB",
"fa_icon": "fas fa-memory",
"pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$",
"hidden": true,
"help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`"
},
"max_time": {
"type": "string",
"description": "Maximum amount of time that can be requested for any single job.",
"default": "240.h",
"fa_icon": "far fa-clock",
"pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$",
"hidden": true,
"help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`"
}
}
},
"generic_options": {
"title": "Generic options",
"type": "object",
"fa_icon": "fas fa-file-import",
"description": "Less common options for the pipeline, typically set in a config file.",
"help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.",
"properties": {
"help": {
"type": "boolean",
"description": "Display help text.",
"fa_icon": "fas fa-question-circle",
"hidden": true
},
"email_on_fail": {
"type": "string",
"description": "Email address for completion summary, only when pipeline fails.",
"fa_icon": "fas fa-exclamation-triangle",
"pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$",
"help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.",
"hidden": true
},
"plaintext_email": {
"type": "boolean",
"description": "Send plain-text email instead of HTML.",
"fa_icon": "fas fa-remove-format",
"hidden": true
},
"max_multiqc_email_size": {
"type": "string",
"description": "File size limit when attaching MultiQC reports to summary emails.",
"pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$",
"default": "25.MB",
"fa_icon": "fas fa-file-upload",
"hidden": true
},
"monochrome_logs": {
"type": "boolean",
"description": "Do not use coloured log outputs.",
"fa_icon": "fas fa-palette",
"hidden": true
},
"multiqc_config": {
"type": "string",
"description": "Custom config file to supply to MultiQC.",
"fa_icon": "fas fa-cog",
"hidden": true
},
"tracedir": {
"type": "string",
"description": "Directory to keep pipeline Nextflow logs and reports.",
"default": "${params.outdir}/pipeline_info",
"fa_icon": "fas fa-cogs",
"hidden": true
},
"validate_params": {
"type": "boolean",
"description": "Boolean whether to validate parameters against the schema at runtime",
"default": true,
"fa_icon": "fas fa-check-square",
"hidden": true
},
"show_hidden_params": {
"type": "boolean",
"fa_icon": "far fa-eye-slash",
"description": "Show all params when using `--help`",
"hidden": true,
"help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters."
},
"enable_conda": {
"type": "boolean",
"description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.",
"hidden": true,
"fa_icon": "fas fa-bacon"
}
}
}
},
"allOf": [
{
"$ref": "#/definitions/input_output_options"
},
{
"$ref": "#/definitions/reference_genome_options"
},
{
"$ref": "#/definitions/institutional_config_options"
},
{
"$ref": "#/definitions/max_job_request_options"
},
{
"$ref": "#/definitions/generic_options"
}
]
}

@ -0,0 +1,42 @@
//
// Check input samplesheet and get read channels
//
include { SAMPLESHEET_CHECK } from '../../modules/local/samplesheet_check'
workflow INPUT_CHECK {
take:
samplesheet // file: /path/to/samplesheet.csv
main:
SAMPLESHEET_CHECK ( samplesheet )
.csv
.splitCsv ( header:true, sep:',' )
.map { create_fastq_channels(it) }
.set { reads }
emit:
reads // channel: [ val(meta), [ reads ] ]
versions = SAMPLESHEET_CHECK.out.versions // channel: [ versions.yml ]
}
// Function to get list of [ meta, [ fastq_1, fastq_2 ] ]
def create_fastq_channels(LinkedHashMap row) {
def meta = [:]
meta.id = row.sample
meta.single_end = row.single_end.toBoolean()
def array = []
if (!file(row.fastq_1).exists()) {
exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.fastq_1}"
}
if (meta.single_end) {
array = [ meta, [ file(row.fastq_1) ] ]
} else {
if (!file(row.fastq_2).exists()) {
exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.fastq_2}"
}
array = [ meta, [ file(row.fastq_1), file(row.fastq_2) ] ]
}
return array
}
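// Example (illustrative): a paired-end samplesheet row is mapped to the
// [ meta, reads ] structure expected by downstream modules, e.g.:
//
//     [ sample:'s1', single_end:'false', fastq_1:'s1_R1.fastq.gz', fastq_2:'s1_R2.fastq.gz' ]
//     -> [ [ id:'s1', single_end:false ], [ s1_R1.fastq.gz, s1_R2.fastq.gz ] ]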

@ -0,0 +1,123 @@
/*
========================================================================================
VALIDATE INPUTS
========================================================================================
*/
def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params)
// Validate input parameters
WorkflowTaxprofiler.initialise(params, log)
// TODO nf-core: Add all file path parameters for the pipeline to the list below
// Check input path parameters to see if they exist
def checkPathParamList = [ params.input, params.multiqc_config, params.fasta ]
for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true) } }
// Check mandatory parameters
if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet not specified!' }
/*
========================================================================================
CONFIG FILES
========================================================================================
*/
ch_multiqc_config = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true)
ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config) : Channel.empty()
/*
========================================================================================
IMPORT LOCAL MODULES/SUBWORKFLOWS
========================================================================================
*/
//
// SUBWORKFLOW: Consisting of a mix of local and nf-core/modules
//
include { INPUT_CHECK } from '../subworkflows/local/input_check'
/*
========================================================================================
IMPORT NF-CORE MODULES/SUBWORKFLOWS
========================================================================================
*/
//
// MODULE: Installed directly from nf-core/modules
//
include { FASTQC } from '../modules/nf-core/modules/fastqc/main'
include { MULTIQC } from '../modules/nf-core/modules/multiqc/main'
include { CUSTOM_DUMPSOFTWAREVERSIONS } from '../modules/nf-core/modules/custom/dumpsoftwareversions/main'
/*
========================================================================================
RUN MAIN WORKFLOW
========================================================================================
*/
// Info required for completion email and summary
def multiqc_report = []
workflow TAXPROFILER {
ch_versions = Channel.empty()
//
// SUBWORKFLOW: Read in samplesheet, validate and stage input files
//
INPUT_CHECK (
ch_input
)
ch_versions = ch_versions.mix(INPUT_CHECK.out.versions)
//
// MODULE: Run FastQC
//
FASTQC (
INPUT_CHECK.out.reads
)
ch_versions = ch_versions.mix(FASTQC.out.versions.first())
CUSTOM_DUMPSOFTWAREVERSIONS (
ch_versions.unique().collectFile(name: 'collated_versions.yml')
)
//
// MODULE: MultiQC
//
workflow_summary = WorkflowTaxprofiler.paramsSummaryMultiqc(workflow, summary_params)
ch_workflow_summary = Channel.value(workflow_summary)
ch_multiqc_files = Channel.empty()
ch_multiqc_files = ch_multiqc_files.mix(Channel.from(ch_multiqc_config))
ch_multiqc_files = ch_multiqc_files.mix(ch_multiqc_custom_config.collect().ifEmpty([]))
ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
ch_multiqc_files = ch_multiqc_files.mix(CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect())
ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]}.ifEmpty([]))
MULTIQC (
ch_multiqc_files.collect()
)
multiqc_report = MULTIQC.out.report.toList()
ch_versions = ch_versions.mix(MULTIQC.out.versions)
}
/*
========================================================================================
COMPLETION EMAIL AND SUMMARY
========================================================================================
*/
workflow.onComplete {
if (params.email || params.email_on_fail) {
NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report)
}
NfcoreTemplate.summary(workflow, params, log)
}
/*
========================================================================================
THE END
========================================================================================
*/