Merge remote-tracking branch 'upstream/master' into bedtools_dev

JoseEspinosa 2020-07-15 15:05:26 +02:00
commit 7f8b8189f8
17 changed files with 170 additions and 90 deletions

@@ -9,13 +9,13 @@ jobs:
- uses: actions/setup-node@v1 - uses: actions/setup-node@v1
with: with:
node-version: '10' node-version: "10"
- name: Install markdownlint - name: Install markdownlint
run: npm install -g markdownlint-cli run: npm install -g markdownlint-cli
- name: Run Markdownlint - name: Run Markdownlint
run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.markdownlint.yml
EditorConfig: EditorConfig:
runs-on: ubuntu-latest runs-on: ubuntu-latest
@@ -24,7 +24,7 @@ jobs:
- uses: actions/setup-node@v1 - uses: actions/setup-node@v1
with: with:
node-version: '10' node-version: "10"
- name: Install ECLint - name: Install ECLint
run: npm install -g eclint run: npm install -g eclint
@@ -41,7 +41,7 @@ jobs:
- name: Install NodeJS - name: Install NodeJS
uses: actions/setup-node@v1 uses: actions/setup-node@v1
with: with:
node-version: '10' node-version: "10"
- name: Install yaml-lint - name: Install yaml-lint
run: npm install -g yaml-lint run: npm install -g yaml-lint

.gitignore (3 changes)
@@ -1,5 +1,6 @@
.nextflow* .nextflow*
work/ work/
data/
results/ results/
test_output/
.DS_Store .DS_Store
*.code-workspace

@@ -6,28 +6,28 @@ A repository for hosting nextflow [`DSL2`](https://www.nextflow.io/docs/edge/dsl
## Table of contents ## Table of contents
* [Using existing modules](#using-existing-modules) - [Using existing modules](#using-existing-modules)
* [Configuration and parameters](#configuration-and-parameters) - [Configuration and parameters](#configuration-and-parameters)
* [Offline usage](#offline-usage) - [Offline usage](#offline-usage)
* [Adding a new module file](#adding-a-new-module-file) - [Adding a new module file](#adding-a-new-module-file)
* [Testing](#testing) - [Testing](#testing)
* [Documentation](#documentation) - [Documentation](#documentation)
* [Uploading to `nf-core/modules`](#uploading-to-nf-coremodules) - [Uploading to `nf-core/modules`](#uploading-to-nf-coremodules)
* [Help](#help) - [Help](#help)
## Terminology ## Terminology
The features offered by Nextflow DSL 2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL 2 components: The features offered by Nextflow DSL 2 can be used in various ways depending on the granularity with which you would like to write pipelines. Please see the listing below for the hierarchy and associated terminology we have decided to use when referring to DSL 2 components:
* *Module*: A `process` that can be used within different pipelines and is as atomic as possible i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as `FastQC`. This repository has been created to only host atomic module files that should be added to the `tools` sub-directory along with the required documentation, software and tests. - *Module*: A `process` that can be used within different pipelines and is as atomic as possible i.e. cannot be split into another module. An example of this would be a module file containing the process definition for a single tool such as `FastQC`. This repository has been created to only host atomic module files that should be added to the `tools` sub-directory along with the required documentation, software and tests.
* *Sub-workflow*: A chain of multiple modules that offer a higher-level of functionality within the context of a pipeline. For example, a sub-workflow to run multiple QC tools with FastQ files as input. Sub-workflows should be shipped with the pipeline implementation and if required they should be shared amongst different pipelines directly from there. As it stands, this repository will not host sub-workflows. - *Sub-workflow*: A chain of multiple modules that offer a higher-level of functionality within the context of a pipeline. For example, a sub-workflow to run multiple QC tools with FastQ files as input. Sub-workflows should be shipped with the pipeline implementation and if required they should be shared amongst different pipelines directly from there. As it stands, this repository will not host sub-workflows.
* *Workflow*: What DSL 1 users would consider an end-to-end pipeline. For example, from one or more inputs to a series of outputs. This can either be implemented using a large monolithic script as with DSL 1, or by using a combination of DSL 2 individual modules and sub-workflows. - *Workflow*: What DSL 1 users would consider an end-to-end pipeline. For example, from one or more inputs to a series of outputs. This can either be implemented using a large monolithic script as with DSL 1, or by using a combination of DSL 2 individual modules and sub-workflows.
## Using existing modules ## Using existing modules
The Nextflow [`include`](https://www.nextflow.io/docs/edge/dsl2.html#modules-include) statement can be used within your pipelines in order to load module files that you have available locally. The Nextflow [`include`](https://www.nextflow.io/docs/edge/dsl2.html#modules-include) statement can be used within your pipelines in order to load module files that you have available locally.
You should be able to get a good idea as to how other people are using module files by looking at pipelines available in nf-core e.g. [`nf-core/rnaseq`](https://github.com/nf-core/rnaseq/pull/162) You should be able to get a good idea as to how other people are using module files by looking at pipelines available in nf-core e.g. [`nf-core/chipseq`](https://github.com/nf-core/chipseq/tree/dev) (work in progress)
### Configuration and parameters ### Configuration and parameters
@@ -54,13 +54,49 @@ nextflow run /path/to/pipeline/ -c /path/to/custom_module.conf
## Adding a new module file ## Adding a new module file
If you decide to upload your module file to `nf-core/modules` then this will ensure that it will be automatically downloaded, and available at run-time to all nf-core pipelines, and to everyone within the Nextflow community! See [`nf-core/modules/nf`](https://github.com/nf-core/modules/tree/master/nf) for examples. If you decide to upload your module file to `nf-core/modules` then this will
ensure that it will be automatically downloaded, and available at run-time to
all nf-core pipelines, and to everyone within the Nextflow community! See
[`nf-core/modules/software`](https://github.com/nf-core/modules/tree/master/software)
for examples.
> The definition and standards for module files are still under discussion amongst the community but hopefully, a description should be added here soon! **The definition and standards for module files are still under discussion
amongst the community. Currently the following points have been agreed on:**
The key words "MUST", "MUST NOT", "SHOULD", etc. are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
### Defining inputs, outputs and parameters
- A module file SHOULD only define inputs and outputs as parameters. Additionally,
- it MUST define threads or resources where required for a particular process using `task.cpus`
- it MUST be possible to pass additional parameters to the tool as a command line string via the `params.<MODULE>_args` parameter.
- All NGS modules MUST accept a triplet [name, single_end, reads] as input. The single-end boolean values MUST be specified through the input channel and not inferred from the data e.g. [here](https://github.com/nf-core/tools/blob/028a9b3f9d1ad044e879a1de13d3c3a25a06b9a7/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/modules/nf-core/fastqc.nf#L13).
- Process names MUST be all uppercase.
- Each process MUST emit a file `<TOOL>.version.txt` containing a single line with the software's version in the format `v<VERSION_NUMBER>`.
- All outputs MUST be named using `emit`.
### Atomicity
- Software that can be piped together SHOULD be added to separate module files unless there is a run-time or storage advantage to implementing it this way, e.g. `bwa mem | samtools view -C -T ref.fasta` to output CRAM instead of SAM.
### Publishing results
- The module MUST accept the parameters `params.out_dir` and `params.publish_dir` and MUST publish results into `${params.out_dir}/${params.publish_dir}`.
- The `publishDirMode` MUST be configurable via `params.publish_dir_mode`
- The module MUST accept a parameter `params.publish_results` accepting at least
- `"none"`, to publish no files at all, and
- `"default"`, to publish a sensible selection of files.
It MAY accept further options.
- To ensure consistent naming, files SHOULD be renamed according to the `$name` variable before returning them.
### Testing ### Testing
- Every module MUST be tested by adding a test workflow with a toy dataset.
- Test data MUST be stored within this repo. It is RECOMMENDED to re-use generic files from `tests/data` by symlinking them into the test directory of the module. Specific files MUST be added to the test-directory directly. Test files MUST be kept as tiny as possible.
If you want to add a new module config file to `nf-core/modules` please test that your pipeline of choice runs as expected by using the [`-include`](https://www.nextflow.io/docs/edge/dsl2.html#modules-include) statement with a local version of the module file. ### Software requirements
- Software requirements SHOULD be declared in a conda `environment.yml` file, including exact version numbers. Additionally, there MUST be a `Dockerfile` that containerizes the environment, or packages the software if conda is not available.
### File formats
- Wherever possible, [CRAM](https://en.wikipedia.org/wiki/CRAM_(file_format)) files SHOULD be used over BAM files.
- Wherever possible, FASTQ files SHOULD be compressed using gzip.
### Documentation ### Documentation
@@ -68,7 +104,7 @@ Please add some documentation to the top of the module file in the form of nativ
### Uploading to `nf-core/modules` ### Uploading to `nf-core/modules`
[Fork](https://help.github.com/articles/fork-a-repo/) the `nf-core/modules` repository to your own GitHub account. Within the local clone of your fork add the module file to the [`nf-core/modules/nf`](https://github.com/nf-core/modules/tree/master/nf) directory. Please keep the naming consistent between the module and documentation files e.g. `bwa.nf` and `bwa.md`, respectively. [Fork](https://help.github.com/articles/fork-a-repo/) the `nf-core/modules` repository to your own GitHub account. Within the local clone of your fork add the module file to the [`nf-core/modules/software`](https://github.com/nf-core/modules/tree/master/software) directory. Please keep the naming consistent between the module and documentation files e.g. `bwa.nf` and `bwa.md`, respectively.
Commit and push these changes to your local clone on GitHub, and then [create a pull request](https://help.github.com/articles/creating-a-pull-request-from-a-fork/) on `nf-core/modules` GitHub repo with the appropriate information. Commit and push these changes to your local clone on GitHub, and then [create a pull request](https://help.github.com/articles/creating-a-pull-request-from-a-fork/) on `nf-core/modules` GitHub repo with the appropriate information.
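The publishing rules in this README reduce to one decision per output file: publish it under `${params.out_dir}/${params.publish_dir}`, or skip it when `params.publish_results` is `"none"`. As a rough illustration only, here is that decision as a hypothetical bash function, `publish_target` (not part of the repository), called with the test defaults from this commit (`out_dir = "test_output"`, `publish_dir = "fastqc"`) and a made-up file name:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the publishing rules: print the path a file
# would be published to, or print nothing when publish_results is "none".
publish_target() {
  local out_dir=$1 publish_dir=$2 publish_results=$3 filename=$4
  if [ "$publish_results" = "none" ]; then
    return 0  # corresponds to saveAs returning null: the file is not published
  fi
  printf '%s/%s/%s\n' "$out_dir" "$publish_dir" "$filename"
}

publish_target test_output fastqc default test_fastqc.html
publish_target test_output fastqc none test_fastqc.html
```

The first call prints `test_output/fastqc/test_fastqc.html` and the second prints nothing. This mirrors the `saveAs` closure in the new FASTQC module, which returns `null` (Nextflow's signal not to publish a file) when `publish_results` is `"none"`.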

@@ -6,4 +6,4 @@ channels:
- bioconda - bioconda
- defaults - defaults
dependencies: dependencies:
- fastqc=0.11.8 - fastqc=0.11.9
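The pin `fastqc=0.11.9` above and the README rule that each process emits `<TOOL>.version.txt` as `v<VERSION_NUMBER>` can be cross-checked mechanically. The sketch below is illustrative only. `extract_version` is equivalent to the `sed` expression in the new FASTQC module, written in single quotes so it needs no Groovy escaping. `pin_matches_version` is a hypothetical helper. The banner `FastQC v0.11.9` is an assumed example of `fastqc --version` output, not captured from a real run:

```shell
#!/usr/bin/env bash
# Reduce a version banner such as "FastQC v0.11.9" to "v0.11.9".
extract_version() {
  sed -n 's/.*\(v.*\)$/\1/p'
}

# Hypothetical check: does the "fastqc=<VERSION>" pin in an environment.yml
# agree with the "v<VERSION>" line in a version file?
pin_matches_version() {
  local env_file=$1 version_file=$2 pinned
  pinned=$(sed -n 's/^[[:space:]]*-[[:space:]]*fastqc=\(.*\)$/\1/p' "$env_file")
  [ -n "$pinned" ] && [ "v$pinned" = "$(cat "$version_file")" ]
}

# Usage with throwaway files (the banner string is an assumed example).
tmp=$(mktemp -d)
printf 'dependencies:\n  - fastqc=0.11.9\n' > "$tmp/environment.yml"
echo "FastQC v0.11.9" | extract_version > "$tmp/fastqc.version.txt"
if pin_matches_version "$tmp/environment.yml" "$tmp/fastqc.version.txt"; then
  echo "pin and version file agree"
fi
```

Because the leading `.*` is greedy, the capture starts at the last lowercase `v` on the line. That works for a banner like `FastQC v0.11.9`, but it would cut the version short if another lowercase `v` followed it.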

@@ -1,37 +1,40 @@
nextflow.preview.dsl = 2 def MODULE = "fastqc"
params.publish_dir = MODULE
params.publish_results = "default"
process FASTQC { process FASTQC {
publishDir "${params.out_dir}/${params.publish_dir}",
mode: params.publish_dir_mode,
saveAs: { filename ->
if (params.publish_results == "none") null
else filename }
// tag "FastQC - $sample_id" container "docker.pkg.github.com/nf-core/$MODULE"
conda "${moduleDir}/environment.yml"
input: input:
tuple val(name), path(reads) tuple val(name), val(single_end), path(reads)
val (outputdir)
// fastqc_args are best passed into the workflow in the following manner:
// --fastqc_args="--nogroup -a custom_adapter_file.txt"
val (fastqc_args)
val (verbose)
output: output:
tuple val(name), path ("*fastqc*"), emit: all tuple val(name), val(single_end), path("*.html"), emit: html
path "*.zip", emit: report // e.g. for MultiQC later tuple val(name), val(single_end), path("*.zip"), emit: zip
path "*.version.txt", emit: version
// container 'quay.io/biocontainers/fastqc:0.11.8--2'
publishDir "$outputdir",
mode: "copy", overwrite: true
script: script:
// Add soft-links to original FastQs for consistent naming in pipeline
if (verbose){ if (single_end) {
println ("[MODULE] FASTQC ARGS: " + fastqc_args)
}
""" """
module load fastqc [ ! -f ${name}.fastq.gz ] && ln -s $reads ${name}.fastq.gz
fastqc $fastqc_args -q -t 2 $reads fastqc ${params.fastqc_args} --threads $task.cpus ${name}.fastq.gz
fastqc --version | sed -n "s/.*\\(v.*\$\\)/\\1/p" > fastqc.version.txt
fastqc --version &> fastqc.version.txt
""" """
} else {
"""
[ ! -f ${name}_1.fastq.gz ] && ln -s ${reads[0]} ${name}_1.fastq.gz
[ ! -f ${name}_2.fastq.gz ] && ln -s ${reads[1]} ${name}_2.fastq.gz
fastqc ${params.fastqc_args} --threads $task.cpus ${name}_1.fastq.gz ${name}_2.fastq.gz
fastqc --version | sed -n "s/.*\\(v.*\$\\)/\\1/p" > fastqc.version.txt
"""
}
} }
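The new `script:` block links the reads under the sample name before running FastQC, with separate branches for single-end and paired-end input. Below, that linking logic is sketched as a standalone bash function. `stage_reads` and the placeholder file names are hypothetical, and FastQC itself is not invoked:

```shell
#!/usr/bin/env bash
# Sketch of the staging step in the updated FASTQC script block: expose the
# input FastQ file(s) under names derived from the sample name so that the
# FastQC reports are named after the sample. stage_reads is hypothetical.
stage_reads() {
  local name=$1 single_end=$2
  shift 2
  if [ "$single_end" = true ]; then
    [ -f "${name}.fastq.gz" ] || ln -s "$1" "${name}.fastq.gz"
  else
    [ -f "${name}_1.fastq.gz" ] || ln -s "$1" "${name}_1.fastq.gz"
    [ -f "${name}_2.fastq.gz" ] || ln -s "$2" "${name}_2.fastq.gz"
  fi
}

# Usage in a scratch directory with empty placeholder files.
cd "$(mktemp -d)" || exit 1
touch reads_a.fq.gz reads_b.fq.gz
stage_reads sample1 false reads_a.fq.gz reads_b.fq.gz
ls -l sample1_1.fastq.gz sample1_2.fastq.gz
```

The existence check plays the same role as the module's `[ ! -f ... ] &&` guard. If an input is already called `<name>.fastq.gz`, running `ln -s` onto it would fail with "File exists". Writing the guard as `[ -f ... ] || ln -s ...` keeps the function's exit status at 0 when the link is skipped.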

@@ -1,33 +1,63 @@
name: FastQC name: FastQC
description: Run FastQC on sequenced reads description: Run FastQC on sequenced reads
keywords: keywords:
- Quality Control - Quality Control
- QC - QC
- Adapters - Adapters
tools: tools:
- fastqc: - fastqc:
description: | description: |
FastQC gives general quality metrics about your reads. FastQC gives general quality metrics about your reads.
It provides information about the quality score distribution It provides information about the quality score distribution
across your reads, the per base sequence content (%A/C/G/T). across your reads, the per base sequence content (%A/C/G/T).
You get information about adapter contamination and other You get information about adapter contamination and other
overrepresented sequences. overrepresented sequences.
homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
params:
- fastqc_args:
type: string
description: Additional command line arguments passed to fastqc.
- out_dir:
type: string
description: |
The pipeline's output directory. By default, the module will
output files into `$out_dir/MODULE_NAME`
- publish_dir:
type: string
description: |
Append to the path for the standard output directory provided by `$out_dir`.
- publish_dir_mode:
type: string
description: |
Provide a value for the Nextflow `publishDir` mode parameter
(e.g. copy, link, ...)
- publish_results:
type: string
description: |
Whether or not to publish results into `publish_dir`. Set to `none` to not
publish any files at all; to `default` to publish all relevant files.
input: input:
- - name:
- name: type: string
type: string description: Sample identifier
description: Sample identifier - single_end:
- reads: type: boolean
type: file description: |
description: Input FastQ file, or pair of files Boolean indicating whether the corresponding sample is single-end (true)
or paired-end (false).
- reads:
type: file
description: |
List of input FastQ files of size 1 and 2 for single-end and paired-end data,
respectively.
output: output:
- - report:
- report: type: file
type: file description: FastQC report
description: FastQC report pattern: "*_fastqc.{zip,html}"
pattern: "*_fastqc.{zip,html}"
authors: authors:
- "@ewels" - "@grst"
- "@FelixKrueger" - "@drpatelh"
- "@ewels"
- "@FelixKrueger"

@@ -0,0 +1 @@
../../../../tests/data/fastq/rna/test_R1.fastq.gz

@@ -0,0 +1 @@
../../../../tests/data/fastq/rna/test_R2.fastq.gz

@@ -0,0 +1 @@
../../../../tests/data/fastq/rna/test_single_end.fastq.gz
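These three symlinks follow the README recommendation to re-use generic files from `tests/data` by symlinking them into the module's test directory. Below is a minimal bash sketch for creating such links. `link_test_data` is hypothetical. The `software/<module>/test/data` location is inferred from the four `../` levels in the targets and from the `data/...` paths used by the test workflow. The commit does not state it:

```shell
#!/usr/bin/env bash
# Sketch: link shared files from tests/data into a module's test/data
# directory. The software/<module>/test/data layout is inferred, and
# link_test_data is a hypothetical helper.
link_test_data() {
  local repo_root=$1 module=$2
  shift 2
  local dest="$repo_root/software/$module/test/data" f
  mkdir -p "$dest"
  for f in "$@"; do
    # Each "../" leaves one of data/, test/, <module>/ and software/,
    # so the relative target resolves from the repository root.
    ln -sfn "../../../../tests/data/$f" "$dest/$(basename "$f")"
  done
}

# Usage against a throwaway repository skeleton.
root=$(mktemp -d)
mkdir -p "$root/tests/data/fastq/rna"
touch "$root/tests/data/fastq/rna/test_R1.fastq.gz" "$root/tests/data/fastq/rna/test_R2.fastq.gz"
link_test_data "$root" fastqc fastq/rna/test_R1.fastq.gz fastq/rna/test_R2.fastq.gz
```

Relative targets keep the links valid wherever the repository is cloned. Absolute paths would break on another machine.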

@@ -1,21 +1,31 @@
#!/usr/bin/env nextflow #!/usr/bin/env nextflow
nextflow.preview.dsl = 2 nextflow.preview.dsl = 2
params.outdir = "." // gets set in nextflow.config file (as './results/fastqc') params.out_dir = "test_output"
params.fastqc_args = '' params.fastqc_args = ''
params.verbose = false params.publish_dir_mode = "copy"
// TODO: check the output files in some way include { FASTQC } from '../main.nf'
// include '../../../tests/functions/check_process_outputs.nf'
include '../main.nf'
// Define input channels /**
ch_read_files = Channel * Test if FASTQC runs with single-end data
.fromFilePairs('../../../test-datasets/test*{1,2}.fastq.gz',size:-1) */
// .view() // to check whether the input channel works workflow test_single_end {
input_files = Channel.fromPath("data/test_single_end.fastq.gz")
// Run the workflow .map {f -> [f.baseName, true, f]}
workflow { FASTQC(input_files)
FASTQC (ch_read_files, params.outdir, params.fastqc_args, params.verbose) }
// .check_output()
/**
* Test if FASTQC runs with paired end data
*/
workflow test_paired_end {
input_files = Channel.fromFilePairs("data/test_R{1,2}.fastq.gz")
.map {f -> [f[0], false, f[1]]}
FASTQC(input_files)
}
workflow {
test_single_end()
test_paired_end()
} }
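In `test_single_end`, the sample name comes from `f.baseName`. According to the Nextflow documentation, `baseName` drops only the last extension (`file.tar.gz` becomes `file.tar`), while `simpleName` drops all of them (`file.tar.gz` becomes `file`). So `test_single_end.fastq.gz` yields the name `test_single_end.fastq`. The commit does not say which name is intended. The difference can be reproduced with bash parameter expansion as a rough analogue. Nextflow's own methods are not called here, and `base_name`/`simple_name` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Rough bash analogues of Nextflow's baseName and simpleName file attributes
# (illustrative only).
base_name()   { local f=${1##*/}; printf '%s\n' "${f%.*}"; }   # strip last extension
simple_name() { local f=${1##*/}; printf '%s\n' "${f%%.*}"; }  # strip all extensions

f=data/test_single_end.fastq.gz
echo "baseName-like:   $(base_name "$f")"
echo "simpleName-like: $(simple_name "$f")"
# Name the single-end branch of the FASTQC script would link the reads to:
echo "staged as:       $(base_name "$f").fastq.gz"
```

This prints `test_single_end.fastq`, `test_single_end` and `test_single_end.fastq.fastq.gz`. If the plain sample name is wanted, `f.simpleName` would be the closer match.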

@@ -1,2 +0,0 @@
// docker.enabled = true
params.outdir = './results/fastqc'

@@ -1 +0,0 @@
Subproject commit ddbd0c4cf7f1721c78673c4dcc91fcd7940e67f8

tests/data/.gitignore (0 changes)
Binary file not shown.

Binary file not shown.

Binary file not shown.