mirror of https://github.com/MillironX/taxprofiler.git synced 2024-11-10 22:33:09 +00:00

Merge branch 'dev' into post-usage-draft-tweaks

James A. Fellows Yates 2023-01-19 10:36:59 +01:00 committed by GitHub
commit 36629b3dde
47 changed files with 304 additions and 199 deletions

@ -16,7 +16,7 @@
<!-- TODO nf-core: Write a 1-2 sentence summary of what data the pipeline is for and what it does -->
**nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic profiling of shotgun metagenomic data. It allows for in-parallel profiling with multiple profiling tools against multiple databases, produces standardised output tables.
**nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun metagenomic data. It allows for in-parallel taxonomic identification of reads or taxonomic abundance estimation with multiple classification and profiling tools against multiple databases, and produces standardised output tables.
The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
@ -37,7 +37,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
- Host-read removal (short-read: [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/); long-read: [Minimap2](https://github.com/lh3/minimap2))
- Run merging
3. Supports statistics for host-read removal ([Samtools](http://www.htslib.org/))
4. Performs taxonomic profiling using one or more of:
4. Performs taxonomic classification and/or profiling using one or more of:
- [Kraken2](https://ccb.jhu.edu/software/kraken2/)
- [MetaPhlAn3](https://huttenhower.sph.harvard.edu/metaphlan/)
- [MALT](https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/malt/)

@ -19,9 +19,10 @@ custom_logo_title: "nf-core/taxprofiler"
run_modules:
- fastqc
- adapterRemoval
- fastp
- bbduk
- prinseqplusplus
- fastp
- porechop
- filtlong
- bowtie2
- minimap2
@ -32,9 +33,13 @@ run_modules:
- diamond
- malt
- motus
- porechop
- custom_content
sp:
diamond:
contents: "diamond v"
num_lines: 10
#extra_fn_clean_exts:
# - '_fastp'
# - '.pe.settings'
@ -102,9 +107,10 @@ table_columns_placement:
FastQC (pre-Trimming):
total_sequences: 100
avg_sequence_length: 110
percent_duplicates: 120
percent_gc: 130
percent_fails: 140
median_sequence_length: 120
percent_duplicates: 130
percent_gc: 140
percent_fails: 150
Falco (pre-Trimming):
total_sequences: 200
avg_sequence_length: 210
@ -118,43 +124,84 @@ table_columns_placement:
after_filtering_gc_content: 330
after_filtering_q30_rate: 340
after_filtering_q30_bases: 350
filtering_result_passed_filter_reads: 360
Adapter Removal:
aligned_total: 360
percent_aligned: 370
percent_collapsed: 380
percent_discarded: 390
Porechop:
Input Reads: 400
Start Trimmed: 410
Start Trimmed Percent: 420
End Trimmed: 430
End Trimmed Percent: 440
Middle Split: 450
Middle Split Percent: 460
Filtlong:
Target bases: 500
FastQC (post-Trimming):
total_sequences: 400
avg_sequence_length: 410
percent_duplicates: 420
percent_gc: 430
percent_fails: 440
total_sequences: 600
avg_sequence_length: 610
median_sequence_length: 620
percent_duplicates: 630
percent_gc: 640
percent_fails: 650
Falco (post-Trimming):
total_sequences: 500
avg_sequence_length: 510
percent_duplicates: 520
percent_gc: 530
percent_fails: 540
total_sequences: 700
avg_sequence_length: 710
percent_duplicates: 720
percent_gc: 730
percent_fails: 740
BBDuk:
Input reads: 800
Total Removed bases percent: 810
Total Removed bases: 820
Total Removed reads percent: 830
Total Removed reads: 840
PRINSEQ++:
prinseqplusplus_total: 900
bowtie2:
overall_alignment_rate: 600
overall_alignment_rate: 1000
Samtools Stats:
raw_total_sequences: 700
reads_mapped: 710
reads_mapped_percent: 720
reads_properly_paired_percent: 730
non-primary_alignments: 740
reads_MQ0_percent: 750
error_rate: 760
MALT:
Num. of queries: 1000
Total reads: 1100
Mappability: 1200
Assig. Taxonomy: 1300
Taxonomic assignment success: 1400
raw_total_sequences: 1100
reads_mapped: 1110
reads_mapped_percent: 1120
reads_properly_paired_percent: 1130
non-primary_alignments: 1140
reads_MQ0_percent: 1150
error_rate: 1160
Bracken:
"% Unclassified": 1200
"% Top 5": 1210
Centrifuge:
"% Unclassified": 1300
"% Top 5": 1310
DIAMOND:
queries_aligned: 1400
Kaiju:
assigned: 2000
"% Assigned": 2100
"% Unclassified": 2200
assigned: 1500
"% Assigned": 1510
"% Unclassified": 1520
Kraken:
"% Unclassified": 1600
"% Top 5": 1610
MALT:
"Num. of queries": 1700
Total reads: 1710
Mappability: 1720
Assig. Taxonomy: 1730
Taxonomic assignment success: 1740
motus:
Total number of reads: 1800
Number of reads after filtering: 1810
Total number of inserts: 1820
Unique mappers: 1830
Multiple mappers: 1840
Ignored multiple mapper without unique hit: 1850
"Number of ref-mOTUs": 1860
"Number of meta-mOTUs": 1870
"Number of ext-mOTUs": 1880
table_columns_visible:
FastQC (pre-Trimming):
@ -176,6 +223,16 @@ table_columns_visible:
after_filtering_gc_content: False
after_filtering_q30_rate: False
after_filtering_q30_bases: False
porechop:
Input reads: False
Start Trimmed:
Start Trimmed Percent: True
End Trimmed: False
End Trimmed Percent: True
Middle Split: False
Middle Split Percent: True
Filtlong:
Target bases: True
Adapter Removal:
aligned_total: True
percent_aligned: True
@ -193,6 +250,14 @@ table_columns_visible:
percent_duplicates: False
percent_gc: False
percent_fails: False
BBDuk:
Input reads: False
Total Removed bases Percent: False
Total Removed bases: False
Total Removed reads percent: True
Total Removed reads: False
"PRINSEQ++":
prinseqplusplus_total: True
bowtie2:
overall_alignment_rate: True
Samtools Stats:
@ -204,24 +269,35 @@ table_columns_visible:
reads_MQ0_percent: False
error_rate: False
Kraken:
"% Unclassified": True
"% Unclassified": False
"% Top 5": False
Bracken:
"% Unclassified": True
"% Unclassified": False
"% Top 5": False
Centrifuge:
"% Unclassified": True
"% Top 5": False
MALT:
Num. of queries: True
Total reads: True
Mappability: True
Assig. Taxonomy: False
Taxonomic assignment success: True
Centrifuge: False
DIAMOND:
queries_aligned: False
Kaiju:
assigned: False
"% Assigned": False
"% Unclassified": True
"% Unclassified": False
MALT:
"Num. of queries": False
Total reads: False
Mappability: False
Assig. Taxonomy: False
Taxonomic assignment success: False
motus:
Total number of reads: False
Number of reads after filtering: False
Total number of inserts: False
Unique mappers: False
Multiple mappers: False
Ignored multiple mapper without unique hit: False
"Number of ref-mOTUs": False
"Number of meta-mOTUs": False
"Number of ext-mOTUs": False
table_columns_name:
FastQC (pre-Trimming):
total_sequences: "Nr. Input Reads"
@ -253,7 +329,13 @@ table_columns_name:
reads_mapped_percent: "% Mapped Reads"
extra_fn_clean_exts:
- ".kraken2.kraken2.report.txt"
- ".centrifuge.txt"
- ".bracken.kraken2.report.txt"
- "kraken2.report.txt"
- ".txt"
- ".settings"
- ".bbduk"
- ".unmapped"
- "_filtered"
- "_processed"
section_comments:
general_stats: "By default, all read count columns are displayed as millions (M) of reads."
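
As a rough illustration of what the `table_columns_placement` values in this config are for, the sketch below sorts a handful of the columns by their placement number. It assumes that MultiQC places columns with lower values further to the left; the dictionary is a small excerpt of the values above, and `column_order` is a hypothetical helper, not part of MultiQC.

```python
# Excerpt of the placement values from the config above (not the full table).
placement = {
    "FastQC (pre-Trimming)": {
        "total_sequences": 100,
        "avg_sequence_length": 110,
        "median_sequence_length": 120,
    },
    "Porechop": {"Input Reads": 400},
    "Filtlong": {"Target bases": 500},
    "bowtie2": {"overall_alignment_rate": 1000},
}


def column_order(placement):
    """Flatten {module: {column: value}} and sort ascending by value."""
    flat = [
        (value, module, column)
        for module, columns in placement.items()
        for column, value in columns.items()
    ]
    return [(module, column) for value, module, column in sorted(flat)]


for module, column in column_order(placement):
    print(f"{module}: {column}")
```

Leaving gaps of 10 between values, as the config does, makes it possible to slot in a new column (such as `median_sequence_length`) without renumbering a whole module.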

@ -280,7 +280,7 @@ process {
"entropywindow=${params.shortread_complexityfilter_bbduk_windowsize}",
params.shortread_complexityfilter_bbduk_mask ? "entropymask=t" : "entropymask=f"
].join(' ').trim()
ext.prefix = { "${meta.id}-${meta.run_accession}" }
ext.prefix = { "${meta.id}_${meta.run_accession}" }
publishDir = [
[
path: { "${params.outdir}/bbduk/" },
@ -300,9 +300,8 @@ process {
ext.args = [
params.shortread_complexityfilter_prinseqplusplus_mode == 'dust' ? "-lc_dust=${params.shortread_complexityfilter_prinseqplusplus_dustscore}" : "-lc_entropy=${params.shortread_complexityfilter_entropy}",
"-trim_qual_left=0 -trim_qual_left=0 -trim_qual_window=0 -trim_qual_step=0",
"-VERBOSE 2"
].join(' ').trim()
ext.prefix = { "${meta.id}-${meta.run_accession}" }
ext.prefix = { "${meta.id}_${meta.run_accession}" }
publishDir = [
[
path: { "${params.outdir}/prinseqplusplus/" },
@ -351,7 +350,7 @@ process {
withName: KRAKEN2_KRAKEN2 {
ext.args = params.kraken2_save_minimizers ? { "${meta.db_params} --report-minimizer-data" } : { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { meta.tool == "bracken" ? "${meta.id}-${meta.db_name}.bracken" : "${meta.id}-${meta.db_name}" } : { meta.tool == "bracken" ? "${meta.id}-${meta.run_accession}-${meta.db_name}.bracken" : "${meta.id}-${meta.run_accession}-${meta.db_name}" }
ext.prefix = params.perform_runmerging ? { meta.tool == "bracken" ? "${meta.id}_${meta.db_name}.bracken" : "${meta.id}_${meta.db_name}.kraken" } : { meta.tool == "bracken" ? "${meta.id}_${meta.run_accession}_${meta.db_name}.bracken" : "${meta.id}_${meta.run_accession}_${meta.db_name}.kraken" }
publishDir = [
path: { "${params.outdir}/kraken2/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -361,7 +360,7 @@ process {
withName: BRACKEN_BRACKEN {
errorStrategy = 'ignore'
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}.bracken" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}.bracken" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.bracken" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.bracken" }
publishDir = [
path: { "${params.outdir}/bracken/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -390,7 +389,7 @@ process {
withName: KRAKENUNIQ_PRELOADEDKRAKENUNIQ {
ext.args = { "${meta.db_params}" }
// one run with multiple samples, so fix ID to just db name to ensure clean log name
ext.prefix = { "${meta.db_name}" }
ext.prefix = { "${meta.db_name}.krakenuniq" }
publishDir = [
path: { "${params.outdir}/krakenuniq/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -399,7 +398,7 @@ process {
}
withName: KRONA_CLEANUP {
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}" }
publishDir = [
path: { "${params.outdir}/krona/" },
mode: params.publish_dir_mode,
@ -408,7 +407,7 @@ process {
}
withName: KRONA_KTIMPORTTEXT {
ext.prefix = { "${meta.tool}-${meta.id}" }
ext.prefix = { "${meta.tool}_${meta.id}" }
publishDir = [
path: { "${params.outdir}/krona/" },
mode: params.publish_dir_mode,
@ -418,12 +417,12 @@ process {
withName: 'MEGAN_RMA2INFO_KRONA' {
ext.args = { "--read2class Taxonomy" }
ext.prefix = { "${meta.id}-${meta.db_name}" }
ext.prefix = { "${meta.id}_${meta.db_name}" }
}
withName: KRONA_KTIMPORTTAXONOMY {
ext.args = "-i"
ext.prefix = { "${meta.tool}-${meta.id}" }
ext.prefix = { "${meta.tool}_${meta.id}" }
publishDir = [
path: { "${params.outdir}/krona/" },
mode: params.publish_dir_mode,
@ -433,7 +432,7 @@ process {
withName: METAPHLAN3_METAPHLAN3 {
ext.args = { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.metaphlan3" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.metaphlan3" }
publishDir = [
path: { "${params.outdir}/metaphlan3/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -457,13 +456,13 @@ process {
pattern: '*.{txt,sam,gz}'
]
ext.args = { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}.centrifuge" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}.centrifuge" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.centrifuge" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.centrifuge" }
}
withName: CENTRIFUGE_KREPORT {
errorStrategy = {task.exitStatus == 255 ? 'ignore' : 'retry'}
ext.args = { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}.centrifuge" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}.centrifuge" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.centrifuge" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.centrifuge" }
publishDir = [
path: { "${params.outdir}/centrifuge/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -481,7 +480,7 @@ process {
}
withName: KAIJU_KAIJU {
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.kaiju" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.kaiju" }
publishDir = [
path: { "${params.outdir}/kaiju/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -505,7 +504,7 @@ process {
withName: DIAMOND_BLASTX {
ext.args = { "${meta.db_params}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}.diamond" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}.diamond" }
publishDir = [
path: { "${params.outdir}/diamond/${meta.db_name}/" },
mode: params.publish_dir_mode,
@ -521,7 +520,7 @@ process {
params.motus_save_mgc_read_counts ? "-M ${task.ext.prefix}.mgc" : ""
].join(',').replaceAll(','," ")
}
ext.prefix = params.perform_runmerging ? { "${meta.id}-${meta.db_name}" } : { "${meta.id}-${meta.run_accession}-${meta.db_name}" }
ext.prefix = params.perform_runmerging ? { "${meta.id}_${meta.db_name}" } : { "${meta.id}_${meta.run_accession}_${meta.db_name}" }
publishDir = [
path: { "${params.outdir}/motus/${meta.db_name}/" },
mode: params.publish_dir_mode
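
The renaming pattern in the hunks above (underscores instead of hyphens, the run accession dropped when run merging is on, and a tool suffix for some processes) can be summarised in a short Python sketch. This is an illustration of the naming scheme only, not pipeline code; `output_prefix` and the example `meta` values are hypothetical.

```python
def output_prefix(meta, perform_runmerging, suffix=""):
    """Build a prefix following the '<id>[_<run_accession>]_<db_name>[.<suffix>]' pattern."""
    parts = [meta["id"]]
    if not perform_runmerging:
        # Without run merging, each run keeps its own accession in the name.
        parts.append(meta["run_accession"])
    parts.append(meta["db_name"])
    prefix = "_".join(parts)
    return f"{prefix}.{suffix}" if suffix else prefix


# Hypothetical sample metadata for illustration.
meta = {"id": "sample1", "run_accession": "run1", "db_name": "db1"}

print(output_prefix(meta, perform_runmerging=False, suffix="kaiju"))
print(output_prefix(meta, perform_runmerging=True, suffix="kaiju"))
print(output_prefix(meta, perform_runmerging=False))  # e.g. mOTUs, which has no suffix
```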

@ -77,9 +77,9 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
### Full database sheet
nf-core/taxprofiler supports multiple databases being profiled in parallel for each tool.
nf-core/taxprofiler supports multiple databases being classified/profiled against in parallel for each tool.
Databases can be supplied either in the form of a compressed `.tar.gz` archive of a directory containing all relevant database files or the path to a directory on the filesystem.
The pipeline takes the locations and specific profiling parameters of the tool of these databases as input via a four column comma-separated sheet.
The pipeline takes the paths and specific classification/profiling parameters of the tool of these databases as input via a four column comma-separated sheet.
> ⚠️ nf-core/taxprofiler does not provide any databases by default, nor does it currently generate them for you. This must be performed manually by the user. See below for more information of the expected database files.
@ -99,14 +99,14 @@ motus,db_mOTU,,/<path>/<to>/motus/motus_database/
Column specifications are as follows:
| Column | Description |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
| `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
| `db_params` | Any parameters of the given taxonomic profiler that you wish to specify that the taxonomic profiling tool should use when profiling against this specific. Can be empty to use taxonomic profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. |
| `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |
| Column | Description |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tool` | Taxonomic profiling tool (supported by nf-core/taxprofiler) that the database has been indexed for [required]. Please note that `bracken` also implies running `kraken2` on the same database. |
| `db_name` | A unique name per tool for the particular database [required]. Please note that names need to be unique across both `kraken2` and `bracken` as well, even if re-using the same database. |
| `db_params` | Any parameters of the given taxonomic classifier/profiler that you wish to specify that the taxonomic classifier/profiling tool should use when profiling against this specific database. Can be empty to use taxonomic classifier/profiler defaults. Must not be surrounded by quotes [required]. We generally do not recommend specifying parameters here that turn on/off saving of output files or specifying particular file extensions - this should be already addressed via pipeline parameters. |
| `db_path` | Path to the database. Can either be a path to a directory containing the database index files or a `.tar.gz` file which contains the compressed database directory with the same name as the tar archive, minus `.tar.gz` [required]. |
> 💡 You can also specify the same database directory/file twice (ensuring unique `db_name`s) and specify different parameters for each database to compare the effect of different parameters during profiling.
> 💡 You can also specify the same database directory/file twice (ensuring unique `db_name`s) and specify different parameters for each database to compare the effect of different parameters during classification/profiling.
nf-core/taxprofiler will automatically decompress and extract any compressed archives for you.
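
The column rules above can be checked before launching a run. The following is a minimal sketch, assuming the sheet has a header row with the four column names from the table; `check_database_sheet` is a hypothetical helper and is not how the pipeline itself validates its input. The `kraken2` and `bracken` rows in the example are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ["tool", "db_name", "db_params", "db_path"]


def check_database_sheet(text):
    """Return the rows of a database sheet, raising ValueError on rule violations."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(f"expected columns {REQUIRED_COLUMNS}, got {reader.fieldnames}")
    seen = set()
    rows = []
    for row in reader:
        # db_params may be empty, the other three columns may not.
        for column in ("tool", "db_name", "db_path"):
            if not row[column]:
                raise ValueError(f"missing value for '{column}' in row {row}")
        # kraken2 and bracken share one namespace for database names.
        group = "kraken2/bracken" if row["tool"] in ("kraken2", "bracken") else row["tool"]
        key = (group, row["db_name"])
        if key in seen:
            raise ValueError(f"duplicate db_name '{row['db_name']}' for {group}")
        seen.add(key)
        rows.append(row)
    return rows


sheet = """tool,db_name,db_params,db_path
kraken2,db1,,/<path>/<to>/kraken2/db1/
bracken,db2,,/<path>/<to>/kraken2/db1/
motus,db_mOTU,,/<path>/<to>/motus/motus_database/
"""
print(len(check_database_sheet(sheet)), "databases look consistent")
```

Note that the example re-uses one directory for `kraken2` and `bracken` but gives each row its own `db_name`, as the column description requires.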
@ -134,6 +134,8 @@ nextflow run nf-core/taxprofiler --input samplesheet.csv --databases databases.c
This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
When running nf-core/taxprofiler, every step and tool is 'opt in'. To run a given classifier/profiler you must both supply a database in your `<database>.csv` and add the `--run_<profiler>` flag to your command. Omitting either will result in the classification/profiling tool not executing. If you wish to perform pre-processing (adapter clipping, run merging, etc.) or post-processing (visualisation) steps, these are also opt in with a `--perform_<step>` flag. In some cases, the pre- and post-processing steps may also require additional files. Please check the parameters tab of this documentation for more information.
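
The 'opt in' rule amounts to a logical AND between "a database is listed for the tool" and "the matching run flag is set". The sketch below illustrates this; `tools_that_will_run` is a hypothetical helper, and the flag names simply follow the `--run_<profiler>` pattern described above.

```python
def tools_that_will_run(database_tools, run_flags):
    """Return tools that have both a database entry and an enabled run flag."""
    return sorted(
        tool for tool in database_tools if run_flags.get(f"run_{tool}", False)
    )


# Tools that appear in the database sheet (illustrative values).
database_tools = {"kraken2", "kaiju"}
# Flags given on the command line, without the leading dashes (illustrative values).
run_flags = {"run_kraken2": True, "run_malt": True}

# kaiju has a database but no flag, malt has a flag but no database:
# only kraken2 satisfies both conditions.
print(tools_that_will_run(database_tools, run_flags))
```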
Note that the pipeline will create the following files in your working directory:
```bash
@ -165,9 +167,9 @@ It is highly recommended to run this on raw reads to remove artifacts from seque
There are currently two options for short-read preprocessing: [`fastp`](https://github.com/OpenGene/fastp) or [`adapterremoval`](https://github.com/MikkelSchubert/adapterremoval).
For adapter clipping, you can either rely on the tool's default adapter sequences, or supply your own adapters (`--shortread_qc_adapter1` and `--shortread_qc_adapter2`)
By default, paired-end merging is not activated. In this case paired-end 'alignment' against the reference databases is performed where supported, and if not, supported pairs will be independently profiled. If paired-end merging is activated you can also specify whether to include unmerged reads in the reads sent for profiling (`--shortread_qc_mergepairs` and `--shortread_qc_includeunmerged`).
By default, paired-end merging is not activated. In this case paired-end 'alignment' against the reference databases is performed where supported, and if not supported, pairs will be independently classified/profiled. If paired-end merging is activated you can also specify whether to include unmerged reads in the reads sent for classification/profiling (`--shortread_qc_mergepairs` and `--shortread_qc_includeunmerged`).
You can also turn off clipping and only perform paired-end merging, if requested. This can be useful when processing data downloaded from the ENA, SRA, or DDBJ (`--shortread_qc_skipadaptertrim`).
Both tools support length filtering of reads and can be tuned with `--shortread_qc_minlength`. Performing length filtering can be useful to remove short (often low sequencing complexity) sequences that result in unspecific classification and therefore slow down runtime during profiling, with minimal gain.
Both tools support length filtering of reads and can be tuned with `--shortread_qc_minlength`. Performing length filtering can be useful to remove short (often low sequencing complexity) sequences that result in unspecific classification and therefore slow down runtime during classification/profiling, with minimal gain.
There is currently one option for long-read Oxford Nanopore processing: [`porechop`](https://github.com/rrwick/Porechop).
@ -177,7 +179,7 @@ For both short-read and long-read preprocessing, you can optionally save the res
Complexity filtering can be activated via the `--perform_shortread_complexityfilter` flag.
Complexity filtering is primarily a run-time optimisation step. It is not necessary for accurate taxonomic profiling, however it can speed up run-time of each tool by removing reads with low-diversity of nucleotides (e.g. with mono-nucleotide - `AAAAAAAA`, or di-nucleotide repeats `GAGAGAGAGAGAGAG`) that have a low-chance of giving an informative taxonomic ID as they can be associated with many different taxa. Removing these reads therefore saves computational time and resources.
Complexity filtering is primarily a run-time optimisation step. It is not necessary for accurate taxonomic classification/profiling, however it can speed up run-time of each tool by removing reads with low-diversity of nucleotides (e.g. with mono-nucleotide - `AAAAAAAA`, or di-nucleotide repeats `GAGAGAGAGAGAGAG`) that have a low-chance of giving an informative taxonomic ID as they can be associated with many different taxa. Removing these reads therefore saves computational time and resources.
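
To make "low diversity of nucleotides" concrete, the sketch below computes a per-base Shannon entropy for a read. This is a simplified illustration only: the filtering tools use their own measures and windowing, and both `shannon_entropy` and the threshold in `keep_read` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Per-base Shannon entropy in bits (0 for a homopolymer, 2 for uniform ACGT)."""
    counts = Counter(seq.upper())
    total = len(seq)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def keep_read(seq, min_entropy=1.2):
    """Keep a read only if its entropy reaches a (hypothetical) threshold."""
    return shannon_entropy(seq) >= min_entropy


for read in ("AAAAAAAA", "GAGAGAGAGAGAGAG", "ACGTTGCAAGCTTAGC"):
    print(read, round(shannon_entropy(read), 3), keep_read(read))
```

The mono-nucleotide read scores zero and the di-nucleotide repeat stays below one bit, so both are discarded under this threshold, while a read using all four bases more evenly is kept.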
There are currently three options for short-read complexity filtering: [`bbduk`](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/), [`prinseq++`](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus), and [`fastp`](https://github.com/OpenGene/fastp#low-complexity-filter).
@ -191,11 +193,11 @@ You can optionally save the FASTQ output of the run merging with the `--save_com
#### Host Removal
Removal of possible-host reads from FASTQ files prior profiling can be activated with `--perform_shortread_hostremoval` or `--perform_longread_hostremoval`.
Removal of possible-host reads from FASTQ files prior to classification/profiling can be activated with `--perform_shortread_hostremoval` or `--perform_longread_hostremoval`.
Similarly to complexity filtering, host-removal can be useful for runtime optimisation and reduction in misclassified reads. It is not always necessary to report classification of reads from a host when you already know the host of the sample, therefore you can gain a run-time and computational advantage by removing these prior typically resource-heavy profiling with more efficient methods. Furthermore, particularly with human samples, you can reduce the number of false positives during profiling that occur due to host-sequence contamination in reference genomes on public databases.
Similarly to complexity filtering, host-removal can be useful for runtime optimisation and reduction in misclassified reads. It is not always necessary to report classification of reads from a host when you already know the host of the sample, therefore you can gain a run-time and computational advantage by removing these with more efficient methods prior to the typically resource-heavy classification/profiling. Furthermore, particularly with human samples, you can reduce the number of false positives during classification/profiling that occur due to host-sequence contamination in reference genomes on public databases.
nf-core/taxprofiler currently offers host-removal via alignment against a reference genome with Bowtie2, and the use of the unaligned reads for downstream profiling.
nf-core/taxprofiler currently offers host-removal via alignment against a reference genome with Bowtie2 for short reads and minimap2 for long reads, and the use of the unaligned reads for downstream classification/profiling.
You can supply your reference genome in FASTA format with `--hostremoval_reference`. You can also optionally supply a directory containing pre-indexed Bowtie2 index files with `--shortread_hostremoval_index` or a minimap2 `.mmi` file for `--longread_hostremoval_index`, however nf-core/taxprofiler will generate these for you if necessary. Pre-supplying the index directory or files can greatly speed up the process, and these can be re-used.
@ -207,28 +209,32 @@ For samples that may have been sequenced over multiple runs, or for FASTQ files
For more information how to set up your input samplesheet, see [Multiple runs of the same sample](#multiple-runs-of-the-same-sample).
Activating this functionality will concatenate the FASTQ files with the same sample name _after_ the optional preprocessing steps and _before_ profiling. Note that libraries with runs of different pairing types will **not** be merged and this will be indicated on output files with a `_se` or `_pe` suffix to the sample name accordingly.
Activating this functionality will concatenate the FASTQ files with the same sample name _after_ the optional preprocessing steps and _before_ classification/profiling. Note that libraries with runs of different pairing types will **not** be merged and this will be indicated on output files with a `_se` or `_pe` suffix to the sample name accordingly.
You can optionally save the FASTQ output of the run merging with the `--save_runmerged_reads`.
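
The grouping behaviour described for run merging can be sketched as follows. This is an illustration under stated assumptions, not pipeline code: the `sample`, `run_accession` and `fastq_2` keys are hypothetical stand-ins for samplesheet columns, and the sketch assumes the `_se`/`_pe` suffix is only added when a sample has runs of both pairing types.

```python
from collections import defaultdict


def group_runs_for_merging(runs):
    """Group run accessions per sample, keeping single- and paired-end runs apart."""
    groups = defaultdict(list)
    for run in runs:
        pairing = "pe" if run.get("fastq_2") else "se"
        groups[(run["sample"], pairing)].append(run["run_accession"])

    pairings_per_sample = defaultdict(set)
    for sample, pairing in groups:
        pairings_per_sample[sample].add(pairing)

    merged = {}
    for (sample, pairing), accessions in groups.items():
        # Assumption: only add the suffix when the sample has mixed pairing types.
        mixed = len(pairings_per_sample[sample]) > 1
        merged[f"{sample}_{pairing}" if mixed else sample] = accessions
    return merged


runs = [
    {"sample": "A", "run_accession": "run1", "fastq_2": "A_run1_R2.fastq.gz"},
    {"sample": "A", "run_accession": "run2", "fastq_2": "A_run2_R2.fastq.gz"},
    {"sample": "B", "run_accession": "run1", "fastq_2": "B_run1_R2.fastq.gz"},
    {"sample": "B", "run_accession": "run2", "fastq_2": ""},
]
print(group_runs_for_merging(runs))
```

Here both runs of sample `A` are merged, while sample `B` yields two separate outputs because one run is paired-end and the other single-end.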
#### Profiling
#### Classification and Profiling
The following sections provide tips and suggestions for running the different taxonomic classification and profiling tools _within the pipeline_. For advice and/or guidance whether you should run a particular tool on your specific data, please see the documentation of each tool!
An important distinction between the different tools included in the pipeline is classification versus profiling. Taxonomic _classification_ is concerned with simply detecting the presence of species in a given sample. Taxonomic _profiling_ involves additionally estimating the _abundance_ of each species.
Note that not all taxonomic classification tools (e.g. Kraken, MALT, Kaiju) perform _profiling_, but all taxonomic profilers (e.g. MetaPhlAn, mOTUs, Bracken) must perform some form of _classification_ prior to profiling.
For advice as to which tool to run in your context, please see the documentation of each tool.
> 🖊️ If you would like to change this behaviour, please contact us on the [nf-core slack](https://nf-co.re/join) and we can discuss this.
Not all tools currently have dedicated tips, suggestions and/or recommendations, however we welcome further contributions for existing and additional tools via pull requests to the [nf-core/taxprofiler repository](https://github.com/nf-core/taxprofiler)!
##### Bracken
You must make sure to also activate Kraken2 to run Bracken in the pipeline.
It is unclear whether Bracken is suitable for running long reads, as it makes certain assumptions about read lengths. Furthermore, during testing we found issues where Bracken would fail on the long-read test data.
Therefore currently nf-core/taxprofiler does not run Bracken on data specified as being sequenced with `OXFORD_NANOPORE` in the input samplesheet.
> 🖊️ If you would like to change this behaviour, please contact us on the [nf-core slack](https://nf-co.re/join) and we can discuss this.
##### Bracken
You must make sure to also activate Kraken2 to run Bracken in the pipeline.
##### Centrifuge
Centrifuge currently does not accept FASTA files as input, therefore no output will be produced for these input files.
@ -503,7 +509,7 @@ The following tutorials assumes you already have the tool available (e.g. instal
#### Bracken custom database
Bracken does not require an independent database construction, but rather builds upon Kraken2 databases. See [Kraken2](#kraken2-custom-database) for more information on how to build these.
Bracken does not require an independent database, nor does it provide any default databases for classification/profiling, but rather builds upon Kraken2 databases. See [Kraken2](#kraken2-custom-database) for more information on how to build these.
In addition to a Kraken2 database, you also need to have the (average) read lengths (in bp) of your sequencing experiment, the K-mer size used to build the Kraken2 database, and Kraken2 available on your machine.

@ -7,47 +7,47 @@
"nf-core": {
"adapterremoval": {
"branch": "master",
"git_sha": "ce7cf27e377fdacf7ebe8e75903ec70405ea1659",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"bbmap/bbduk": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"bowtie2/align": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"bowtie2/build": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"bracken/bracken": {
"branch": "master",
"git_sha": "8cab56516076b23c6f8eb1ac20ba4ce9692c85e1",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"bracken/combinebrackenoutputs": {
"branch": "master",
"git_sha": "9c87d5fdad182590a370ea43a4ecebd200a6f6fb",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"cat/fastq": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"centrifuge/centrifuge": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"centrifuge/kreport": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"custom/dumpsoftwareversions": {
@ -57,18 +57,18 @@
},
"diamond/blastx": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"falco": {
"branch": "master",
"git_sha": "fc959214036403ad83efe7a41d43d0606c445cda",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"],
"patch": "modules/nf-core/falco/falco.diff"
},
"fastp": {
"branch": "master",
"git_sha": "1e49f31e93c56a3832833eef90a02d3cde5a3f7e",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"fastqc": {
@ -78,42 +78,42 @@
},
"filtlong": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"gunzip": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"kaiju/kaiju": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"kaiju/kaiju2krona": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"kaiju/kaiju2table": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"kraken2/kraken2": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"krakentools/combinekreports": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"krakentools/kreport2krona": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"krakenuniq/preloadedkrakenuniq": {
@ -123,92 +123,92 @@
},
"krona/ktimporttaxonomy": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"krona/ktimporttext": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"malt/run": {
"branch": "master",
"git_sha": "6d9712f03ec2de8264a50ee4541a617e1e063b51",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"megan/rma2info": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"metaphlan3/mergemetaphlantables": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"metaphlan3/metaphlan3": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"minimap2/align": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"minimap2/index": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"motus/merge": {
"branch": "master",
"git_sha": "3fce766123e71e82fb384db7d07b59180baa9ee9",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"motus/profile": {
"branch": "master",
"git_sha": "3fce766123e71e82fb384db7d07b59180baa9ee9",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"multiqc": {
"branch": "master",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"git_sha": "ee80d14721e76e2e079103b8dcd5d57129e584ba",
"installed_by": ["modules"]
},
"porechop/porechop": {
"branch": "master",
"git_sha": "2a4e85eb81875a572bb58133e37f84ba3cc484d7",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"prinseqplusplus": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"samtools/bam2fq": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"samtools/index": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"samtools/stats": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"samtools/view": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
},
"untar": {
"branch": "master",
"git_sha": "5e34754d42cd2d5d248ca8673c0a53cdf5624905",
"git_sha": "c8e35eb2055c099720a75538d1b8adb3fb5a464c",
"installed_by": ["modules"]
}
}

View file

@ -2,7 +2,7 @@ process ADAPTERREMOVAL {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? "bioconda::adapterremoval=2.3.2" : null)
conda "bioconda::adapterremoval=2.3.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/adapterremoval:2.3.2--hb7ba0dd_0' :
'quay.io/biocontainers/adapterremoval:2.3.2--hb7ba0dd_0' }"

View file

@ -2,10 +2,10 @@ process BBMAP_BBDUK {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? "bioconda::bbmap=38.90" : null)
conda "bioconda::bbmap=39.01"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bbmap:38.90--he522d1c_1' :
'quay.io/biocontainers/bbmap:38.90--he522d1c_1' }"
'https://depot.galaxyproject.org/singularity/bbmap:39.01--h5c4e2a8_0':
'quay.io/biocontainers/bbmap:39.01--h5c4e2a8_0' }"
input:
tuple val(meta), path(reads)
@ -37,7 +37,7 @@ process BBMAP_BBDUK {
&> ${prefix}.bbduk.log
cat <<-END_VERSIONS > versions.yml
"${task.process}":
bbmap: \$(bbversion.sh)
bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset")
END_VERSIONS
"""
}
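
The `grep -v "Duplicate cpuset"` added to the version command above appears intended to keep a stray warning line out of `versions.yml`. A minimal sketch of that filtering follows; the exact wording of the warning line is an assumption made up for the example, and only the `Duplicate cpuset` substring comes from the diff.

```shell
# Drop any line containing the known warning text, keeping the version line.
filter_version() {
    grep -v "Duplicate cpuset"
}

# Simulated tool output: a hypothetical warning line followed by the version.
printf 'Duplicate cpuset menu items found\n39.01\n' | filter_version
```

Note that `grep -v` exits with a non-zero status if every input line is filtered out, which is worth keeping in mind if the command runs in a strict shell.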

View file

@ -2,14 +2,14 @@ process BOWTIE2_ALIGN {
tag "$meta.id"
label "process_high"
conda (params.enable_conda ? "bioconda::bowtie2=2.4.4 bioconda::samtools=1.15.1 conda-forge::pigz=2.6" : null)
container "${ workflow.containerEngine == "singularity" && !task.ext.singularity_pull_docker_container ?
"https://depot.galaxyproject.org/singularity/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:1744f68fe955578c63054b55309e05b41c37a80d-0" :
"quay.io/biocontainers/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:1744f68fe955578c63054b55309e05b41c37a80d-0" }"
conda "bioconda::bowtie2=2.4.4 bioconda::samtools=1.16.1 conda-forge::pigz=2.6"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:a0ffedb52808e102887f6ce600d092675bf3528a-0' :
'quay.io/biocontainers/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:a0ffedb52808e102887f6ce600d092675bf3528a-0' }"
input:
tuple val(meta), path(reads)
path index
tuple val(meta) , path(reads)
tuple val(meta2), path(index)
val save_unaligned
val sort_bam
@ -40,8 +40,8 @@ process BOWTIE2_ALIGN {
def samtools_command = sort_bam ? 'sort' : 'view'
"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/.rev.1.bt2//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/.rev.1.bt2l//"`
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1
bowtie2 \\
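
The index lookup in this hunk can be read as: try the small-index suffix, fall back to the large-index suffix, and stop with an explicit error if neither is found. The standalone sketch below mirrors that logic outside of Nextflow (so the `\\.` and `\$` escapes become plain `\.` and `$`); the function name `find_bt2_index` is hypothetical and not part of the module.

```shell
# Locate a Bowtie2 index prefix under a directory, or fail with a clear message.
find_bt2_index() {
    dir="$1"
    # Small indices end in .bt2; strip only the trailing suffix.
    index=$(find -L "$dir" -name "*.rev.1.bt2" | sed 's/\.rev\.1\.bt2$//')
    # Large indices end in .bt2l; only look for them if nothing was found.
    if [ -z "$index" ]; then
        index=$(find -L "$dir" -name "*.rev.1.bt2l" | sed 's/\.rev\.1\.bt2l$//')
    fi
    # Without this check, an empty prefix would be passed on to bowtie2.
    if [ -z "$index" ]; then
        echo "Bowtie2 index files not found" 1>&2
        return 1
    fi
    echo "$index"
}
```

Failing early here gives a readable message instead of a less obvious error from the aligner itself.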

View file

@ -27,6 +27,11 @@ input:
description: |
List of input FastQ files of size 1 and 2 for single-end and paired-end data,
respectively.
- meta2:
type: map
description: |
Groovy Map containing reference information
e.g. [ id:'test', single_end:false ]
- index:
type: file
description: Bowtie2 genome index files

View file

@ -2,17 +2,17 @@ process BOWTIE2_BUILD {
tag "$fasta"
label 'process_high'
conda (params.enable_conda ? 'bioconda::bowtie2=2.4.4' : null)
conda "bioconda::bowtie2=2.4.4"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bowtie2:2.4.4--py39hbb4e92a_0' :
'quay.io/biocontainers/bowtie2:2.4.4--py39hbb4e92a_0' }"
input:
path fasta
tuple val(meta), path(fasta)
output:
path 'bowtie2' , emit: index
path "versions.yml" , emit: versions
tuple val(meta), path('bowtie2') , emit: index
path "versions.yml" , emit: versions
when:
task.ext.when == null || task.ext.when

View file

@ -16,10 +16,20 @@ tools:
doi: 10.1038/nmeth.1923
licence: ["GPL-3.0-or-later"]
input:
- meta:
type: map
description: |
Groovy Map containing reference information
e.g. [ id:'test', single_end:false ]
- fasta:
type: file
description: Input genome fasta file
output:
- meta:
type: map
description: |
Groovy Map containing reference information
e.g. [ id:'test', single_end:false ]
- index:
type: file
description: Bowtie2 genome index files

View file

@ -4,7 +4,7 @@ process BRACKEN_BRACKEN {
// WARN: Version information not provided by tool on CLI.
// Please update version string below when bumping container versions.
conda (params.enable_conda ? "bioconda::bracken=2.7" : null)
conda "bioconda::bracken=2.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bracken:2.7--py39hc16433a_0':
'quay.io/biocontainers/bracken:2.7--py39hc16433a_0' }"

View file

@ -2,7 +2,7 @@ process BRACKEN_COMBINEBRACKENOUTPUTS {
tag "$meta.id"
label 'process_low'
conda (params.enable_conda ? "bioconda::bracken=2.7" : null)
conda "bioconda::bracken=2.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/bracken:2.7--py39hc16433a_0':
'quay.io/biocontainers/bracken:2.7--py39hc16433a_0' }"

View file

@ -2,7 +2,7 @@ process CAT_FASTQ {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "conda-forge::sed=4.7" : null)
conda "conda-forge::sed=4.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
'ubuntu:20.04' }"

View file

@ -2,7 +2,7 @@ process CENTRIFUGE_CENTRIFUGE {
tag "$meta.id"
label 'process_high'
conda (params.enable_conda ? "bioconda::centrifuge=1.0.4_beta" : null)
conda "bioconda::centrifuge=1.0.4_beta"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/centrifuge:1.0.4_beta--h9a82719_6' :
'quay.io/biocontainers/centrifuge:1.0.4_beta--h9a82719_6' }"
@ -41,7 +41,7 @@ process CENTRIFUGE_CENTRIFUGE {
def sam_output = sam_format ? "--out-fmt 'sam'" : ''
"""
## we add "-no-name ._" to ensure silly Mac OSX metafiles files aren't included
db_name=`find -L ${db} -name "*.1.cf" -not -name "._*" | sed 's/.1.cf//'`
db_name=`find -L ${db} -name "*.1.cf" -not -name "._*" | sed 's/\\.1.cf\$//'`
centrifuge \\
-x \$db_name \\
-p $task.cpus \\

View file

@ -2,7 +2,7 @@ process CENTRIFUGE_KREPORT {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::centrifuge=1.0.4_beta" : null)
conda "bioconda::centrifuge=1.0.4_beta"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/centrifuge:1.0.4_beta--h9a82719_6':
'quay.io/biocontainers/centrifuge:1.0.4_beta--h9a82719_6' }"
@ -22,7 +22,7 @@ process CENTRIFUGE_KREPORT {
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
db_name=`find -L ${db} -name "*.1.cf" -not -name "._*" | sed 's/.1.cf//'`
db_name=`find -L ${db} -name "*.1.cf" -not -name "._*" | sed 's/\\.1.cf\$//'`
centrifuge-kreport -x \$db_name ${report} > ${prefix}.txt
cat <<-END_VERSIONS > versions.yml

View file

@ -2,7 +2,7 @@ process DIAMOND_BLASTX {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? "bioconda::diamond=2.0.15" : null)
conda "bioconda::diamond=2.0.15"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/diamond:2.0.15--hb97b32f_0' :
'quay.io/biocontainers/diamond:2.0.15--hb97b32f_0' }"
@ -46,7 +46,7 @@ process DIAMOND_BLASTX {
break
}
"""
DB=`find -L ./ -name "*.dmnd" | sed 's/.dmnd//'`
DB=`find -L ./ -name "*.dmnd" | sed 's/\\.dmnd\$//'`
diamond \\
blastx \\
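
The `sed` change in this hunk (and the matching ones in the Bowtie2 and Centrifuge modules) escapes the dot and anchors the pattern to the end of the line. The sketch below shows why that matters, using a made-up path in which an earlier directory name happens to match the unescaped pattern; the path and function names are illustrative only.

```shell
# Old pattern: "." matches any character and the match is not anchored.
strip_unanchored() {
    printf '%s\n' "$1" | sed 's/.dmnd//'
}

# New pattern: literal dot, and only at the end of the path.
strip_anchored() {
    printf '%s\n' "$1" | sed 's/\.dmnd$//'
}

path='./db/admnd_ref/nr.dmnd'
strip_unanchored "$path"   # prints ./db/_ref/nr.dmnd (wrong part removed)
strip_anchored "$path"     # prints ./db/admnd_ref/nr
```

With the old pattern, the first match wins, so any earlier path component resembling the extension corrupts the database prefix.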

View file

@ -3,7 +3,7 @@ process FALCO {
label 'process_single'
conda (params.enable_conda ? "bioconda::falco=1.2.1" : null)
conda "bioconda::falco=1.2.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/falco:1.2.1--h867801b_3':
'quay.io/biocontainers/falco:1.2.1--h867801b_3' }"

View file

@ -2,7 +2,7 @@ process FASTP {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? 'bioconda::fastp=0.23.2' : null)
conda "bioconda::fastp=0.23.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/fastp:0.23.2--h79da9fb_0' :
'quay.io/biocontainers/fastp:0.23.2--h79da9fb_0' }"

View file

@ -2,7 +2,7 @@ process FILTLONG {
tag "$meta.id"
label 'process_low'
conda (params.enable_conda ? "bioconda::filtlong=0.2.1" : null)
conda "bioconda::filtlong=0.2.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/filtlong:0.2.1--h9a82719_0' :
'quay.io/biocontainers/filtlong:0.2.1--h9a82719_0' }"

View file

@ -2,7 +2,7 @@ process GUNZIP {
tag "$archive"
label 'process_single'
conda (params.enable_conda ? "conda-forge::sed=4.7" : null)
conda "conda-forge::sed=4.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
'ubuntu:20.04' }"

View file

@ -2,7 +2,7 @@ process KAIJU_KAIJU {
tag "$meta.id"
label 'process_high'
conda (params.enable_conda ? "bioconda::kaiju=1.8.2" : null)
conda "bioconda::kaiju=1.8.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/kaiju:1.8.2--h5b5514e_1':
'quay.io/biocontainers/kaiju:1.8.2--h5b5514e_1' }"

View file

@ -2,7 +2,7 @@ process KAIJU_KAIJU2KRONA {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::kaiju=1.8.2" : null)
conda "bioconda::kaiju=1.8.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/kaiju:1.8.2--h5b5514e_1':
'quay.io/biocontainers/kaiju:1.8.2--h5b5514e_1' }"

View file

@ -2,7 +2,7 @@ process KAIJU_KAIJU2TABLE {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::kaiju=1.8.2" : null)
conda "bioconda::kaiju=1.8.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/kaiju:1.8.2--h5b5514e_1':
'quay.io/biocontainers/kaiju:1.8.2--h2e03b76_0' }"

View file

@ -2,7 +2,7 @@ process KRAKEN2_KRAKEN2 {
tag "$meta.id"
label 'process_high'
conda (params.enable_conda ? 'bioconda::kraken2=2.1.2 conda-forge::pigz=2.6' : null)
conda "bioconda::kraken2=2.1.2 conda-forge::pigz=2.6"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' :
'quay.io/biocontainers/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' }"

View file

@ -1,7 +1,7 @@
process KRAKENTOOLS_COMBINEKREPORTS {
label 'process_single'
conda (params.enable_conda ? "bioconda::krakentools=1.2" : null)
conda "bioconda::krakentools=1.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/krakentools:1.2--pyh5e36f6f_0':
'quay.io/biocontainers/krakentools:1.2--pyh5e36f6f_0' }"

View file

@ -3,7 +3,7 @@ process KRAKENTOOLS_KREPORT2KRONA {
label 'process_single'
// WARN: Version information not provided by tool on CLI. Please update version string below when bumping container versions.
conda (params.enable_conda ? "bioconda::krakentools=1.2" : null)
conda "bioconda::krakentools=1.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/krakentools:1.2--pyh5e36f6f_0':
'quay.io/biocontainers/krakentools:1.2--pyh5e36f6f_0' }"

View file

@ -3,7 +3,7 @@ process KRONA_KTIMPORTTAXONOMY {
label 'process_single'
// WARN: Version information not provided by tool on CLI. Please update version string below when bumping container versions.
conda (params.enable_conda ? "bioconda::krona=2.8" : null)
conda "bioconda::krona=2.8"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/krona:2.8--pl5262hdfd78af_2' :
'quay.io/biocontainers/krona:2.8--pl5262hdfd78af_2' }"

View file

@ -2,7 +2,7 @@ process KRONA_KTIMPORTTEXT {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::krona=2.8.1" : null)
conda "bioconda::krona=2.8.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/krona:2.8.1--pl5321hdfd78af_1':
'quay.io/biocontainers/krona:2.8.1--pl5321hdfd78af_1' }"

View file

@ -2,7 +2,7 @@ process MALT_RUN {
tag "$meta.id"
label 'process_high'
conda (params.enable_conda ? "bioconda::malt=0.61" : null)
conda "bioconda::malt=0.61"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/malt:0.61--hdfd78af_0' :
'quay.io/biocontainers/malt:0.61--hdfd78af_0' }"

View file

@ -2,7 +2,7 @@ process MEGAN_RMA2INFO {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::megan=6.21.7" : null)
conda "bioconda::megan=6.21.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/megan:6.21.7--h9ee0642_0':
'quay.io/biocontainers/megan:6.21.7--h9ee0642_0' }"

View file

@ -1,7 +1,7 @@
process METAPHLAN3_MERGEMETAPHLANTABLES {
label 'process_single'
conda (params.enable_conda ? 'bioconda::metaphlan=3.0.12' : null)
conda "bioconda::metaphlan=3.0.12"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/metaphlan:3.0.12--pyhb7b1952_0' :
'quay.io/biocontainers/metaphlan:3.0.12--pyhb7b1952_0' }"

View file

@ -2,7 +2,7 @@ process METAPHLAN3_METAPHLAN3 {
tag "$meta.id"
label 'process_high'
conda (params.enable_conda ? 'bioconda::metaphlan=3.0.12' : null)
conda "bioconda::metaphlan=3.0.12"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/metaphlan:3.0.12--pyhb7b1952_0' :
'quay.io/biocontainers/metaphlan:3.0.12--pyhb7b1952_0' }"

View file

@ -2,7 +2,8 @@ process MINIMAP2_ALIGN {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? 'bioconda::minimap2=2.21 bioconda::samtools=1.12' : null)
// Note: the versions here need to match the versions used in the mulled container below and minimap2/index
conda "bioconda::minimap2=2.24 bioconda::samtools=1.14"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:1679e915ddb9d6b4abda91880c4b48857d471bd8-0' :
'quay.io/biocontainers/mulled-v2-66534bcbb7031a148b13e2ad42583020b9cd25c4:1679e915ddb9d6b4abda91880c4b48857d471bd8-0' }"
@ -25,7 +26,6 @@ process MINIMAP2_ALIGN {
script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def input_reads = meta.single_end ? "$reads" : "${reads[0]} ${reads[1]}"
def bam_output = bam_format ? "-a | samtools sort | samtools view -@ ${task.cpus} -b -h -o ${prefix}.bam" : "-o ${prefix}.paf"
def cigar_paf = cigar_paf_format && !bam_format ? "-c" : ''
def set_cigar_bam = cigar_bam && bam_format ? "-L" : ''
@ -33,8 +33,8 @@ process MINIMAP2_ALIGN {
minimap2 \\
$args \\
-t $task.cpus \\
$reference \\
$input_reads \\
"${reference ?: reads}" \\
"$reads" \\
$cigar_paf \\
$set_cigar_bam \\
$bam_output

View file

@ -1,10 +1,11 @@
process MINIMAP2_INDEX {
label 'process_medium'
conda (params.enable_conda ? 'bioconda::minimap2=2.21' : null)
// Note: the versions here need to match the versions used in minimap2/align
conda "bioconda::minimap2=2.24"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/minimap2:2.21--h5bf99c6_0' :
'quay.io/biocontainers/minimap2:2.21--h5bf99c6_0' }"
'https://depot.galaxyproject.org/singularity/minimap2:2.24--h7132678_1' :
'quay.io/biocontainers/minimap2:2.24--h7132678_1' }"
input:
tuple val(meta), path(fasta)

View file

@ -27,7 +27,7 @@ output:
description: |
Groovy Map containing sample information
e.g. [ id:'test', single_end:false ]
- mmi:
- index:
type: file
description: Minimap2 fasta index.
pattern: "*.mmi"

View file

@ -2,7 +2,7 @@ process MOTUS_MERGE {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::motus=3.0.3" : null)
conda "bioconda::motus=3.0.3"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/motus:3.0.3--pyhdfd78af_0':
'quay.io/biocontainers/motus:3.0.3--pyhdfd78af_0' }"

View file

@ -2,7 +2,7 @@ process MOTUS_PROFILE {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? "bioconda::motus=3.0.3" : null)
conda "bioconda::motus=3.0.3"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/motus:3.0.3--pyhdfd78af_0':
'quay.io/biocontainers/motus:3.0.3--pyhdfd78af_0' }"

View file

@ -1,10 +1,10 @@
process MULTIQC {
label 'process_single'
conda "bioconda::multiqc=1.13"
conda "bioconda::multiqc=1.14"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/multiqc:1.13--pyhdfd78af_0' :
'quay.io/biocontainers/multiqc:1.13--pyhdfd78af_0' }"
'https://depot.galaxyproject.org/singularity/multiqc:1.14--pyhdfd78af_0' :
'quay.io/biocontainers/multiqc:1.14--pyhdfd78af_0' }"
input:
path multiqc_files, stageAs: "?/*"

View file

@ -2,7 +2,7 @@ process PORECHOP_PORECHOP {
tag "$meta.id"
label 'process_medium'
conda (params.enable_conda ? "bioconda::porechop=0.2.4" : null)
conda "bioconda::porechop=0.2.4"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/porechop:0.2.4--py39h7cff6ad_2' :
'quay.io/biocontainers/porechop:0.2.4--py39h7cff6ad_2' }"

View file

@ -2,7 +2,7 @@ process PRINSEQPLUSPLUS {
tag "$meta.id"
label 'process_low'
conda (params.enable_conda ? "bioconda::prinseq-plus-plus=1.2.3" : null)
conda "bioconda::prinseq-plus-plus=1.2.3"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/prinseq-plus-plus:1.2.3--hc90279e_1':
'quay.io/biocontainers/prinseq-plus-plus:1.2.3--hc90279e_1' }"

View file

@ -2,10 +2,10 @@ process SAMTOOLS_BAM2FQ {
tag "$meta.id"
label 'process_low'
conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null)
conda "bioconda::samtools=1.16.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' :
'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }"
'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' :
'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }"
input:
tuple val(meta), path(inputbam)

View file

@ -2,10 +2,10 @@ process SAMTOOLS_INDEX {
tag "$meta.id"
label 'process_low'
conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null)
conda "bioconda::samtools=1.16.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' :
'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }"
'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' :
'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }"
input:
tuple val(meta), path(input)

View file

@ -2,10 +2,10 @@ process SAMTOOLS_STATS {
tag "$meta.id"
label 'process_single'
conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null)
conda "bioconda::samtools=1.16.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' :
'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }"
'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' :
'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }"
input:
tuple val(meta), path(input), path(input_index)

View file

@ -2,10 +2,10 @@ process SAMTOOLS_VIEW {
tag "$meta.id"
label 'process_low'
conda (params.enable_conda ? "bioconda::samtools=1.15.1" : null)
conda "bioconda::samtools=1.16.1"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/samtools:1.15.1--h1170115_0' :
'quay.io/biocontainers/samtools:1.15.1--h1170115_0' }"
'https://depot.galaxyproject.org/singularity/samtools:1.16.1--h6899075_1' :
'quay.io/biocontainers/samtools:1.16.1--h6899075_1' }"
input:
tuple val(meta), path(input), path(index)
@ -26,6 +26,7 @@ process SAMTOOLS_VIEW {
script:
def args = task.ext.args ?: ''
def args2 = task.ext.args2 ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def reference = fasta ? "--reference ${fasta}" : ""
def readnames = qname ? "--qname-file ${qname}": ""
@ -42,7 +43,8 @@ process SAMTOOLS_VIEW {
${readnames} \\
$args \\
-o ${prefix}.${file_type} \\
$input
$input \\
$args2
cat <<-END_VERSIONS > versions.yml
"${task.process}":

View file

@ -2,7 +2,7 @@ process UNTAR {
tag "$archive"
label 'process_single'
conda (params.enable_conda ? "conda-forge::sed=4.7" : null)
conda "conda-forge::sed=4.7"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
'ubuntu:20.04' }"

View file

@ -19,7 +19,7 @@ workflow SHORTREAD_HOSTREMOVAL {
ch_multiqc_files = Channel.empty()
if ( !params.shortread_hostremoval_index ) {
ch_bowtie2_index = BOWTIE2_BUILD ( reference ).index
ch_bowtie2_index = BOWTIE2_BUILD ( [ [], reference ] ).index
ch_versions = ch_versions.mix( BOWTIE2_BUILD.out.versions )
} else {
ch_bowtie2_index = index.first()