## Introduction
**nf-core/taxprofiler** is a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun and long-read metagenomic data. It allows in-parallel taxonomic identification of reads or taxonomic abundance estimation with multiple classification and profiling tools against multiple databases, and produces standardised output tables.
## Pipeline summary
1. Read QC ([`FastQC`]( or [`falco`]( as an alternative option)
2. Optional read pre-processing:
   - Adapter clipping and merging (short-read: [fastp](, [AdapterRemoval2](; long-read: [Porechop](
   - Low-complexity and quality filtering (short-read: [BBDuk](, [PRINSEQ++](; long-read: [Filtlong](
   - Host-read removal (short-read: [Bowtie2](; long-read: [Minimap2](
   - Run merging
3. Host-read removal statistics ([Samtools](
4. Taxonomic classification and/or profiling using one or more of:
   - [Kraken2](
   - [MetaPhlAn3](
   - [MALT](
   - [Centrifuge](
   - [Kaiju](
   - [mOTUs](
   - [KrakenUniq](
5. Optional post-processing with:
   - [Bracken](
6. Standardised output tables ([`Taxpasta`](
7. Presentation of QC for raw reads ([`MultiQC`](
8. Plotting of Kraken2, Centrifuge, Kaiju and MALT results ([`Krona`](
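
Each classifier or profiler in the summary is switched on with its own parameter. The usage example uses `--run_kraken2` and `--run_metaphlan3`; the sketch below assumes that the other tools, Bracken and Krona follow the same `--run_<tool>` naming (check the parameter documentation before relying on this). `taxprofiler_run_flags` is a hypothetical helper, not part of the pipeline: it turns tool names into flags and prints the resulting command line instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of nf-core/taxprofiler): map tool names to
# --run_<tool> flags. Only --run_kraken2 and --run_metaphlan3 appear in the
# usage example; the other flag names are assumed to follow the same pattern.
taxprofiler_run_flags() {
  local flags=()
  local tool
  for tool in "$@"; do
    case "$tool" in
      kraken2 | metaphlan3 | malt | centrifuge | kaiju | motus | krakenuniq | bracken | krona)
        flags+=("--run_${tool}")
        ;;
      *)
        # Fail loudly instead of emitting a misspelt flag
        echo "unknown tool: ${tool}" >&2
        return 1
        ;;
    esac
  done
  echo "${flags[*]}"
}

# Print (rather than execute) the command line for a Kraken2 + Bracken + Krona run
flags=$(taxprofiler_run_flags kraken2 bracken krona) || exit 1
echo "nextflow run nf-core/taxprofiler -profile docker" \
  "--input samplesheet.csv --databases databases.csv --outdir results ${flags}"
```

Keeping the tool list in one `case` statement means an unknown name stops the script before a command line with an unexpected flag is produced.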
## Usage
> **Note**
> If you are new to Nextflow and nf-core, please refer to [this page]( on how
> to set up Nextflow. Make sure to [test your setup](
> with `-profile test` before running the workflow on actual data.

First, prepare a samplesheet with your input data. Each row represents a FASTQ file (single-end), a pair of FASTQ files (paired-end), or a FASTA file (long reads).
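
As an illustration only, the sketch below writes a samplesheet with one single-end row, one paired-end row and one long-read FASTA row, then checks it with a hypothetical `check_samplesheet` function. The column names (`sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta`) and the platform values are assumptions based on the pipeline's 1.x releases, and all file names are placeholders; confirm the exact format in the usage documentation. The check is a quick local sanity test, not the pipeline's own validation.

```bash
#!/usr/bin/env bash
# Assumed samplesheet layout (1.x releases); file names are placeholders.
cat > samplesheet.csv <<'EOF'
sample,run_accession,instrument_platform,fastq_1,fastq_2,fasta
sampleA,run1,ILLUMINA,sampleA_run1_R1.fastq.gz,,
sampleA,run2,ILLUMINA,sampleA_run2_R1.fastq.gz,sampleA_run2_R2.fastq.gz,
sampleB,run1,OXFORD_NANOPORE,,,sampleB_run1.fasta.gz
EOF

# Hypothetical sanity check: every row must have as many fields as the header
# and must name at least one of fastq_1 (column 4) or fasta (column 6).
check_samplesheet() {
  awk -F',' '
    NR == 1 { n = NF; next }
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
    $4 == "" && $6 == "" { printf "line %d: neither fastq_1 nor fasta set\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}

check_samplesheet samplesheet.csv && echo "samplesheet.csv looks consistent"
```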

Additionally, you will need a database sheet listing the directories or `.tar.gz` archives that contain the databases for the tools you wish to run the pipeline against.
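
A minimal sketch of such a database sheet, assuming the `tool,db_name,db_params,db_path` columns of the 1.x releases (`db_params` presumably holds extra tool arguments for that database and is left empty here). The database names and paths are hypothetical. The loop only labels each entry by the form of its path, `.tar.gz` archive or directory; it does not check that the paths exist.

```bash
#!/usr/bin/env bash
# Assumed database sheet layout (1.x releases); names and paths are placeholders.
cat > databases.csv <<'EOF'
tool,db_name,db_params,db_path
kraken2,k2_db,,/path/to/kraken2_db.tar.gz
metaphlan3,mpa_db,,/path/to/metaphlan3_db/
EOF

# Skip the header and label each entry as an archive or a directory by its path suffix
tail -n +2 databases.csv | while IFS=',' read -r tool db_name db_params db_path; do
  case "$db_path" in
    *.tar.gz) kind="archive" ;;
    *) kind="directory" ;;
  esac
  echo "${tool}/${db_name}: ${kind}"
done
```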
Now, you can run the pipeline using:
```bash
nextflow run nf-core/taxprofiler \
  -profile <docker/singularity/.../institute> \
  --input samplesheet.csv \
  --databases databases.csv \
  --outdir <OUTDIR> \
  --run_kraken2 --run_metaphlan3
```
> **Warning:**
> Please provide pipeline parameters via the CLI (as above) or the Nextflow `-params-file` option. Custom config files, including those
> provided by the `-c` Nextflow option, can be used to provide any configuration _**except for parameters**_;
> see [docs](
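
Following the warning above, the parameters from the command-line example can instead be collected in a file passed with `-params-file`, which Nextflow accepts in YAML or JSON format. A minimal sketch, assuming a YAML file named `params.yaml` (the name is arbitrary, and `results` stands in for `<OUTDIR>`) whose keys are the parameter names without the leading `--`:

```bash
#!/usr/bin/env bash
# Write the parameters from the CLI example into a YAML params file.
# "results" is a placeholder output directory.
cat > params.yaml <<'EOF'
input: samplesheet.csv
databases: databases.csv
outdir: results
run_kraken2: true
run_metaphlan3: true
EOF

# Then run, keeping -profile on the command line (it is a Nextflow option,
# not a pipeline parameter):
# nextflow run nf-core/taxprofiler -profile docker -params-file params.yaml
```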

For more details, please refer to the [usage documentation]( and the [parameter documentation](
## Pipeline output
To see the results of a test run with a full-size dataset, refer to the [results]( tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the
[output documentation](
## Credits
nf-core/taxprofiler was originally written by James A. Fellows Yates, Sofia Stamouli, Moritz E. Beber, and the nf-core/taxprofiler team.
### Team
- [James A. Fellows Yates](
- [Sofia Stamouli](
- [Moritz E. Beber](

We thank the following people for their contributions to the development of this pipeline:
- [Lauri Mesilaakso](
- [Tanja Normark](
- [Maxime Borry](
- [Thomas A. Christensen II](
- [Jianhong Ou](
- [Rafal Stepien](
- [Mahwash Jamy](
### Acknowledgments
We are also grateful for the feedback and comments from:
- The general [nf-core/community](

And specifically to:
- [Alex Hübner](
- [Lily Andersson Lee](

❤️ also goes to [Zandra Fagernäs]( for the logo.
## Contributions and Support
If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/
For further information or help, don't hesitate to get in touch on the [Slack `#taxprofiler` channel]( (you can join with [this invite](
## Citations
If you use nf-core/taxprofiler for your analysis, please cite it using the following doi: [10.5281/zenodo.7728364](
An extensive list of references for the tools used by the pipeline can be found in the [``]( file.
You can cite the `nf-core` publication as follows:
> **The nf-core framework for community-curated bioinformatics pipelines.**
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](