diff --git a/docs/output.md b/docs/output.md index 3257860..17d37b5 100644 --- a/docs/output.md +++ b/docs/output.md @@ -29,7 +29,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d - [Centrifuge](#centrifuge) - Taxonomic classifier that uses a novel indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index. - [Kaiju](#kaiju) - Taxonomic classifier that finds maximum (in-)exact matches on the protein-level. - [Diamond](#diamond) - Sequence aligner for protein and translated DNA searches. -- [MALT](#malt) - Sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics +- [MALT](#malt) - Sequence alignment and analysis tool designed for processing high-throughput sequencing data, especially in the context of metagenomics - [MetaPhlAn3](#metaphlan3) - Genome-level marker gene based taxonomic classifier - [mOTUs](#motus) - Tool for marker gene-based OTU (mOTU) profiling. - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline @@ -80,10 +80,10 @@ It is used in nf-core/taxprofiler for adapter trimming of short-reads. - `adapterremoval/` - `.settings`: AdapterRemoval log file containing general adapter removal, read trimming and merging statistics - - `.collapsed.fastq.gz` - read-pairs that merged and did not undergo trimming (only when `--shortread_qc_mergepairs` supplied) - - `.collapsed.truncated.fastq.gz` - read-pairs that merged underwent quality trimming (only when `--shortread_qc_mergepairs` supplied) - - `.pair1.truncated.fastq.gz` - read 1 of pairs that underwent quality trimming - - `.pair2.truncated.fastq.gz` - read 2 of pairs that underwent quality trimming (and could not merge if `--shortread_qc_mergepairs` supplied) + - `.collapsed.fastq.gz` - read-pairs that merged and did not undergo trimming (only when `--shortread_qc_mergepairs` supplied) + - `.collapsed.truncated.fastq.gz` - read-pairs that merged underwent quality trimming (only when `--shortread_qc_mergepairs` supplied) + - `.pair1.truncated.fastq.gz` - read 1 of pairs that underwent quality trimming + - `.pair2.truncated.fastq.gz` - read 2 of pairs that underwent quality trimming (and could not merge if `--shortread_qc_mergepairs` supplied) - `.singleton.truncated.fastq.gz` - orphaned read pairs where one of the pair was discarded - `.discard.fastq.gz` - reads that were discarded due to length or quality filtering @@ -133,7 +133,7 @@ It is used in nf-core/taxprofiler for complexity filtering using different algor - `prinseqplusplus/` - `.log`: log file containing number of reads. Row IDs correspond to: `min_len, max_len, min_gc, max_gc, min_qual_score, min_qual_mean, ns_max_n, noiupac, derep, lc_entropy, lc_dust, trim_tail_left, trim_tail_right, trim_qual_left, trim_qual_right, trim_left, trim_right` - - `_good_out.fastq.gz`: resulting FASTQ file without low-complexity reads + - `_good_out.fastq.gz`: resulting FASTQ file without low-complexity reads @@ -170,7 +170,7 @@ It is used with nf-core/taxprofiler to allow removal of 'host' (e.g. human) or o -By default nf-core/taxprofiler will only provide the `.log` file if host removal is turned on. You will only see the mapped (host) reads BAM file or the off-target reads in FASTQ format in your results directory if you provide `--save_hostremoval_mapped` and ` --save_hostremoval_unmapped` respectively. +By default nf-core/taxprofiler will only provide the `.log` file if host removal is turned on. You will only see the mapped (host) reads BAM file or the off-target reads in FASTQ format in your results directory if you provide `--save_hostremoval_mapped` and ` --save_hostremoval_unmapped` respectively. Note that the FASTQ file(s) may _not_ always be the 'final' reads that go into taxprofiling, if you also run other steps such as host removal, run merging etc.. @@ -235,7 +235,7 @@ The main taxonomic profiling file from Bracken is the `*.tsv` file. This provide -The main taxonomic profiling file from Kraken2 is the `_combined_reports.txt` or `*report.txt` file. The former provides you the broadest over view of the taxonomic profiling results across all samples against a single databse, where you get two columns for each sample e.g. `2_all` and `2_lvl`, as well as a summarised column summing up across all samples `tot_all` and `tot_lvl`. The latter gives you the most information for a single sample. The report file is also used for the taxpasta step. +The main taxonomic profiling file from Kraken2 is the `_combined_reports.txt` or `*report.txt` file. The former provides you the broadest over view of the taxonomic profiling results across all samples against a single databse, where you get two columns for each sample e.g. `2_all` and `2_lvl`, as well as a summarised column summing up across all samples `tot_all` and `tot_lvl`. The latter gives you the most information for a single sample. The report file is also used for the taxpasta step. You will only recieve the FASTQs and `*classifiedreads.txt` file if you supply `--kraken2_save_reads` and/or `--kraken2_save_readclassification` parameters to the pipeline. @@ -261,7 +261,6 @@ The main taxonomic profiling file from KrakenUniq is the `*report.txt` file. Thi You will only receive the FASTQs and `*classifiedreads.txt` file if you supply `--krakenuniq_save_reads` and/or `--krakenuniq_save_readclassification` parameters to the pipeline. - ### Centrifuge [Centrifuge](https://github.com/DaehwanKimLab/centrifuge) is a taxonomic sequence classifier that uses a Burrows-Wheeler transform and Ferragina-Manzina index for storing and mapping sequences. @@ -312,7 +311,7 @@ You will only receive the FASTQs and `*classifiedreads.txt` file if you supply ` - `malt/` - `/` - `.blastn.sam`: sparse SAM file containing alignments of each hit - - `.megan`: summary file that can be loaded into the [MEGAN6](https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/megan6/) interactive viewer. Generated by MEGAN6 companion tool `rma2info` + - `.megan`: summary file that can be loaded into the [MEGAN6](https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/megan6/) interactive viewer. Generated by MEGAN6 companion tool `rma2info` - `.rma6`: binary file containing all alignments and taxonomic information of hits that can be loaded into the [MEGAN6](https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/algorithms-in-bioinformatics/software/megan6/) interactive viewer - `.txt.gz`: text file containing taxonomic IDs and read counts against each taxon. Generated by MEGAN6 companion tool `rma2info` @@ -322,7 +321,6 @@ The main output of MALT is the `.rma6` file format, which can be only loaded int You will only recieve the `.sam` and `.megan` files if you supply `--malt_save_reads` and/or `--malt_generate_megansummary` parameters to the pipeline. - ### MetaPhlAn3 [MetaPhlAn3](https://github.com/biobakery/metaphlan) is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea and Eukaryotes) from metagenomic shotgun sequencing data (i.e. not 16S) with species-level resolution via marker genes. @@ -334,12 +332,13 @@ You will only recieve the `.sam` and `.megan` files if you supply `--malt_save_r - `metaphlan3__combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `metaphlan_merge_tables`) - `/` - `.biom`: taxonomic profile in BIOM format - - `.bowtie2out.txt`: BowTie2 alignment information (can be re-used for skipping alignment when re-running MetaPhlAn3 with different parameters) + - `.bowtie2out.txt`: BowTie2 alignment information (can be re-used for skipping alignment when re-running MetaPhlAn3 with different parameters) - `_profile.txt`: MetaPhlAn3 taxonomic profile including abundance estimates The main taxonomic profiling file from MetaPhlAn3 is the `*_profile.txt` file. This provides the abundance estimates from MetaPhlAn3 however does not include raw counts by default. + ### mOTUs [mOTUS](https://github.com/motu-tool/mOTUs) maps reads to a unique marker specific database and estimates the relative abundance of known and unknown species. diff --git a/docs/usage.md b/docs/usage.md index 0edd065..ccf2d85 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -631,7 +631,6 @@ krakenuniq-build --db --kmer-len 31 Please see the [KrakenUniq documentation](https://github.com/fbreitwieser/krakenuniq#database-building) for more information. - #### MALT custom database MALT does not provide any default databases for profiling, therefore you must build your own. @@ -640,7 +639,7 @@ In addition to the input directory, output directory, and the mapping file datab ```bash malt-build -i ///*.{fna,fa,fasta} -a2t //.db -d / -s DNA -```` +``` You can then add the / path to your nf-core/taxprofiler database input sheet.