Apply review suggestion

Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
2024-11-22 11:29:54 +00:00 · 2022-12-20 17:32:13 +01:00 · 2022-12-20 17:32:13 +01:00 · 3c33ba66ca
commit 3c33ba66ca
parent af8fd18d97
1 changed files with 13 additions and 5 deletions
--- a/docs/output.md
+++ b/docs/output.md
@@ -188,17 +188,25 @@ Note that the FASTQ file(s) may _not_ always be the 'final' reads that go into t

 ### Kraken2

+[Kraken2](https://ccb.jhu.edu/software/kraken2/) is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken2 examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer.
+
 <details markdown="1">
 <summary>Output files</summary>

- `kraken2`
-  - `<sample_id>.classified.fastq.gz`
-  - `<sample_id>.unclassified.fastq.gz`
-  - `<sample_id>.report.txt`
-  - `<sample_id>.classifiedreads.txt`
+- `kraken2/`
+  - `<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `krakentools`)
+  - `<db_name>/`
+    - `<sample_id>_<db_name>.classified.fastq.gz`: FASTQ file containing all reads that had a hit against a reference in the database for a given sample
+    - `<sample_id>_<db_name>.unclassified.fastq.gz`: FASTQ file containing all reads that did not have a hit in the database for a given sample
+    - `<sample_id>_<db_name>.report.txt`: A Kraken2 report that summarises the fractional abundance, taxonomic ID, number of k-mers, and taxonomic path of all the hits in the Kraken2 run for a given sample
+    - `<sample_id>_<db_name>.classifiedreads.txt`: A list of read IDs and the hits each read had against each database for a given sample

 </details>

+The main taxonomic profiling files from Kraken2 are the `_combined_reports.txt` and `*report.txt` files. The former gives you the broadest overview of the taxonomic profiling results across all samples against a single database: you get two columns for each sample, e.g. `2_all` and `2_lvl`, as well as two summary columns, `tot_all` and `tot_lvl`, that sum across all samples. The latter gives you the most information for a single sample. The report file is also used for the taxpasta step.
+
+You will only receive the FASTQs and `*classifiedreads.txt` file if you supply the `--kraken2_save_reads` and/or `--kraken2_save_readclassification` parameters to the pipeline.
+
 ### KrakenUniq

 <details markdown="1">