Standardised taxon table and mOTU database docs improvement

2024-11-24 16:09:55 +00:00 · 2023-03-17 21:03:54 +01:00 · 2023-03-17 21:03:54 +01:00 · b0939c3ae9
commit b0939c3ae9
parent af61e007b3
3 changed files with 15 additions and 6 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### `Fixed`

+- [#271](https://github.com/nf-core/taxprofiler/pull/271/files) Improved standardised table generation documentation and mOTUs manual database download tutorial (♥ to @prototaxites for reporting, fix by @jfy133)
+
 ### `Dependencies`

 ### `Deprecated`
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -294,7 +294,9 @@ nf-core/taxprofiler supports generation of Krona interactive pie chart plots for

 ##### Multi-Table Generation

-In addition to per-sample profiles, the pipeline also supports generation of 'native' multi-sample taxonomic profiles (i.e., those generated by the taxonomic profiling tools themselves or additional utility scripts provided by the tool authors).
+The main multi-sample table from nf-core/taxprofiler is generated by [Taxpasta](https://taxpasta.readthedocs.io/en/latest/), a dedicated standalone tool originally developed for the pipeline. When providing `--run_profile_standardisation`, every classifier/profiler and database combination will get a standardised, multi-sample taxon table in the [`taxpasta/`](https://nf-co.re/taxprofiler/output) directory. These tables all share the same structure, to facilitate comparison between the results of the classifiers/profilers.
+
+In addition to per-sample profiles and standardised Taxpasta output, the pipeline also supports generation of 'native' multi-sample taxonomic profiles (i.e., those generated by the taxonomic profiling tools themselves or additional utility scripts provided by the tool authors), when providing `--run_profile_standardisation` to your pipeline.

 These are executed on a per-database level, i.e., you will get a multi-sample taxon table for each database you provide for each tool, and it will be placed in the same directory as the directories containing the per-sample profiles.

@@ -307,7 +309,7 @@ The following tools will produce multi-sample taxon tables:
 - **MetaPhlAn3** (via MetaPhlAn's `merge_metaphlan_tables.py` script)
 - **mOTUs** (via the `motus merge` command)

-Note that the multi-sample tables from these folders are not inter-operable with each other as they can have different formats.
+Note that the multi-sample tables from the 'native' tools in each folder are [not inter-operable](https://taxpasta.readthedocs.io/en/latest/tutorials/getting-started/) with each other, as they can have different formats and can contain additional and different data. In this case we recommend using the standardised and merged output from Taxpasta, as described above.

 ### Updating the pipeline

@@ -792,6 +794,8 @@ More information on the MetaPhlAn3 database can be found [here](https://github.c

 mOTUs does not provide the ability to construct custom databases. Therefore we recommend using the prebuilt database of marker genes provided by the developers.

+> ⚠️ **Do not change the directory name of the resulting database if moving it to a central location.** The database name `db_mOTU/` is hardcoded in the mOTUs tool.
+
 To do this you need to have `mOTUs` installed on your machine.

 ```bash
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -467,15 +467,18 @@
                },
                "motus_use_relative_abundance": {
                    "type": "boolean",
-                    "description": "Turn on printing relative abundance instead of counts."
+                    "description": "Turn on printing relative abundance instead of counts.",
+                    "fa_icon": "fas fa-percent"
                },
                "motus_save_mgc_read_counts": {
                    "type": "boolean",
-                    "description": "Turn on saving the mgc reads count."
+                    "description": "Turn on saving the mgc reads count.",
+                    "fa_icon": "fas fa-save"
                },
                "motus_remove_ncbi_ids": {
                    "type": "boolean",
-                    "description": "Turn on removing NCBI taxonomic IDs."
+                    "description": "Turn on removing NCBI taxonomic IDs.",
+                    "fa_icon": "fas fa-address-card"
                }
            },
            "fa_icon": "fas fa-align-center"
@@ -490,7 +493,7 @@
                    "type": "boolean",
                    "fa_icon": "fas fa-toggle-on",
                    "description": "Turn on standardisation of taxon tables across profilers",
-                    "help_text": "Turns on standardisation of output OTU tables across all tools; each into a TSV format following the following scheme:\n\n|TAXON   | SAMPLE_A | SAMPLE_B |\n|-------------|----------------|-----------------|\n| taxon_a | 32               | 123             |\n| taxon_b | 1                 | 5                 |\n\nThis currently only is generated for mOTUs."
+                    "help_text": "Turns on standardisation of output OTU tables across all tools.\n\nThis happens in two forms: firstly, if available, by a given classifier's/profiler's 'native' profile merger and standardisation (for Bracken, Kaiju, Kraken, Centrifuge, MetaPhlAn3, mOTUs), and secondly for _all_ classifiers/profilers in the pipeline using [`taxpasta`](https://taxpasta.readthedocs.io).\n\nIn the latter case, taxpasta generates a standardised output as follows:\n\n|TAXON   | SAMPLE_A | SAMPLE_B |\n|-------------|----------------|-----------------|\n| taxon_a | 32               | 123             |\n| taxon_b | 1                 | 5                 |\n\nwhereas all other 'native' tools have varying output formats. See the pipeline [output](https://nf-co.re/taxprofiler) documentation for more information."
                },
                "standardisation_motus_generatebiom": {
                    "type": "boolean",