mirror of https://github.com/MillironX/taxprofiler.git synced 2024-11-22 11:29:54 +00:00

Apply review suggestion

Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
Sofia Stamouli 2022-12-20 17:32:13 +01:00 committed by GitHub
parent af8fd18d97
commit 3c33ba66ca
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23

@@ -188,17 +188,25 @@ Note that the FASTQ file(s) may _not_ always be the 'final' reads that go into t
### Kraken2
[Kraken](https://ccb.jhu.edu/software/kraken2/) is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer.
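
The k-mer-to-LCA idea can be illustrated with a toy sketch. This is not Kraken2's actual implementation (which uses a far more compact index and a more involved classification step); the taxonomy, genome strings, and function names below are all hypothetical and exist only to show how a k-mer shared by several genomes ends up mapped to their lowest common ancestor.

```python
from collections import Counter

# Hypothetical toy taxonomy: child -> parent (the root is its own parent).
PARENT = {
    "root": "root",
    "Bacteria": "root",
    "Escherichia": "Bacteria",
    "E. coli": "Escherichia",
    "E. albertii": "Escherichia",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, inclusive."""
    path = [taxon]
    while PARENT[taxon] != taxon:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    # Walk upwards from b; the first taxon also above a is the LCA.
    return next(t for t in lineage(b) if t in ancestors)


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(genomes, k):
    """Map each k-mer to the LCA of every genome that contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


def kmer_hits(read, db, k):
    """Count how many k-mers of a read hit each taxon in the database."""
    return Counter(db[kmer] for kmer in kmers(read, k) if kmer in db)


# Made-up sequences, purely for illustration.
genomes = {"E. coli": "ACGTAC", "E. albertii": "CGTACG"}
db = build_db(genomes, k=4)
hits = kmer_hits("ACGTA", db, k=4)
```

Here the k-mer `CGTA` occurs in both toy genomes, so it is stored against their shared genus, while `ACGT` occurs in only one genome and keeps its species-level label.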
<details markdown="1">
<summary>Output files</summary>
- `kraken2`
- `<sample_id>.classified.fastq.gz`
- `<sample_id>.unclassified.fastq.gz`
- `<sample_id>.report.txt`
- `<sample_id>.classifiedreads.txt`
- `kraken2/`
- `<db_name>_combined_reports.txt`: A combined profile of all samples aligned to a given database (as generated by `krakentools`)
- `<db_name>/`
- `<sample_id>_<db_name>.classified.fastq.gz`: FASTQ file containing all reads that had a hit against a reference in the database for a given sample
- `<sample_id>_<db_name>.unclassified.fastq.gz`: FASTQ file containing all reads that did not have a hit in the database for a given sample
- `<sample_id>_<db_name>.report.txt`: A Kraken2 report that summarises the fractional abundance, taxonomic ID, number of k-mers, and taxonomic path of all the hits in the Kraken2 run for a given sample
- `<sample_id>_<db_name>.classifiedreads.txt`: A list of read IDs and the hits each read had against each database for a given sample
</details>
The main taxonomic profiling file from Kraken2 is the `_combined_reports.txt` or `*report.txt` file. The former gives you the broadest overview of the taxonomic profiling results across all samples against a single database, where you get two columns for each sample, e.g. `2_all` and `2_lvl`, as well as summary columns that sum across all samples, `tot_all` and `tot_lvl`. The latter gives you the most information for a single sample. The report file is also used for the taxpasta step.
You will only receive the FASTQs and `*classifiedreads.txt` file if you supply the `--kraken2_save_reads` and/or `--kraken2_save_readclassification` parameters to the pipeline.
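
If you want to inspect a per-sample `*report.txt` file programmatically, a minimal Python sketch could look like the following. It assumes the standard tab-separated Kraken2 report layout (percentage, reads in clade, reads assigned directly, rank code, taxonomic ID, and a name indented by two spaces per taxonomic level); reports written with minimizer data carry two extra columns, so the last three fields are read from the end of each line. The function name and the example rows are hypothetical and not part of the pipeline.

```python
import io


def parse_kraken2_report(handle):
    """Parse a Kraken2-style report into a list of dictionaries.

    Assumes tab-separated lines with at least six columns and no header.
    """
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        raw_name = fields[-1]
        indent = len(raw_name) - len(raw_name.lstrip(" "))
        rows.append(
            {
                "percent": float(fields[0]),
                "clade_reads": int(fields[1]),
                "direct_reads": int(fields[2]),
                "rank": fields[-3],
                "taxid": int(fields[-2]),
                "name": raw_name.strip(),
                # Assumption: two spaces of indentation per taxonomic level.
                "depth": indent // 2,
            }
        )
    return rows


# Made-up report content, purely for illustration.
example = "\n".join(
    [
        " 50.00\t5\t5\tU\t0\tunclassified",
        " 50.00\t5\t0\tR\t1\troot",
        " 40.00\t4\t1\tD\t2\t  Bacteria",
    ]
)
rows = parse_kraken2_report(io.StringIO(example))
domains = [row["name"] for row in rows if row["rank"] == "D"]
```

To read a real file, pass an open file handle instead of the `io.StringIO` object, for example `with open(path) as handle: rows = parse_kraken2_report(handle)`.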
### KrakenUniq
<details markdown="1">