mirror of
https://github.com/MillironX/taxprofiler.git
synced 2024-11-22 13:19:54 +00:00
Apply review suggestions
parent
98f3ab9fd1
commit
896a39e68d
1 changed files with 46 additions and 12 deletions
@ -162,6 +162,7 @@ Expected (uncompressed) database files for each tool are as follows:
with the same release version of the mOTUs tools. The database for same-version tools
can thus be reused for multiple runs. Users can download the database once using the script above and
specify the path to the database in the TSV table provided to `--databases`.
- **KrakenUniq** WIP

## Running the pipeline

@ -184,7 +185,7 @@ work # Directory containing the nextflow working files

### Sequencing quality control

nf-core taxprofiler offers [`falco`](https://github.com/smithlabcode/falco) as an alternative option to [`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
[`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. nf-core taxprofiler offers [`falco`](https://github.com/smithlabcode/falco) as an alternative option.

### Preprocessing Steps

@ -226,7 +227,7 @@ The tools offer different algorithms and parameters for removing low complexity

You can optionally save the FASTQ output of complexity filtering with `--save_complexityfiltered_reads`. If running with `fastp`, complexity filtering happens within the earlier short-read preprocessing step. Therefore there will be no independent pipeline step for complexity filtering and no independent FASTQ file (i.e. `--save_complexityfiltered_reads` will be ignored); your complexity-filtered reads will be in the `fastp/` folder, in the same file(s) as the preprocessed reads.

**We do not any read preprocessing or complexity filtering if you are using ONTs Guppy toolkit for basecalling and post-processing.**
**We do not recommend performing any read preprocessing or complexity filtering if you are using ONT's Guppy toolkit for basecalling and post-processing.**

#### Host Removal

@ -484,19 +485,56 @@ NXF_OPTS='-Xms1g -Xmx4g'
### Tutorial - How to create your custom database

#### Kraken2
Kraken2 allows the user to build custom databases.

To install a taxonomy:

```bash
kraken2-build --download-taxonomy --db $DBNAME
```

To install one or more reference libraries:

```bash
kraken2-build --download-library bacteria --db $DBNAME
kraken2-build --download-library viral --db $DBNAME
kraken2-build --download-library archaea --db $DBNAME
```

To add more genomes:

```bash
kraken2-build --add-to-library genome.fa --db $DBNAME
```
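
The steps above can be chained into a single script. The following is a minimal sketch rather than part of the pipeline: the function names `run` and `plan_kraken2_db`, the `DRY_RUN` switch and the database name in the usage comment are hypothetical, and the closing `kraken2-build --build` call is taken from the Kraken2 manual, not from the steps listed here. By default the script only prints the commands, so they can be reviewed before anything is downloaded.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the DRY_RUN switch are illustrative assumptions.

# Print a command (dry run, the default) or execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: plan_kraken2_db <db_name> [extra_genome.fa ...]
plan_kraken2_db() {
  local db="$1"
  shift
  run kraken2-build --download-taxonomy --db "$db"
  local lib
  for lib in bacteria viral archaea; do
    run kraken2-build --download-library "$lib" --db "$db"
  done
  local genome
  for genome in "$@"; do
    run kraken2-build --add-to-library "$genome" --db "$db"
  done
  # Final step per the Kraken2 manual: build the database from the downloaded files.
  run kraken2-build --build --db "$db"
}

# Example (placeholder names):
#   plan_kraken2_db my_kraken2_db genome.fa            # print the plan
#   DRY_RUN=0 plan_kraken2_db my_kraken2_db genome.fa  # actually run it
```

Keeping the dry run as the default is a deliberate choice here, since the library downloads are large and slow to repeat.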

You can follow the Kraken2 [tutorial](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#custom-databases) for a more detailed description.

Kraken2 allows the user to build custom databases. You can follow Kraken2 [tutorial](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#custom-databases).

#### Centrifuge
Centrifuge allows the user to [build custom databases](https://ccb.jhu.edu/software/centrifuge/manual.shtml#custom-database). The user should download the taxonomy files, create a custom `seqid2taxid.map` and combine the FASTA files together.

Centrifuge allows the user to [build custom databases](https://ccb.jhu.edu/software/centrifuge/manual.shtml#custom-database).
```bash
centrifuge-download -o taxonomy taxonomy

## custom seqid2taxid.map (file contents, one sequence ID and taxonomy ID per line)
# NC_001133.9 4392
# NC_012920.1 9606
# NC_001134.8 4392
# NC_001135.5 4392

cat *.{fa,fna} > input-sequences.fna
centrifuge-build -p 4 --conversion-table seqid2taxid.map --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp input-sequences.fna taxprofiler_cf
```
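
Writing `seqid2taxid.map` by hand becomes tedious for larger collections. As a hedged sketch, assuming each FASTA file holds sequences from a single taxon and that the map is tab-separated (sequence ID, then taxonomy ID), the hypothetical helper `make_seqid2taxid` below derives the entries from the FASTA headers. It is not part of Centrifuge, and the file names in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# Sketch only: make_seqid2taxid is a hypothetical helper, not a Centrifuge tool.

# Usage: make_seqid2taxid <fasta_file> <taxid>
# Prints one "<sequence ID><TAB><taxid>" line per FASTA header, where the
# sequence ID is the header text up to the first whitespace.
make_seqid2taxid() {
  awk -v taxid="$2" '/^>/ { id = substr($1, 2); print id "\t" taxid }' "$1"
}

# Example (placeholder file names and the taxonomy IDs used above):
#   make_seqid2taxid genome_a.fna 4392 >  seqid2taxid.map
#   make_seqid2taxid genome_b.fa  9606 >> seqid2taxid.map
```

Files that mix sequences from several taxa would need a per-sequence lookup instead, which this sketch does not attempt.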

#### Kaiju

It is possible to [create custom databases](https://github.com/bioinformatics-centre/kaiju#custom-database) with Kaiju.

#### MALT
```bash
kaiju-mkbwt -n 5 -a ACDEFGHIKLMNPQRSTVWY -o proteins proteins.faa
kaiju-mkfmi proteins
```

#### MALT
To create a custom database for MALT, the user should download and unzip the following database, which lists all NCBI records. The input files are specified using `-i` and the index is specified using `-d`. A detailed description of each argument can be found [here](https://software-ab.informatik.uni-tuebingen.de/download/malt/manual.pdf).

```bash
@ -506,16 +544,13 @@ malt-build -i path/to/fasta/files/*.{fna,fa} -s DNA -d index -t 8 -st 4 -a2t meg
```

#### Bracken

You can follow the Bracken [tutorial](https://ccb.jhu.edu/software/bracken/index.shtml?t=manual) to build a custom database. Alternatively, you can use one of the indexes that can be found [here](https://benlangmead.github.io/aws-indexes/k2).

#### KrakenUniq

For KrakenUniq, we recommend using one of the available databases [here](https://benlangmead.github.io/aws-indexes/k2)
For KrakenUniq, we recommend using one of the available databases [here](https://benlangmead.github.io/aws-indexes/k2). If you wish to build your own, please see the [documentation](https://github.com/fbreitwieser/krakenuniq/blob/master/README.md#custom-databases-with-ncbi-taxonomy).

#### DIAMOND

To create a custom database for DIAMOND, the user should download and unzip the NCBI's taxonomy files. The `makedb` needs to be executed afterwards. A detailed description can be found [here](https://gensoft.pasteur.fr/docs/diamond/0.8.22/diamond_manual.pdf)
To create a custom database for DIAMOND, the user should download and unzip NCBI's taxonomy files. The `makedb` command needs to be executed afterwards. A detailed description can be found [here](https://github.com/bbuchfink/diamond/wiki/1.-Tutorial).

```bash
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdmp.zip
@ -531,7 +566,6 @@ rm *dmp *txt *gz *prt *zip
```

#### mOTUs

A detailed description of how to download the mOTUs database can be found [here](https://github.com/motu-tool/mOTUs).

## Troubleshooting and FAQs