nf-core_modules/tests/modules/last/lastal/main.nf

New last/lastal module to align query sequences on a target index (#510) * New last/lastal to align query sequences on a target index `lastal` is the main program of the [LAST](https://gitlab.com/mcfrith/last) suite. It align query DNA sequences in FASTA or FASTQ format to a target index of DNA or protein sequences. The index is produced by the `lastdb` program (module `last/lastdb`). The score matrix for evaluating the alignment can be chosen among preset ones or computed iteratively by the `last-train` program (module `last/train`). For this reason, the `last/lastal` module proposed here has one input channel containing an optional file, that has to be dummy when not used. The LAST aligner outputs MAF files that can be very large (up to hundreds of gigabytes), therefore this module unconditionally compresses its output with gzip. This new module is part of the work described in Issue #464. During this development, we fix the version of LAST to 1219 to ensure consistency (hence ignore lint's version warning). * Apply suggestions from code review Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com> * Un-hardcode the path to the LAST index. Among multiple alternatives I have chosen the following command to detect the sample name of the index, because it fails in situations where there is no index files in the index folder, and in situations were there are two indexes files in the folder. Not failing would result in feeding garbage information in the INDEX_NAME variable. basename \$(ls $index/*.bck) .bck In case of missing file, a clear error message is given by `ls`. In case of more than one file, the error message of `basename` is more cryptic, unfortunately. 
(`basename: extra operand ‘.bck’`) Alternatives that do not fail if there is no .bck file: basename $index/*bck .bck find $index -name '*bck' | sed 's/.bck//' Alternatives that do not fail if there are more than one .bck file: basename -s .bck $index/*bck ls $index/*.bck | xargs basename -s .bck find $index -name '*bck' | sed 's/.bck//' Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-26 06:10:48 +09:00
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2
Update all modules to new NF DSL2 syntax (#1099) * Add comment line for consistency * Remove all functions.nf * Remove include functions.nf and publishDir options * Replace options.args3 with task.ext.args3 - 3 modules * Replace options.args3 with task.ext.args3 - 17 modules * Replace {task.cpus} with task.cpus * Replace off on off off off off off off off on off on off on off off off on off off off on on off off off on on off off off off off off off on off off off off on off on on off off off on on on on off off off on off on on off on on off off on on on off on on off on off off off off on off off off on off off on off on off off off on on off on off on off off on off off off on off off off on off off off off on off off off on on on off on on off off on off on on on off on on off on on on off off off off off on on off off on off off off off off on off off on on off on on off on off off off on off off off off on on off on off off on off off on off on off off off off off off off off on on off on off off off.args with * Add def args = task.ext.args line to all modules in script section * Replace options.args with args and args_list * Initialise args2 and args3 properly * Replace container syntax * Revert container changes for cellranger/mkref * Replace getProcessName in all modules * Replace getSoftwareName in all modules * Unify modules using VERSION variable * Replae options.suffix with task.ext.suffix * Remove NF version restriction for CI * Bump NF version in README * Replace task.process.tokenize logic with task.process * Minor tweaks to unify syntax in tests main.nf * Add a separate nextflow.config for each module * Transfer remaining module options to nextflow.config * Remove addParams from tests main.nf * Remove TODO statements * Use -c to import module specific config * Bump NF version to 21.10.3 * Fix tests for artic/minion * Fix broken publishDir syntax * Standardise and fix obvious failing module tests * Remove kronatools to krona * Comment out tags in 
subworkflow test.yml * Fix failing module tests * Add consistent indentation to nextflow.config * Comment out subworklow definitions * Fix kallistobustools/ref * Fix rmarkdownnotebook * Fix jupyternotebook * Quote task.process * Add plink2/vcf to pytest_modules.yml * Remove NF_CORE_MODULES_TEST from pytest CI * Fix more tests * Move bacteroides_fragilis to prokaryotes folder * Fix cooler merge tests * Fix kallistobustools/count tests * Fix kallistobustools/ref tests * Update test_10x_1_fastq_gz file for kallistobustools/count tests * Fix bcftools/query tests * Fix delly/call tests * Fix cooler/zoomify tests * Fix csvtk/split tests * Fix gatk4/intervallisttools tests * Fix gatk4/variantfiltration * Fix pydamage/filter tests * Fix test data for unicycler * Fix gstama/collapse module * Fix leehom tests * Fix metaphlan3 tests * Fix pairtools/select tests * Update nextflow.config * Update nextflow.config * feat: update syntax * Fix arriba tests * Fix more failing tests * Update test syntax * Remove comments from tests nextflow.config * Apply suggestions from code review * Fix kallistobustools/count module * Update dumpsoftwareversions module * Update custom/dumpsoftwareversions * Add args2 to untar module * Update leftover modules * Remove last remaining addParams Co-authored-by: JoseEspinosa <kadomu@gmail.com> Co-authored-by: Gregor Sturm <mail@gregor-sturm.de> Co-authored-by: MaxUlysse <max.u.garcia@gmail.com>
2021-11-26 07:58:40 +00:00
include { UNTAR } from '../../../../modules/untar/main.nf'
include { LAST_LASTAL } from '../../../../modules/last/lastal/main.nf'
workflow test_last_lastal_with_dummy_param_file {

    input = [ [ id:'contigs', single_end:false ], // meta map
              file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true),
              [] ] // no param file: an empty list stands in as the dummy input
    // Empty meta map plus the tar.gz archive holding the LAST index
    db = [ [], file(params.test_data['sarscov2']['genome']['lastdb_tar_gz'], checkIfExists: true) ]
    UNTAR ( db )
    // Drop the meta element and pass only the unpacked index directory
    LAST_LASTAL ( input, UNTAR.out.untar.map{ it[1] })
}
workflow test_last_lastal_with_real_param_file {

    input = [ [ id:'contigs', single_end:false ], // meta map
              file(params.test_data['sarscov2']['illumina']['contigs_fasta'], checkIfExists: true),
              file(params.test_data['sarscov2']['genome']['contigs_genome_par'], checkIfExists: true) ] // real param file
    // Empty meta map plus the tar.gz archive holding the LAST index
    db = [ [], file(params.test_data['sarscov2']['genome']['lastdb_tar_gz'], checkIfExists: true) ]
    UNTAR ( db )
    // Drop the meta element and pass only the unpacked index directory
    LAST_LASTAL ( input, UNTAR.out.untar.map{ it[1] })
}
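
The commit message for #510 explains why the index name is detected with `basename $(ls $index/*.bck) .bck`: the command is meant to fail when the index folder contains no `.bck` file or more than one. The sketch below tries that idea on three throwaway directories. It is only an illustration: `index_name` is a hypothetical helper that is not part of the module, the explicit check of the `ls` exit status is an addition made here so that the "no file" case is visible as a non-zero status, and the script assumes GNU coreutils and paths without whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: index_name is a hypothetical helper, not code from the module.
# Assumes GNU coreutils and index paths that contain no whitespace.
index_name() {
    local files
    # ls prints an error and returns non-zero when no .bck file exists.
    files=$(ls "$1"/*.bck) || return 1
    # Left unquoted on purpose: two or more files become extra operands,
    # so basename fails instead of returning a misleading name.
    basename $files .bck
}

tmp=$(mktemp -d)
mkdir "$tmp/empty" "$tmp/single" "$tmp/double"
touch "$tmp/single/genome.bck"
touch "$tmp/double/a.bck" "$tmp/double/b.bck"

# Exactly one index file: the name is printed.
one=$(index_name "$tmp/single")

# No index file, or two of them: record the non-zero exit status.
none_status=0
index_name "$tmp/empty" >/dev/null 2>&1 || none_status=$?
two_status=0
index_name "$tmp/double" >/dev/null 2>&1 || two_status=$?

echo "single: name=$one"
echo "empty:  status=$none_status"
echo "double: status=$two_status"
rm -rf "$tmp"
```

The alternatives listed in the commit message (`basename -s .bck`, `find ... | sed`) would accept several files silently, which is the behaviour the chosen command is intended to avoid.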