2 commits

Author SHA1 Message Date
Charles Plessy
c8168bc351
Update last module (#533)
* Update LAST to version 1238.

* Update functions.nf to the latest devel version.

* Update test MD5sums after updating software version.

* Make portable on macOS

* Allow input alignments to be uncompressed.

While the strategy in this family of modules is to make all inputs and
outputs compressed, this change might be useful to some users.

As of LAST 1238, `last/split` does not allow its input to be compressed.
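
A minimal sketch of how a script could accept either a compressed or an uncompressed alignment, assuming the choice is made on the file extension. The helper name `read_maybe_gz` and the file names are hypothetical and are not taken from the module.

```shell
# Hypothetical helper: stream a file to stdout, decompressing it only
# when its name ends in .gz.
read_maybe_gz() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Demonstration on two throwaway files with identical content.
work=$(mktemp -d)
printf 'a score=42\n' > "$work/plain.maf"
gzip -c "$work/plain.maf" > "$work/packed.maf.gz"

read_maybe_gz "$work/plain.maf"
read_maybe_gz "$work/packed.maf.gz"
```

Both calls print the same line, so a downstream tool that cannot read compressed input can be fed through the helper in either case.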

* Search for the .des file, which is guaranteed to be unique.

Some LAST indexes have more than one .bck file, which makes the name
detection crash.

In this commit, I also standardise how the names are detected.
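
As an illustration only, the sketch below mirrors the `basename`/`ls` pattern used for .bck files, keyed on the .des file instead. The directory layout and the file names (`genome.des`, `genome0.bck`, `genome1.bck`) are made up for the demonstration and may not match what `lastdb` actually writes.

```shell
# Hypothetical index folder: two .bck files but a single .des file.
index=$(mktemp -d)
touch "$index/genome.des" "$index/genome0.bck" "$index/genome1.bck"

# Derive the index name from the one .des file.
INDEX_NAME=$(basename "$(ls "$index"/*.des)" .des)
echo "$INDEX_NAME"   # prints "genome"
```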

* Use value input channel and optional output channels to handle formats.

As discussed on Slack, it is preferred to use a value input channel
instead of sneaking options through `params.args2` or `params.format`
as we did.

Likewise, optional output channels with clearly labeled format are
preferred to 'catch-all' wildcards.
2021-06-14 12:27:27 +01:00
Charles Plessy
207930139a
New last/lastal module to align query sequences on a target index (#510)
* New last/lastal to align query sequences on a target index

`lastal` is the main program of the [LAST](https://gitlab.com/mcfrith/last)
suite.  It aligns query DNA sequences in FASTA or FASTQ format to a
target index of DNA or protein sequences.  The index is produced by
the `lastdb` program (module `last/lastdb`).  The score matrix for
evaluating the alignment can be chosen among preset ones or computed
iteratively by the `last-train` program (module `last/train`).  For
this reason, the `last/lastal` module proposed here has one input
channel containing an optional file, which has to be a dummy file when
not used.

The LAST aligner outputs MAF files that can be very large (up to
hundreds of gigabytes), therefore this module unconditionally compresses
its output with gzip.
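
A rough sketch of the idea, assuming the aligner writes MAF to standard output: piping directly into `gzip` means the uncompressed file is never written to disk. The `produce_maf` function is a hypothetical stand-in for the aligner, not a real LAST command line.

```shell
# Hypothetical stand-in for an aligner that writes MAF to stdout.
produce_maf() {
    printf '##maf version=1\na score=42\n'
}

out_dir=$(mktemp -d)
out="$out_dir/alignment.maf.gz"

# Compress on the fly instead of writing the plain MAF first.
produce_maf | gzip > "$out"

# Read the header back to check the round trip.
gzip -cd "$out" | head -n 1   # prints "##maf version=1"
```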

This new module is part of the work described in Issue #464.  During
this development, we fix the version of LAST to 1219 to ensure
consistency (hence ignore lint's version warning).

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Un-hardcode the path to the LAST index.

Among multiple alternatives, I have chosen the following command to
detect the sample name of the index, because it fails in situations
where there are no index files in the index folder, and in situations
where there are two index files in the folder.  Not failing would
result in feeding garbage information into the INDEX_NAME variable.

    basename \$(ls $index/*.bck) .bck

In case of missing file, a clear error message is given by `ls`.  In
case of more than one file, the error message of `basename` is more
cryptic, unfortunately.  (`basename: extra operand ‘.bck’`)

Alternatives that do not fail if there is no .bck file:

    basename $index/*bck .bck
    find $index -name '*bck' | sed 's/.bck//'

Alternatives that do not fail if there is more than one .bck file:

    basename -s .bck $index/*bck
    ls $index/*.bck | xargs basename -s .bck
    find $index -name '*bck' | sed 's/.bck//'
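
For comparison, the two failure cases discussed above can also be checked explicitly. This is only a sketch: the helper name `index_name` and its error messages are hypothetical, and it is not the command used in the module.

```shell
# Hypothetical helper: print the index name only when exactly one .bck
# file is present; otherwise report the problem and return non-zero.
index_name() {
    idx_dir=$1
    set -- "$idx_dir"/*.bck
    # An unmatched glob stays literal, so check that the first match exists.
    if [ ! -e "$1" ]; then
        echo "no .bck file in $idx_dir" >&2
        return 1
    fi
    if [ "$#" -gt 1 ]; then
        echo "more than one .bck file in $idx_dir" >&2
        return 1
    fi
    basename "$1" .bck
}

# Demonstration on three throwaway folders.
demo=$(mktemp -d)
mkdir "$demo/empty" "$demo/one" "$demo/two"
touch "$demo/one/genome.bck" "$demo/two/a.bck" "$demo/two/b.bck"

index_name "$demo/one"                           # prints "genome"
index_name "$demo/empty" || echo "empty: failed"
index_name "$demo/two"   || echo "two: failed"
```

The cost is a few more lines than the one-liner; the benefit is that both error cases produce a readable message and a non-zero status.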

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-25 22:10:48 +01:00