4 commits

Author SHA1 Message Date
Charles Plessy
e75f88c68a
New module last/mafswap to reorder sequences in alignments (#500)
* New module last/mafswap to reorder sequences in alignments

The `maf-swap` tool distributed with [LAST](https://gitlab.com/mcfrith/last)
reorders sequences in alignment files in Multiple Alignment Format.
When run without command-line arguments, it will swap the target and the
query sequences.  This is useful when turning a many-to-many alignment
into a many-to-one and then a one-to-one alignment in conjunction with
the `last-split` command (split, swap, split and swap again).

The LAST aligner outputs MAF files, but other tools also use this
format.  As MAF files can be very large (up to hundreds of gigabytes),
the module expects its input to be compressed with gzip and will
compress its output.
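
As an illustration of the gzip-in, gzip-out convention described above, here is a minimal Python sketch that streams a compressed file line by line, so that a very large MAF file never has to fit in memory. It is not the module's code (the module wraps the `maf-swap` command); the function name `stream_gzip` and the `transform` parameter are hypothetical and exist only for this example.

```python
import gzip
from typing import Callable


def stream_gzip(src: str, dst: str,
                transform: Callable[[str], str] = lambda line: line) -> int:
    """Copy a gzip-compressed text file to another gzip-compressed file.

    Lines are read, passed through `transform`, and written one at a time,
    so memory use does not grow with the size of the input.
    Returns the number of lines written.
    """
    count = 0
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in fin:
            fout.write(transform(line))
            count += 1
    return count
```

With the default `transform` the call is a plain recompression; a real reordering step would have to work on whole alignment blocks rather than single lines.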

This new module is part of the work described in Issue #464.  During
this development, we pin LAST to version 1219 to ensure consistency
(hence the lint version warning can be ignored).

* Update MD5 sum.

Actually, 7029066c27ac6f5ef18d660d5741979a is the MD5 sum of
an empty file compressed with `gzip --no-name`…  This happened
because I forgot to update the config file after correcting the
module… sorry!
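
The mistake described above (a checksum that silently belongs to an empty compressed file) can be caught by checking the content instead of the checksum, since header bytes may differ between gzip implementations. The sketch below is only an illustration; `md5_hex` and `is_empty_gzip` are hypothetical helper names, not part of the module or its tests.

```python
import gzip
import hashlib
import zlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hex string."""
    return hashlib.md5(data).hexdigest()


def is_empty_gzip(data: bytes) -> bool:
    """Return True if `data` is a gzip stream whose decompressed payload is empty."""
    if not data.startswith(b"\x1f\x8b"):  # gzip magic number
        return False
    try:
        return gzip.decompress(data) == b""
    except (OSError, EOFError, zlib.error):
        # Corrupt or truncated stream: not a valid empty gzip file.
        return False
```

Running `is_empty_gzip` on a test output before recording its `md5_hex` value would flag this kind of error early.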

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Change name as suggested in pull request.

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-19 08:59:23 +01:00
aleksandrabliznina
b592cea30b
New last/train module to train alignment parameters. (#492)
* New last/train module to train alignment parameters.

The last-train command creates a parameter file that
will be used by the last/lastal module for sequence alignment.
It takes indexed sequences and query sequences as input,
and we use the metadata of both to create an id for the
parameter output file.

Submission of the LAST modules is discussed in more
detail in issue #464. For consistency, we use LAST
version 1219 for this whole development and will upgrade later.

* Corrected files according to the nf-core v1.14 standards.

* Fixed the functions.nf file for the last-train module.

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Find index name.

* Correct after the input channels were changed.

* Use double underscore as a name separator.

Single underscores can occur in ids; therefore, we would like to keep two underscores.
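
A small Python sketch of why the double underscore helps: when two ids that may each contain single underscores are joined, the pair can still be recovered. The names `join_ids`, `split_ids`, `index_id` and `query_id` are hypothetical; the module itself builds the name in Nextflow.

```python
from typing import Tuple

SEPARATOR = "__"  # two underscores, as chosen in the commit above


def join_ids(index_id: str, query_id: str) -> str:
    """Build a combined id from the index id and the query id."""
    return f"{index_id}{SEPARATOR}{query_id}"


def split_ids(name: str) -> Tuple[str, str]:
    """Recover the two ids by splitting on the first double underscore."""
    left, sep, right = name.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"no '{SEPARATOR}' separator in {name!r}")
    return left, right
```

For example, `join_ids("sample_1", "ref_genome")` gives `sample_1__ref_genome`, which splits back into the original pair. The round trip is only guaranteed when no id contains a double underscore and the first id does not end with an underscore.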

* Remove extra spaces.

* Fixed the passing of the "score matrix" line.

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Update software/last/train/main.nf

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-19 08:37:08 +01:00
Jose Espinosa-Carrasco
95e02f913f
Update comments with new style (#497)
* Update comment style on functions.nf files

* Update test main.nf comments

* Add meta for ggread
2021-05-12 14:56:46 +01:00
Charles Plessy
16d20a7cc4
New last/lastdb module to index sequences before alignment. (#476)
* New last/lastdb module to index sequences before alignment.

The `lastdb` command creates a sequence index for the LAST aligner
(https://gitlab.com/mcfrith/last). Input can be in FASTA or FASTQ
format, and compression is handled automagically.  DNA or protein
sequences can be indexed.

The sequence index is a collection of files sharing the same basename.
This module sets the basename to the sample identifier (`$meta.id`) and
creates the index in a directory always called `lastdb`.  The module's
output channel then conveys a copy of the metadata and the path to the
`lastdb` directory.
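
Because the output channel only carries the path to the `lastdb` directory, a downstream step has to rediscover the basename shared by the index files. The Python sketch below shows one way to do that; it assumes that exactly one file with a `.prj` extension is present in the directory (an assumption about the files `lastdb` writes, not something stated in this commit), and `index_basename` is a hypothetical helper name.

```python
from pathlib import Path


def index_basename(index_dir: str, marker_suffix: str = ".prj") -> str:
    """Return the shared path prefix of an index stored in `index_dir`.

    The prefix is found by locating the single file that ends with
    `marker_suffix` (assumed here to be '.prj') and stripping that suffix.
    """
    matches = sorted(Path(index_dir).glob(f"*{marker_suffix}"))
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one '*{marker_suffix}' file in {index_dir}, "
            f"found {len(matches)}"
        )
    # with_suffix("") removes only the final extension, so ids that
    # contain dots (e.g. 'sample.1') are preserved.
    return str(matches[0].with_suffix(""))
```

The returned prefix, directory included, is what would be handed to an aligner that expects the index name rather than the directory.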

Other modules will follow (see Issue #464).  The LAST aligner can align
proteins to proteins, DNA to DNA, and translated DNA to proteins.

* Remove trailing whitespace.

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Update as suggested in PR.

* Attempt to pass linting.

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-02 11:36:31 +01:00