Commit graph

9 commits

Author SHA1 Message Date
aleksandrabliznina
0745213729
New last/mafconvert module to convert MAF alignments. (#527)
* New last/mafconvert module to convert MAF alignments.

The `maf-convert` tool distributed with [LAST](https://gitlab.com/mcfrith/last)
reads alignments in [MAF](https://genome-asia.ucsc.edu/FAQ/FAQformat.html#format5)
format and converts them to another format (axt, blast, blasttab, chain,
gff, html, psl, sam, tab).

This new module is part of the work described in Issue #464. During this
development, we fix the version of LAST to 1219 to ensure consistency.
We will upgrade it later.

* Delete white space.

* Update the functions.nf file to the dev version.
2021-06-09 08:38:57 +02:00
Charles Plessy
ca321ce69d
New module last/postmask to filter alignment files (#526)
The `last-postmask` tool distributed with [LAST](https://gitlab.com/mcfrith/last)
filters alignments in a MAF file to remove those with too many masked
(lower-case) positions compared with their score.

Like other filter modules such as `last/split`, this module risks
overwriting its input file with its output file when multiple filters are
chained in the pipeline, because both names are constructed from the
sample ID.  I added a check that gives a clearer error message in this
case.  Please let me know what you think; I can add this test to the
existing LAST modules as well.
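
The check described above can be sketched in shell as follows. This is only an illustration: the function name `check_distinct_names` and the file names are hypothetical, and the module's actual check may be written differently.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not the module's actual code: refuse to run when the
# output name built from the sample ID equals the input file name.
check_distinct_names() {
    local input=$1 output=$2
    if [ "$input" = "$output" ]; then
        echo "Input and output names are the same: $input" >&2
        return 1
    fi
}

# Example values (assumptions): the output name is derived from the sample ID
# plus a suffix that distinguishes it from the input.
id="sample1"
check_distinct_names "${id}.maf.gz" "${id}.postmask.maf.gz" \
    && echo "names differ, safe to run"
```

Without a distinguishing suffix, both arguments would be `sample1.maf.gz` and the guard would stop with an explicit message instead of silently clobbering the input.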

This new module is part of the work described in Issue #464. During this
development, we fix the version of LAST to 1219 to ensure consistency.
We will upgrade it later.
2021-06-08 11:14:08 +02:00
Charles Plessy
f7ebc2fc48
New last/dotplot module for pairwise similarity plots (#529)
* New last/dotplot module for pairwise similarity plots

The `last-dotplot` tool takes a pairwise alignment in
[MAF](http://genome.ucsc.edu/FAQ/FAQformat.html#format5) format,
possibly compressed with gzip, or in a tabular format produced by the
`maf-convert` tool, and produces a similarity dot-plot of the two
sequences in one of the graphical formats supported by the Python
Imaging Library.

As the tool guesses the output format from the extension of the output
file, whose name is constructed by the module at run time, I have used the
`args2` option to convey this information to the module.

This new module is part of the work described in Issue #464.  During
this development, we fix the version of LAST to 1219 to ensure
consistency (hence please ignore lint's version warning).

* Update the functions.nf file to the dev branch.

https://raw.githubusercontent.com/nf-core/tools/dev/nf_core/module-template/software/functions.nf
2021-06-08 11:13:51 +02:00
Charles Plessy
207930139a
New last/lastal module to align query sequences on a target index (#510)
* New last/lastal to align query sequences on a target index

`lastal` is the main program of the [LAST](https://gitlab.com/mcfrith/last)
suite.  It aligns query DNA sequences in FASTA or FASTQ format to a
target index of DNA or protein sequences.  The index is produced by
the `lastdb` program (module `last/lastdb`).  The score matrix for
evaluating the alignment can be chosen among preset ones or computed
iteratively by the `last-train` program (module `last/train`).  For
this reason, the `last/lastal` module proposed here has one input
channel for an optional file, which must be a dummy file when not used.

The LAST aligner outputs MAF files that can be very large (up to
hundreds of gigabytes), therefore this module unconditionally compresses
its output with gzip.

This new module is part of the work described in Issue #464.  During
this development, we fix the version of LAST to 1219 to ensure
consistency (hence ignore lint's version warning).

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Un-hardcode the path to the LAST index.

Among multiple alternatives, I have chosen the following command to
detect the sample name of the index, because it fails when there are
no index files in the index folder and when there are two index
files in the folder.  Not failing would result in feeding garbage
into the INDEX_NAME variable.

    basename \$(ls $index/*.bck) .bck

In case of missing file, a clear error message is given by `ls`.  In
case of more than one file, the error message of `basename` is more
cryptic, unfortunately.  (`basename: extra operand ‘.bck’`)

Alternatives that do not fail if there is no .bck file:

    basename $index/*bck .bck
    find $index -name '*bck' | sed 's/.bck//'

Alternatives that do not fail if there is more than one .bck file:

    basename -s .bck $index/*bck
    ls $index/*.bck | xargs basename -s .bck
    find $index -name '*bck' | sed 's/.bck//'
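
The three situations discussed above (one, zero, or several `.bck` files) can be tried in a scratch directory with a sketch like the one below. The helper `index_name` is hypothetical and assumes GNU coreutils and paths without spaces. It also tests the exit status of `ls` explicitly, since the status of a command substitution used as an argument is not propagated on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the module's code: print the index name, or fail
# when the folder holds no .bck file or more than one.
index_name() {
    local index=$1
    local files
    # ls exits non-zero (with its own error message) when no .bck file exists.
    files=$(ls "$index"/*.bck) || return 1
    # Left unquoted on purpose: with two files, basename receives three
    # operands and GNU basename rejects the extra one.
    basename $files .bck
}

# Scratch folders for the demonstration.
tmp=$(mktemp -d)
mkdir "$tmp/one" "$tmp/two" "$tmp/none"
touch "$tmp/one/sample.bck" "$tmp/two/a.bck" "$tmp/two/b.bck"

index_name "$tmp/one"
index_name "$tmp/none" 2>/dev/null || echo "no index file: failed as intended"
index_name "$tmp/two" 2>/dev/null || echo "two index files: failed as intended"
```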

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-25 22:10:48 +01:00
aleksandrabliznina
4575e5455c
New last/split module to find split alignments. (#511)
* New last/split module to find split alignments.

The `last-split` tool distributed with [LAST](https://gitlab.com/mcfrith/last)
finds split or spliced alignments in a MAF file produced, for
example, by the LAST `lastal` command.

This new module is part of the work described in Issue #464. During this
development, we fix the version of LAST to 1219 to ensure consistency. We will
upgrade it later.

* Update software/last/split/main.nf

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-24 20:15:57 +01:00
Charles Plessy
e75f88c68a
New module last/mafswap to reorder sequences in alignments (#500)
* New module last/mafswap to reorder sequences in alignments

The `maf-swap` tool distributed with [LAST](https://gitlab.com/mcfrith/last)
reorders sequences in alignment files in Multiple Alignment Format.
When run without command-line arguments, it will swap the target and the
query sequences.  This is useful when turning a many-to-many alignment
into a many-to-one and then a one-to-one alignment in conjunction with
the `last-split` command (split, swap, split and swap again).

The LAST aligner outputs MAF files, but other tools also use this
format.  As MAF files can be very large (up to hundreds of gigabytes),
the module expects its input to be compressed with gzip and will
compress its output.

This new module is part of the work described in Issue #464.  During
this development, we fix the version of LAST to 1219 to ensure
consistency (hence ignore lint's version warning).

* Update MD5 sum.

Actually, 7029066c27ac6f5ef18d660d5741979a is the MD5 sum of
an empty file compressed with `gzip --no-name`…  This happened
because I forgot to update the config file after correcting the
module… sorry !
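
A small guard against recording the checksum of an empty compressed file could look like the sketch below. The helper name `is_empty_gzip` and the file names are hypothetical, and the one-line MAF content is a placeholder rather than a valid alignment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a gzip file decompresses to zero bytes.
is_empty_gzip() {
    [ "$(gzip -cd "$1" | wc -c)" -eq 0 ]
}

tmp=$(mktemp -d)
# --no-name keeps the original file name and timestamp out of the gzip
# header, so the same content always gives the same compressed bytes.
gzip --no-name < /dev/null > "$tmp/empty.maf.gz"
printf 'a score=1\n' | gzip --no-name > "$tmp/nonempty.maf.gz"

is_empty_gzip "$tmp/empty.maf.gz" && echo "empty.maf.gz has no content"
is_empty_gzip "$tmp/nonempty.maf.gz" || echo "nonempty.maf.gz has content"
```

Running such a check on a test output before copying its MD5 sum into the config file would catch the situation described above.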

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Change name as suggested in pull request.

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-19 08:59:23 +01:00
aleksandrabliznina
b592cea30b
New last/train module to train alignment parameters. (#492)
* New last/train module to train alignment parameters.

The `last-train` command creates a parameter file that
will be used by the last/lastal module for sequence alignment.
It takes indexed sequences and query sequences as input,
and we use the metadata of both to create the ID of the
parameter output file.

Submission of the LAST modules is discussed in more
detail in issue #464. For consistency, we use LAST
version 1219 for this whole development and will upgrade later.

* Corrected files according to the nf-core v1.14 standards.

* Fixed functions.nf file for the last-train module.

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Find index name.

* Correct after the input channels were changed.

* Use double underscore as a name separator.

Single underscores can happen in ids, therefore, we would like to keep two underscores.
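
As an illustration of why the double underscore helps, here is a minimal shell sketch. The IDs are made up and the order in which the two IDs are joined is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical IDs that themselves contain single underscores.
index_id="genome_v2"
query_id="reads_run_1"

# Join with a double underscore (the order of the two IDs is an assumption).
param_id="${index_id}__${query_id}"
echo "$param_id"

# Split again on the first double underscore; the single underscores inside
# each ID are left alone.
recovered_index="${param_id%%__*}"
recovered_query="${param_id#*__}"
echo "$recovered_index $recovered_query"
```

The split is only unambiguous as long as no ID contains a double underscore itself or starts or ends with an underscore.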

* Remove extra spaces.

* Fixed the passing of the "score matrix" line.

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Update software/last/train/main.nf

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-19 08:37:08 +01:00
Jose Espinosa-Carrasco
95e02f913f
Update comments with new style (#497)
* Update comment style on functions.nf files

* Update test main.nf comments

* Add meta for ggread
2021-05-12 14:56:46 +01:00
Charles Plessy
16d20a7cc4
New last/lastdb module to index sequences before alignment. (#476)
* New last/lastdb module to index sequences before alignment.

The `lastdb` command creates a sequence index for the LAST aligner
(https://gitlab.com/mcfrith/last). Input can be in FASTA or FASTQ
format, and compression is handled automagically.  DNA or protein
sequences can be indexed.

The sequence index is a collection of files sharing the same basename.
This module sets the basename to the sample identifier (`$meta.id`) and
creates the index in a directory always called `lastdb`.  The module's
output channel then conveys a copy of the metadata and the path to the
`lastdb` directory.

Other modules will follow (see Issue #464).  The LAST aligner can align
proteins to proteins, DNA to DNA, and can translate DNA to align it to
proteins.

* Remove trailing whitespace.

* Apply suggestions from code review

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>

* Update as suggested in PR.

* Attempt to pass linting.

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
2021-05-02 11:36:31 +01:00