New module: LIMA (#719)

* 📦 NEW: Add module lima
* 👌 IMPROVE: Move .pbi output to reports channel
* 🐛 FIX: Fix report channel definition
* 👌 IMPROVE: Remove options from command line; update test script with removed options
* 👌 IMPROVE: Add some pacbio test files
* 🐛 FIX: Add Pacbio index to test_data.config
* 👌 IMPROVE: Re-add 10000 data test
* 🐛 FIX: Add pbi input
* 👌 IMPROVE: Add parallelization to lima
* 👌 IMPROVE: Add some pbindex
* 🐛 FIX: Add pbi extension to files
* 👌 IMPROVE: Accept one channel (primers moved into the first channel)
* 👌 IMPROVE: Assign a value channel for primers; improve code workflow readability
* 👌 IMPROVE: Update .gitignore
* 👌 IMPROVE: Update module to latest template version
* 🐛 FIX: Correct Singularity and Docker URL
* 👌 IMPROVE: Update to the latest version of modules template
* 👌 IMPROVE: Update test_data.config
* 👌 IMPROVE: Remove pbi from input files
* 👌 IMPROVE: Final version of test datasets config
* 👌 IMPROVE: Remove useless index + Fix Typos
* 🐛 FIX: Fill contains args
* 👌 IMPROVE: Add channel for each output
* 👌 IMPROVE: Remove comments
* 🐛 FIX: Clean test_data.config
* Update modules/lima/main.nf: Add meta to each output
* Update modules/lima/main.nf: Remove useless parenthesis
* 🐛 FIX: Keep version number only
* 🐛 FIX: Reintegrate prefix variable and use it to define output file name
* 👌 IMPROVE: Add suffix arg to check output file names
* 👌 IMPROVE: Use prefix for output filename
* 🐛 FIX: Set optional output; allow usage of different input formats
* 👌 IMPROVE: Update meta file
* 👌 IMPROVE: Update test; one test for each input file type
* 👌 IMPROVE: Add fasta, fasta.gz, fastq, fastq.gz test files
* 👌 IMPROVE: Update with latest templates / Follow new version.yaml rule
* 🐛 FIX: Fix typos and include getProcessName function
* 👌 IMPROVE: Update .gitignore
* 👌 IMPROVE: Using suffix to manage output was not my best idea; add bash code to detect the extension and update the output file name
* 👌 IMPROVE: Clean code

Co-authored-by: Harshil Patel <drpatelh@users.noreply.github.com>
Co-authored-by: Gregor Sturm <mail@gregor-sturm.de>
Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>
2024-11-10 20:23:10 +00:00 · 2021-09-27 16:14:35 +01:00 · 2021-09-27 16:14:35 +01:00 · 4ec8b025bd
commit 4ec8b025bd
parent 906577873b
8 changed files with 390 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -11,3 +11,5 @@ __pycache__
 *.pyo
 *.pyc
 tests/data/
+modules/modtest/
+tests/modules/modtest/
--- a/modules/lima/functions.nf
+++ b/modules/lima/functions.nf
@@ -0,0 +1,78 @@
+//
+//  Utility functions used in nf-core DSL2 module files
+//
+
+//
+// Extract name of software tool from process name using $task.process
+//
+def getSoftwareName(task_process) {
+    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
+}
+
+//
+// Extract name of module from process name using $task.process
+//
+def getProcessName(task_process) {
+    return task_process.tokenize(':')[-1]
+}
+
+//
+// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
+//
+def initOptions(Map args) {
+    def Map options = [:]
+    options.args            = args.args ?: ''
+    options.args2           = args.args2 ?: ''
+    options.args3           = args.args3 ?: ''
+    options.publish_by_meta = args.publish_by_meta ?: []
+    options.publish_dir     = args.publish_dir ?: ''
+    options.publish_files   = args.publish_files
+    options.suffix          = args.suffix ?: ''
+    return options
+}
+
+//
+// Tidy up and join elements of a list to return a path string
+//
+def getPathFromList(path_list) {
+    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }      // Remove empty entries
+    paths     = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and leading/trailing slashes
+    return paths.join('/')
+}
+
+//
+// Function to save/publish module results
+//
+def saveFiles(Map args) {
+    def ioptions  = initOptions(args.options)
+    def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
+
+    // Do not publish versions.yml unless running from pytest workflow
+    if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) {
+        return null
+    }
+    if (ioptions.publish_by_meta) {
+        def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
+        for (key in key_list) {
+            if (args.meta && key instanceof String) {
+                def path = key
+                if (args.meta.containsKey(key)) {
+                    path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
+                }
+                path = path instanceof String ? path : ''
+                path_list.add(path)
+            }
+        }
+    }
+    if (ioptions.publish_files instanceof Map) {
+        for (ext in ioptions.publish_files) {
+            if (args.filename.endsWith(ext.key)) {
+                def ext_list = path_list.collect()
+                ext_list.add(ext.value)
+                return "${getPathFromList(ext_list)}/$args.filename"
+            }
+        }
+    } else if (ioptions.publish_files == null) {
+        return "${getPathFromList(path_list)}/$args.filename"
+    }
+}
--- a/modules/lima/main.nf
+++ b/modules/lima/main.nf
@@ -0,0 +1,73 @@
+// Import generic module functions
+include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions'
+
+params.options = [:]
+options        = initOptions(params.options)
+
+process LIMA {
+    tag "$meta.id"
+    label 'process_medium'
+    publishDir "${params.outdir}",
+        mode: params.publish_dir_mode,
+        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }
+
+    conda (params.enable_conda ? "bioconda::lima=2.2.0" : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        container "https://depot.galaxyproject.org/singularity/lima:2.2.0--h9ee0642_0"
+    } else {
+        container "quay.io/biocontainers/lima:2.2.0--h9ee0642_0"
+    }
+
+    input:
+    tuple val(meta), path(ccs)
+    path primers
+
+    output:
+    tuple val(meta), path("*.clips")  , emit: clips
+    tuple val(meta), path("*.counts") , emit: counts
+    tuple val(meta), path("*.guess")  , emit: guess
+    tuple val(meta), path("*.report") , emit: report
+    tuple val(meta), path("*.summary"), emit: summary
+    path "versions.yml"               , emit: version
+
+    tuple val(meta), path("*.bam")              , optional: true, emit: bam
+    tuple val(meta), path("*.bam.pbi")          , optional: true, emit: pbi
+    tuple val(meta), path("*.{fa,fasta}")       , optional: true, emit: fasta
+    tuple val(meta), path("*.{fa.gz,fasta.gz}") , optional: true, emit: fastagz
+    tuple val(meta), path("*.fastq")            , optional: true, emit: fastq
+    tuple val(meta), path("*.fastq.gz")         , optional: true, emit: fastqgz
+    tuple val(meta), path("*.xml")              , optional: true, emit: xml
+    tuple val(meta), path("*.json")             , optional: true, emit: json
+
+    script:
+    def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
+
+    """
+    OUT_EXT=""
+
+    if [[ $ccs =~ bam\$ ]]; then
+        OUT_EXT="bam"
+    elif [[ $ccs =~ fasta\$ ]]; then
+        OUT_EXT="fasta"
+    elif [[ $ccs =~ fasta.gz\$ ]]; then
+        OUT_EXT="fasta.gz"
+    elif [[ $ccs =~ fastq\$ ]]; then
+        OUT_EXT="fastq"
+    elif [[ $ccs =~ fastq.gz\$ ]]; then
+        OUT_EXT="fastq.gz"
+    fi
+
+    echo \$OUT_EXT
+    lima \\
+        $ccs \\
+        $primers \\
+        $prefix.\$OUT_EXT \\
+        -j $task.cpus \\
+        $options.args
+
+    cat <<-END_VERSIONS > versions.yml
+    ${getProcessName(task.process)}:
+        lima: \$( lima --version | sed 's/lima //g' | sed 's/ (.\\+//g' )
+    END_VERSIONS
+    """
+}
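
The extension detection in the `script:` block of `modules/lima/main.nf` can be tried outside Nextflow. The sketch below is not the module's code: it swaps the `[[ ... =~ ... ]]` chain for a `case` statement with glob patterns, and the function name `detect_out_ext` is hypothetical. It assumes the same five input types listed in `meta.yml`. Unlike the original chain, it reports an error for any other extension instead of leaving `OUT_EXT` empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the module): print the extension that the
# lima output file should carry, based on the input file name.
detect_out_ext() {
    case "$1" in
        *.bam)      echo "bam" ;;
        *.fasta.gz) echo "fasta.gz" ;;
        *.fasta)    echo "fasta" ;;
        *.fastq.gz) echo "fastq.gz" ;;
        *.fastq)    echo "fastq" ;;
        *)
            # Assumption: an unknown extension should be an error.
            echo "unsupported input extension: $1" >&2
            return 1
            ;;
    esac
}

# Example calls with file names taken from test_data.config
detect_out_ext "alz.ccs.bam"
detect_out_ext "alz.ccs.fastq.gz"
```

Glob patterns anchor on the literal `.`, so a name such as `foobam` would not be treated as a BAM file, whereas the unanchored regex `bam\$` would accept it.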
--- a/modules/lima/meta.yml
+++ b/modules/lima/meta.yml
@@ -0,0 +1,77 @@
+name: lima
+description: lima - The PacBio Barcode Demultiplexer and Primer Remover
+keywords:
+  - sort
+tools:
+  - lima:
+      description: lima - The PacBio Barcode Demultiplexer and Primer Remover
+      homepage: https://github.com/PacificBiosciences/pbbioconda
+      documentation: https://lima.how/
+      tool_dev_url: https://github.com/pacificbiosciences/barcoding/
+      doi: ""
+      licence: ['BSD-3-clause-Clear']
+
+input:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test' ]
+  - ccs:
+      type: file
+      description: A BAM or fasta or fasta.gz or fastq or fastq.gz file of subreads or ccs
+      pattern: "*.{bam,fasta,fasta.gz,fastq,fastq.gz}"
+  - primers:
+      type: file
+      description: Fasta file, sequences of primers
+      pattern: "*.fasta"
+
+output:
+  - meta:
+      type: map
+      description: |
+        Groovy Map containing sample information
+        e.g. [ id:'test' ]
+  - bam:
+      type: file
+      description: A bam file of ccs purged of primers
+      pattern: "*.bam"
+  - pbi:
+      type: file
+      description: Pacbio index file of ccs purged of primers
+      pattern: "*.bam.pbi"
+  - xml:
+      type: file
+      description: An XML file representing a set of a particular sequence data type such as subreads, references or aligned subreads.
+      pattern: "*.xml"
+  - json:
+      type: file
+      description: A metadata json file
+      pattern: "*.json"
+  - clips:
+      type: file
+      description: A fasta file of clipped primers
+      pattern: "*.clips"
+  - counts:
+      type: file
+      description: A tabulated file describing pairs of primers
+      pattern: "*.counts"
+  - guess:
+      type: file
+      description: A second tabulated file describing pairs of primers (no doc available)
+      pattern: "*.guess"
+  - report:
+      type: file
+      description: A tab-separated file about each ZMW, unfiltered
+      pattern: "*.report"
+  - summary:
+      type: file
+      description: This file shows how many ZMWs have been filtered, how many ZMWs are same/different, and how many reads have been filtered.
+      pattern: "*.summary"
+  - version:
+      type: file
+      description: File containing software version
+      pattern: "versions.yml"
+
+authors:
+  - "@sguizard"
--- a/tests/config/pytest_modules.yml
+++ b/tests/config/pytest_modules.yml
@@ -562,6 +562,10 @@ last/train:
  - modules/last/train/**
  - tests/modules/last/train/**

+lima:
+  - modules/lima/**
+  - tests/modules/lima/**
+
 lofreq/call:
  - modules/lofreq/call/**
  - tests/modules/lofreq/call/**
--- a/tests/config/test_data.config
+++ b/tests/config/test_data.config
@@ -175,6 +175,11 @@ params {
                alz                                           = "${test_data_dir}/genomics/homo_sapiens/pacbio/bam/alz.bam"
                alzpbi                                        = "${test_data_dir}/genomics/homo_sapiens/pacbio/bam/alz.bam.pbi"
                ccs                                           = "${test_data_dir}/genomics/homo_sapiens/pacbio/bam/alz.ccs.bam"
+                ccs_fa                                        = "${test_data_dir}/genomics/homo_sapiens/pacbio/fasta/alz.ccs.fasta"
+                ccs_fa_gz                                     = "${test_data_dir}/genomics/homo_sapiens/pacbio/fasta/alz.ccs.fasta.gz"
+                ccs_fq                                        = "${test_data_dir}/genomics/homo_sapiens/pacbio/fastq/alz.ccs.fastq"
+                ccs_fq_gz                                     = "${test_data_dir}/genomics/homo_sapiens/pacbio/fastq/alz.ccs.fastq.gz"
+                ccs_xml                                       = "${test_data_dir}/genomics/homo_sapiens/pacbio/xml/alz.ccs.consensusreadset.xml"
                lima                                          = "${test_data_dir}/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.bam"
                refine                                        = "${test_data_dir}/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.bam"
                cluster                                       = "${test_data_dir}/genomics/homo_sapiens/pacbio/bam/alz.ccs.fl.NEB_5p--NEB_Clontech_3p.flnc.clustered.bam"
--- a/tests/modules/lima/main.nf
+++ b/tests/modules/lima/main.nf
@@ -0,0 +1,60 @@
+#!/usr/bin/env nextflow
+
+nextflow.enable.dsl = 2
+
+include { LIMA } from '../../../modules/lima/main.nf' addParams( options: [args: '--isoseq --peek-guess', suffix: ".fl"] )
+
+workflow test_lima_bam {
+
+    input = [
+                [ id:'test' ], // meta map
+                file(params.test_data['homo_sapiens']['pacbio']['ccs'],     checkIfExists: true),
+            ]
+    primers = [ file(params.test_data['homo_sapiens']['pacbio']['primers'], checkIfExists: true) ]
+
+    LIMA ( input, primers )
+}
+
+workflow test_lima_fa {
+
+    input = [
+                [ id:'test' ], // meta map
+                file(params.test_data['homo_sapiens']['pacbio']['ccs_fa'],  checkIfExists: true),
+            ]
+    primers = [ file(params.test_data['homo_sapiens']['pacbio']['primers'], checkIfExists: true) ]
+
+    LIMA ( input, primers )
+}
+
+workflow test_lima_fa_gz {
+
+    input = [
+                [ id:'test' ], // meta map
+                file(params.test_data['homo_sapiens']['pacbio']['ccs_fa_gz'], checkIfExists: true),
+            ]
+    primers = [ file(params.test_data['homo_sapiens']['pacbio']['primers'],   checkIfExists: true) ]
+
+    LIMA ( input, primers )
+}
+
+workflow test_lima_fq {
+
+    input = [
+                [ id:'test' ], // meta map
+                file(params.test_data['homo_sapiens']['pacbio']['ccs_fq'],  checkIfExists: true),
+            ]
+    primers = [ file(params.test_data['homo_sapiens']['pacbio']['primers'], checkIfExists: true) ]
+
+    LIMA ( input, primers )
+}
+
+workflow test_lima_fq_gz {
+
+    input = [
+                [ id:'test' ], // meta map
+                file(params.test_data['homo_sapiens']['pacbio']['ccs_fq_gz'], checkIfExists: true),
+            ]
+    primers = [ file(params.test_data['homo_sapiens']['pacbio']['primers'],   checkIfExists: true) ]
+
+    LIMA ( input, primers )
+}
--- a/tests/modules/lima/test.yml
+++ b/tests/modules/lima/test.yml
@@ -0,0 +1,91 @@
+- name: lima test_lima_bam
+  command: nextflow run tests/modules/lima -entry test_lima_bam -c tests/config/nextflow.config
+  tags:
+    - lima
+  files:
+    - path: output/lima/test.fl.NEB_5p--NEB_Clontech_3p.bam
+      md5sum: 14b51d7f44e30c05a5b14e431a992097
+    - path: output/lima/test.fl.NEB_5p--NEB_Clontech_3p.bam.pbi
+      md5sum: 6ae7f057304ad17dd9d5f565d72d3f7b
+    - path: output/lima/test.fl.NEB_5p--NEB_Clontech_3p.consensusreadset.xml
+      contains: [ 'ConsensusReadSet' ]
+    - path: output/lima/test.fl.json
+      contains: [ 'ConsensusReadSet' ]
+    - path: output/lima/test.fl.lima.clips
+      md5sum: fa03bc75bd78b2648a139fd67c69208f
+    - path: output/lima/test.fl.lima.counts
+      md5sum: 842c6a23ca2de504ced4538ad5111da1
+    - path: output/lima/test.fl.lima.guess
+      md5sum: d3675af3ca8a908ee9e3c231668392d3
+    - path: output/lima/test.fl.lima.report
+      md5sum: dc073985322ae0a003ccc7e0fa4db5e6
+    - path: output/lima/test.fl.lima.summary
+      md5sum: bcbcaaaca418bdeb91141c81715ca420
+
+- name: lima test_lima_fa
+  command: nextflow run tests/modules/lima -entry test_lima_fa -c tests/config/nextflow.config
+  tags:
+    - lima
+  files:
+    - path: output/lima/test.fl.lima.clips
+      md5sum: 1012bc8874a14836f291bac48e8482a4
+    - path: output/lima/test.fl.lima.counts
+      md5sum: a4ceaa408be334eaa711577e95f8730e
+    - path: output/lima/test.fl.lima.guess
+      md5sum: 651e5f2b438b8ceadb3e06a2177e1818
+    - path: output/lima/test.fl.lima.report
+      md5sum: bd4a8bde17471563cf91aab4c787911d
+    - path: output/lima/test.fl.lima.summary
+      md5sum: 03be2311ba4afb878d8e547ab38c11eb
+
+- name: lima test_lima_fa_gz
+  command: nextflow run tests/modules/lima -entry test_lima_fa_gz -c tests/config/nextflow.config
+  tags:
+    - lima
+  files:
+    - path: output/lima/test.fl.lima.clips
+      md5sum: 1012bc8874a14836f291bac48e8482a4
+    - path: output/lima/test.fl.lima.counts
+      md5sum: a4ceaa408be334eaa711577e95f8730e
+    - path: output/lima/test.fl.lima.guess
+      md5sum: 651e5f2b438b8ceadb3e06a2177e1818
+    - path: output/lima/test.fl.lima.report
+      md5sum: bd4a8bde17471563cf91aab4c787911d
+    - path: output/lima/test.fl.lima.summary
+      md5sum: 03be2311ba4afb878d8e547ab38c11eb
+
+- name: lima test_lima_fq
+  command: nextflow run tests/modules/lima -entry test_lima_fq -c tests/config/nextflow.config
+  tags:
+    - lima
+  files:
+    - path: output/lima/test.fl.NEB_5p--NEB_Clontech_3p.fastq
+      md5sum: ef395f689c5566f501e300bb83d7a5f2
+    - path: output/lima/test.fl.lima.clips
+      md5sum: 5c16ef8122f6f1798acc30eb8a30828c
+    - path: output/lima/test.fl.lima.counts
+      md5sum: 767b687e6eda7b24cd0e577f527eb2f0
+    - path: output/lima/test.fl.lima.guess
+      md5sum: 31b988aab6bda84867e704b9edd8a763
+    - path: output/lima/test.fl.lima.report
+      md5sum: ad2a9b1eeb4cda4a1f69ef4b7520b5fd
+    - path: output/lima/test.fl.lima.summary
+      md5sum: e91d3c386aaf4effa63f33ee2eb7da2a
+
+- name: lima test_lima_fq_gz
+  command: nextflow run tests/modules/lima -entry test_lima_fq_gz -c tests/config/nextflow.config
+  tags:
+    - lima
+  files:
+    - path: output/lima/test.fl.NEB_5p--NEB_Clontech_3p.fastq.gz
+      md5sum: 32c11db85f69a1b4454b6bbd794b6df2
+    - path: output/lima/test.fl.lima.clips
+      md5sum: 5c16ef8122f6f1798acc30eb8a30828c
+    - path: output/lima/test.fl.lima.counts
+      md5sum: 767b687e6eda7b24cd0e577f527eb2f0
+    - path: output/lima/test.fl.lima.guess
+      md5sum: 31b988aab6bda84867e704b9edd8a763
+    - path: output/lima/test.fl.lima.report
+      md5sum: ad2a9b1eeb4cda4a1f69ef4b7520b5fd
+    - path: output/lima/test.fl.lima.summary
+      md5sum: e91d3c386aaf4effa63f33ee2eb7da2a