Homer Modules (#75)

* feat(homer): Add initial makeTagDirectory

* feat(homer): Add initial findPeaks module

* feat(homer): Update with new options

See 1d30e2c21a

* fix(homer): Correct findpeaks process name

* fix(homer): Takes a bam file instead of bed

* feat(homer): Add initial makeTagDirectory test

* fix(homer): Hardcode genome and configureHomer

I'd like to modularize configureHomer, but I need to figure out how
exactly the genomes work.

* fix(homer): bam => bed

BAM input requires samtools, which isn't present in this Docker image

* feat(homer): Add initial configureHomer script

* ci(homer): Add initial test

* test(homer): Reproducible configuration workaround

- I can't run both tests (one file and two files) at the same time because it breaks
- I can't copy the genome stuff from the configurehomer module because it's read only
- So I can't make the makeTagDirectory module depend on configureHomer

* test(homer): Add placeholder annotatepeaks

The required inputs aren't necessarily required for all workflows, from what
I've used, but I'll need to look at the actual docs

* test(homer): Add missing B.bed

* test(homer): Rename two => groseq

Then all of the various workflows that homer provides can be e2e tested

* feat(homer): Add initial makeUCSCfile module

* test(homer): Add start to makeUCSCfile testing

* chore(homer): Add various cleanups

* test(homer): Rewrite annotatepeaks

Not passing yet

* test(homer): Rewrite configurehomer

* test(homer): Rewrite findpeaks

Still failing

* test(homer): Rewrite makeucscfile

Not passing yet

* test(homer): Rewrite maketagdirectory

All homer modules now follow the new structure. Time to make them pass.

* test(homer): Fix typo for workflow name

* fix(homer): Use correct container

* fix(homer): Accept fasta in maketagdirectory

Apparently all of the homer stuff can just take any old fasta and you
don't need to configure the genome ahead of time with configureHomer

* test(homer): makeTagDirectory passes now

* fix(homer): Update containers in makeucscfile

* test(homer): Rewrite makeucscfile

Takes input from maketagdirectory which is how the module should be used

* fix(homer): Update makeUCSCFile bedgraph path

* test(homer): Update makeucscfile expected output

* fix(homer): Update containers in findpeaks

* fix(homer): Change findpeaks args

The user is just going to have to know what they're doing for now

* test(homer): findPeaks rewrite with tagDir input

* test(homer): Update expected files for findPeaks

And bump filters

* style: Appease editorconfig

* ci: Remove old workflow

* test(homer): Add md5sums

* test(homer): Add meta test

* style(homer): Capitalize HOMER

* docs(homer): Add maketagdirectory meta.yml

* docs(homer): Add makeucscfile meta.yml

* docs(homer): Add findpeaks meta.yml

* test(homer): Update to new test data standards

* chore: Remove stuff that got revived in the rebase

* chore: software => modules

* test(homer): Update tags

* test(homer): Update annotatepeaks

* ci: Fix uploading of artifacts

GitHub Actions doesn't accept the / from tags like homer/findpeaks in artifact names

* test(homer): Remove annotate md5sum

This is failing and breaking new tests

* test(homer): Use bams instead of beds

* test(homer): Fix meta maketagdirectory

* test(homer): Fix input in all tests

* test(homer): Move back to bed files

Forgot samtools isn't present

* chore(homer): Add TODOs for tests

* test(homer): Add bed format arg

* test(homer): Update md5sums

* test(homer): Fix tags tsvs

* style(homer): Appease nf-core linting

* docs(homer): Be in line with what is in the main.nf file

Co-authored-by: Kevin Menden <kevin.menden@live.com>

Edmund Miller 2021-09-08 15:40:34 +00:00 committed by GitHub
parent 0732028e15
commit 669fb5caed
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
18 changed files with 575 additions and 3 deletions

@@ -95,7 +95,7 @@ jobs:
         if: failure()
         uses: actions/upload-artifact@v2
         with:
-          name: logs-${{ matrix.tags }}-${{ matrix.profile }}-${{ matrix.nxf_version }}
+          name: logs-${{ matrix.profile }}-${{ matrix.nxf_version }}
           path: |
             /home/runner/pytest_workflow_*/*/.nextflow.log
             /home/runner/pytest_workflow_*/*/log.out

@@ -1,11 +1,11 @@
 name: homer_annotatepeaks
-description: Annotate peaks with homer
+description: Annotate peaks with HOMER suite
 keywords:
   - annotations
   - peaks
   - bed
 tools:
-  - cuatadapt:
+  - homer:
       description: |
         HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
       documentation: http://homer.ucsd.edu/homer/

@@ -0,0 +1,68 @@
//
//  Utility functions used in nf-core DSL2 module files
//

//
// Extract name of software tool from process name using $task.process
//
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

//
// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
//
def initOptions(Map args) {
    def Map options = [:]
    options.args            = args.args ?: ''
    options.args2           = args.args2 ?: ''
    options.args3           = args.args3 ?: ''
    options.publish_by_meta = args.publish_by_meta ?: []
    options.publish_dir     = args.publish_dir ?: ''
    options.publish_files   = args.publish_files
    options.suffix          = args.suffix ?: ''
    return options
}

//
// Tidy up and join elements of a list to return a path string
//
def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }      // Remove empty entries
    paths     = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
    return paths.join('/')
}

//
// Function to save/publish module results
//
def saveFiles(Map args) {
    if (!args.filename.endsWith('.version.txt')) {
        def ioptions  = initOptions(args.options)
        def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
        if (ioptions.publish_by_meta) {
            def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
            for (key in key_list) {
                if (args.meta && key instanceof String) {
                    def path = key
                    if (args.meta.containsKey(key)) {
                        path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
                    }
                    path = path instanceof String ? path : ''
                    path_list.add(path)
                }
            }
        }
        if (ioptions.publish_files instanceof Map) {
            for (ext in ioptions.publish_files) {
                if (args.filename.endsWith(ext.key)) {
                    def ext_list = path_list.collect()
                    ext_list.add(ext.value)
                    return "${getPathFromList(ext_list)}/$args.filename"
                }
            }
        } else if (ioptions.publish_files == null) {
            return "${getPathFromList(path_list)}/$args.filename"
        }
    }
}
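The publish location used by these modules can be worked out by hand. The standalone Groovy sketch below copies `getSoftwareName` and `getPathFromList` from the helper file above (included by each module as `./functions`) and adds a hypothetical `defaultPublishPath` helper that keeps only the branch of `saveFiles` taken when `params.options` sets no `publish_dir`, `publish_by_meta` or `publish_files`, as in the HOMER tests. Although the modules pass `publish_by_meta:['id']`, `saveFiles` only looks at it when `params.options.publish_by_meta` is non-empty, so no `test/` level appears. The process names are assumed examples of `task.process` values, and the paths are relative to `params.outdir`, which the test paths (`output/homer/...`) suggest resolves to `output/`.

```groovy
// Copy of getSoftwareName: "A:B:HOMER_FINDPEAKS" -> "HOMER_FINDPEAKS" -> "HOMER" -> "homer"
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

// Copy of getPathFromList: drop empty entries, trim whitespace and slashes, join with '/'
def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }
    paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") }
    return paths.join('/')
}

// Hypothetical helper: the default branch of saveFiles only
def defaultPublishPath(String taskProcess, String filename) {
    if (filename.endsWith('.version.txt')) {
        return null // saveFiles returns nothing for version files, so they are skipped
    }
    return "${getPathFromList([getSoftwareName(taskProcess)])}/$filename".toString()
}

// Assumed task.process values for illustration
println defaultPublishPath('HOMER_FINDPEAKS', 'test.peaks.txt')                     // -> homer/test.peaks.txt
println defaultPublishPath('test_homer_makeucscfile:HOMER_MAKETAGDIRECTORY', 'tag_dir') // -> homer/tag_dir
println defaultPublishPath('HOMER_FINDPEAKS', 'homer.version.txt')                  // -> null
```

This is why every HOMER test expects its files under `output/homer/`, whichever module produced them.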

@@ -0,0 +1,42 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
def options    = initOptions(params.options)

def VERSION = '4.11'

process HOMER_FINDPEAKS {
    tag "$meta.id"
    label 'process_medium'
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }

    conda (params.enable_conda ? "bioconda::homer=4.11=pl526hc9558a2_3" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        container "https://depot.galaxyproject.org/singularity/homer:4.11--pl526hc9558a2_3"
    } else {
        container "quay.io/biocontainers/homer:4.11--pl526hc9558a2_3"
    }

    input:
    tuple val(meta), path(tagDir)

    output:
    tuple val(meta), path("*peaks.txt"), emit: txt
    path  "*.version.txt"              , emit: version

    script:
    def software = getSoftwareName(task.process)
    def prefix   = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
    """
    findPeaks \\
        $tagDir \\
        $options.args \\
        -o ${prefix}.peaks.txt

    echo $VERSION > ${software}.version.txt
    """
}
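To see what the script block produces for the findpeaks test, the Groovy sketch below evaluates the same prefix expression and joins the command words on one line, as the `\\` continuations do in the generated shell script. `renderFindPeaks` is a hypothetical helper, not part of the module; `tag_dir` is the directory name emitted by HOMER_MAKETAGDIRECTORY, and `-style factor` comes from the test's `addParams`.

```groovy
// Hypothetical helper: mirrors the prefix logic and command layout of HOMER_FINDPEAKS
def renderFindPeaks(Map meta, Map options, String tagDir) {
    // An empty suffix is falsy in Groovy, so the prefix falls back to meta.id
    def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
    // The backslash continuations in the script block yield a single command line
    return ['findPeaks', tagDir, options.args, '-o', "${prefix}.peaks.txt"].join(' ')
}

// Values used by test_homer_findpeaks (suffix empty, as initOptions defaults it)
println renderFindPeaks([id: 'test'], [args: '-style factor', suffix: ''], 'tag_dir')
// -> findPeaks tag_dir -style factor -o test.peaks.txt

// A suffix set through params.options changes the output name
println renderFindPeaks([id: 'test'], [args: '-style factor', suffix: '_rep1'], 'tag_dir')
// -> findPeaks tag_dir -style factor -o test_rep1.peaks.txt
```

The first call matches the `output/homer/test.peaks.txt` file that the findpeaks test checks.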

@@ -0,0 +1,37 @@
name: homer_findpeaks
description: Find peaks with HOMER suite
keywords:
  - annotations
  - peaks
tools:
  - homer:
      description: |
        HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
      documentation: http://homer.ucsd.edu/homer/
      doi: 10.1016/j.molcel.2010.05.004
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - tagDir:
      type: directory
      description: "The 'Tag Directory'"
      pattern: "tagDir"
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - peaks:
      type: file
      description: The found peaks
      pattern: "*peaks.txt"
  - version:
      type: file
      description: File containing software version
      pattern: "*.{version.txt}"
authors:
  - "@EMiller88"

@@ -0,0 +1,68 @@
//
//  Utility functions used in nf-core DSL2 module files
//

//
// Extract name of software tool from process name using $task.process
//
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

//
// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
//
def initOptions(Map args) {
    def Map options = [:]
    options.args            = args.args ?: ''
    options.args2           = args.args2 ?: ''
    options.args3           = args.args3 ?: ''
    options.publish_by_meta = args.publish_by_meta ?: []
    options.publish_dir     = args.publish_dir ?: ''
    options.publish_files   = args.publish_files
    options.suffix          = args.suffix ?: ''
    return options
}

//
// Tidy up and join elements of a list to return a path string
//
def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }      // Remove empty entries
    paths     = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
    return paths.join('/')
}

//
// Function to save/publish module results
//
def saveFiles(Map args) {
    if (!args.filename.endsWith('.version.txt')) {
        def ioptions  = initOptions(args.options)
        def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
        if (ioptions.publish_by_meta) {
            def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
            for (key in key_list) {
                if (args.meta && key instanceof String) {
                    def path = key
                    if (args.meta.containsKey(key)) {
                        path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
                    }
                    path = path instanceof String ? path : ''
                    path_list.add(path)
                }
            }
        }
        if (ioptions.publish_files instanceof Map) {
            for (ext in ioptions.publish_files) {
                if (args.filename.endsWith(ext.key)) {
                    def ext_list = path_list.collect()
                    ext_list.add(ext.value)
                    return "${getPathFromList(ext_list)}/$args.filename"
                }
            }
        } else if (ioptions.publish_files == null) {
            return "${getPathFromList(path_list)}/$args.filename"
        }
    }
}

@@ -0,0 +1,43 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
def options    = initOptions(params.options)

def VERSION = '4.11'

process HOMER_MAKETAGDIRECTORY {
    tag "$meta.id"
    label 'process_medium'
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }

    conda (params.enable_conda ? "bioconda::homer=4.11=pl526hc9558a2_3" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        container "https://depot.galaxyproject.org/singularity/homer:4.11--pl526hc9558a2_3"
    } else {
        container "quay.io/biocontainers/homer:4.11--pl526hc9558a2_3"
    }

    input:
    tuple val(meta), path(bed)
    path fasta

    output:
    tuple val(meta), path("tag_dir"), emit: tagdir
    path  "*.version.txt"           , emit: version

    script:
    def software = getSoftwareName(task.process)
    def prefix   = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
    """
    makeTagDirectory \\
        tag_dir \\
        $options.args \\
        $bed \\
        -genome $fasta

    echo $VERSION > ${software}.version.txt
    """
}
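For the two-file test, the `bed` input holds a list of files for a single sample. The Groovy sketch below assembles the command HOMER_MAKETAGDIRECTORY would run, assuming that several files in one `path` input are interpolated space-separated (imitated here with `join(' ')`). `renderMakeTagDirectory` and the file names `test.bed`, `test2.bed` and `genome.fasta` are hypothetical stand-ins for the staged `test_bed`, `test2_bed` and `genome_fasta` test data; `-format bed` is the value the tests pass through `addParams`.

```groovy
// Hypothetical helper: builds the makeTagDirectory command line for one sample
def renderMakeTagDirectory(Map options, List<String> beds, String fasta) {
    // All BED files of the sample go into the same tag directory
    return ['makeTagDirectory', 'tag_dir', options.args, beds.join(' '), '-genome', fasta].join(' ')
}

def cmd = renderMakeTagDirectory([args: '-format bed'], ['test.bed', 'test2.bed'], 'genome.fasta')
println cmd
// -> makeTagDirectory tag_dir -format bed test.bed test2.bed -genome genome.fasta
```

Because `-format bed` is supplied through `options.args` rather than hardcoded, the same module could in principle accept other HOMER input formats, which is what the TODOs about BAM tests point towards.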

@@ -0,0 +1,41 @@
name: homer_maketagdirectory
description: Create a tag directory with the HOMER suite
keywords:
  - peaks
  - bed
tools:
  - homer:
      description: |
        HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
      documentation: http://homer.ucsd.edu/homer/
      doi: 10.1016/j.molcel.2010.05.004
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - bed:
      type: file
      description: Alignment files in BED format
      pattern: "*.bed"
  - fasta:
      type: file
      description: Fasta file of reference genome
      pattern: "*.fasta"
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - tag_dir:
      type: directory
      description: The "Tag Directory"
      pattern: "tag_dir"
  - version:
      type: file
      description: File containing software version
      pattern: "*.{version.txt}"
authors:
  - "@EMiller88"

@@ -0,0 +1,68 @@
//
//  Utility functions used in nf-core DSL2 module files
//

//
// Extract name of software tool from process name using $task.process
//
def getSoftwareName(task_process) {
    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
}

//
// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
//
def initOptions(Map args) {
    def Map options = [:]
    options.args            = args.args ?: ''
    options.args2           = args.args2 ?: ''
    options.args3           = args.args3 ?: ''
    options.publish_by_meta = args.publish_by_meta ?: []
    options.publish_dir     = args.publish_dir ?: ''
    options.publish_files   = args.publish_files
    options.suffix          = args.suffix ?: ''
    return options
}

//
// Tidy up and join elements of a list to return a path string
//
def getPathFromList(path_list) {
    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }      // Remove empty entries
    paths     = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
    return paths.join('/')
}

//
// Function to save/publish module results
//
def saveFiles(Map args) {
    if (!args.filename.endsWith('.version.txt')) {
        def ioptions  = initOptions(args.options)
        def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
        if (ioptions.publish_by_meta) {
            def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
            for (key in key_list) {
                if (args.meta && key instanceof String) {
                    def path = key
                    if (args.meta.containsKey(key)) {
                        path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
                    }
                    path = path instanceof String ? path : ''
                    path_list.add(path)
                }
            }
        }
        if (ioptions.publish_files instanceof Map) {
            for (ext in ioptions.publish_files) {
                if (args.filename.endsWith(ext.key)) {
                    def ext_list = path_list.collect()
                    ext_list.add(ext.value)
                    return "${getPathFromList(ext_list)}/$args.filename"
                }
            }
        } else if (ioptions.publish_files == null) {
            return "${getPathFromList(path_list)}/$args.filename"
        }
    }
}

@@ -0,0 +1,41 @@
// Import generic module functions
include { initOptions; saveFiles; getSoftwareName } from './functions'

params.options = [:]
def options    = initOptions(params.options)

def VERSION = '4.11'

process HOMER_MAKEUCSCFILE {
    tag "$meta.id"
    label 'process_medium'
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) }

    conda (params.enable_conda ? "bioconda::homer=4.11=pl526hc9558a2_3" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        container "https://depot.galaxyproject.org/singularity/homer:4.11--pl526hc9558a2_3"
    } else {
        container "quay.io/biocontainers/homer:4.11--pl526hc9558a2_3"
    }

    input:
    tuple val(meta), path(tagDir)

    output:
    tuple val(meta), path("tag_dir/*ucsc.bedGraph.gz"), emit: bedGraph
    path  "*.version.txt"                             , emit: version

    script:
    def software = getSoftwareName(task.process)
    def prefix   = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}"
    """
    makeUCSCfile \\
        $tagDir \\
        -o auto \\
        $options.args

    echo $VERSION > ${software}.version.txt
    """
}
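The output declaration relies on makeUCSCfile writing its bedGraph inside the staged tag directory. The Groovy sketch below checks which relative paths the pattern `tag_dir/*ucsc.bedGraph.gz` would pick up, using Java's `PathMatcher` glob syntax as an approximation of how Nextflow resolves output globs (an assumption; the two are similar but not guaranteed identical). The candidate names other than the one from the makeucscfile test are illustrative.

```groovy
import java.nio.file.FileSystems
import java.nio.file.Paths

// Java glob matcher standing in for Nextflow's output globbing
def matcher = FileSystems.getDefault().getPathMatcher('glob:tag_dir/*ucsc.bedGraph.gz')

def candidates = [
    'tag_dir/tag_dir.ucsc.bedGraph.gz', // name expected by the makeucscfile test
    'tag_dir/tagInfo.txt',              // other tag-directory content: not matched
    'tag_dir.ucsc.bedGraph.gz'          // right name, outside tag_dir: not matched
]
candidates.each { p -> println "${p}: ${matcher.matches(Paths.get(p))}" }
```

In glob syntax `*` does not cross `/`, so only files directly inside `tag_dir` whose names end in `ucsc.bedGraph.gz` are captured.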

@@ -0,0 +1,38 @@
name: homer_makeucscfile
description: Create a UCSC bedGraph with the HOMER suite
keywords:
  - peaks
  - bed
  - bedGraph
tools:
  - homer:
      description: |
        HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis.
      documentation: http://homer.ucsd.edu/homer/
      doi: 10.1016/j.molcel.2010.05.004
input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - tagDir:
      type: directory
      description: "The 'Tag Directory'"
      pattern: "tagDir"
output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. [ id:'test', single_end:false ]
  - bedGraph:
      type: file
      description: The gzipped UCSC bedGraph
      pattern: "tag_dir/*ucsc.bedGraph.gz"
  - version:
      type: file
      description: File containing software version
      pattern: "*.{version.txt}"
authors:
  - "@EMiller88"

@@ -421,6 +421,18 @@ homer/annotatepeaks:
   - modules/homer/annotatepeaks/**
   - tests/modules/homer/annotatepeaks/**
 
+homer/findpeaks:
+  - modules/homer/findpeaks/**
+  - tests/modules/homer/findpeaks/**
+
+homer/maketagdirectory:
+  - modules/homer/maketagdirectory/**
+  - tests/modules/homer/maketagdirectory/**
+
+homer/makeucscfile:
+  - modules/homer/makeucscfile/**
+  - tests/modules/homer/makeucscfile/**
+
 iqtree:
   - modules/iqtree/**
   - tests/modules/iqtree/**

@@ -0,0 +1,17 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { HOMER_MAKETAGDIRECTORY } from '../../../../modules/homer/maketagdirectory/main.nf' addParams( options: [args: '-format bed'] )
include { HOMER_FINDPEAKS        } from '../../../../modules/homer/findpeaks/main.nf'        addParams( options: [args: '-style factor'] )

workflow test_homer_findpeaks {
    input = [[id:'test'],
            [file(params.test_data['sarscov2']['genome']['test_bed'], checkIfExists: true),
            file(params.test_data['sarscov2']['genome']['test2_bed'], checkIfExists: true)]]
    fasta = file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)

    HOMER_MAKETAGDIRECTORY (input, fasta)
    HOMER_FINDPEAKS ( HOMER_MAKETAGDIRECTORY.out.tagdir )
}

@@ -0,0 +1,8 @@
- name: homer findpeaks
  command: nextflow run ./tests/modules/homer/findpeaks -entry test_homer_findpeaks -c tests/config/nextflow.config
  tags:
    - homer
    - homer/findpeaks
  files:
    - path: output/homer/test.peaks.txt
      md5sum: f75ac1fea67f1e307a1ad4d059a9b6cc

@@ -0,0 +1,32 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { HOMER_MAKETAGDIRECTORY } from '../../../../modules/homer/maketagdirectory/main.nf' addParams( options: [args: '-format bed'] )

workflow test_homer_maketagdirectory {
    input = [[id:'test'],
            [file(params.test_data['sarscov2']['genome']['test_bed'], checkIfExists: true),
            file(params.test_data['sarscov2']['genome']['test2_bed'], checkIfExists: true)]]
    fasta = file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)

    HOMER_MAKETAGDIRECTORY (input, fasta)
}

workflow test_homer_meta_maketagdirectory {
    input =
        [[[ id:'test1'],
        [file(params.test_data['sarscov2']['genome']['test_bed'], checkIfExists: true)]],
        [[ id:'test2'],
        [file(params.test_data['sarscov2']['genome']['test2_bed'], checkIfExists: true)]]]
    fasta = file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)

    meta_input = [[id: 'meta_test']] + [ input.collect{it[1]}.flatten() ]

    HOMER_MAKETAGDIRECTORY (meta_input, fasta)
}

// TODO Make a failing bam test
// TODO Make a pass bam test that feeds the bam through samtools first
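The pooled input in test_homer_meta_maketagdirectory can be traced with plain lists. The Groovy sketch below swaps the `file(...)` calls for hypothetical file-name strings and evaluates the same `meta_input` expression: both BED files end up in one `[meta, files]` tuple, which is consistent with the meta test expecting the same tag-directory checksums as the two-file test.

```groovy
// Two single-file samples, as in the meta test (strings stand in for file objects)
def input = [
    [[id: 'test1'], ['test.bed']],
    [[id: 'test2'], ['test2.bed']]
]

// input.collect { it[1] }        -> [['test.bed'], ['test2.bed']]
// ... .flatten()                 -> ['test.bed', 'test2.bed']
// [[id: 'meta_test']] + [ ... ]  -> one [meta, files] tuple
def meta_input = [[id: 'meta_test']] + [ input.collect { it[1] }.flatten() ]

println meta_input // -> [[id:meta_test], [test.bed, test2.bed]]
```

The per-sample `meta` maps are dropped in the process; only the new `id:'meta_test'` survives.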

@@ -0,0 +1,33 @@
- name: homer maketagdirectory
  command: nextflow run ./tests/modules/homer/maketagdirectory -entry test_homer_maketagdirectory -c tests/config/nextflow.config
  tags:
    - homer
    - homer/maketagdirectory
  files:
    - path: output/homer/tag_dir/MT192765.1.tags.tsv
      md5sum: e29522171ca2169b57396495f8b97485
    - path: output/homer/tag_dir/tagAutocorrelation.txt
      md5sum: 62b107c4971b94126fb89a0bc2800455
    - path: output/homer/tag_dir/tagCountDistribution.txt
      md5sum: fd4ee7ce7c5dfd7c9d739534b8180578
    - path: output/homer/tag_dir/tagInfo.txt
      md5sum: 816baa642c946f8284eaa465638e9abb
    - path: output/homer/tag_dir/tagLengthDistribution.txt
      md5sum: e5aa2b9843ca9c04ace297280aed6af4

- name: homer meta maketagdirectory
  command: nextflow run ./tests/modules/homer/maketagdirectory -entry test_homer_meta_maketagdirectory -c tests/config/nextflow.config
  tags:
    - homer
    - homer/maketagdirectory
  files:
    - path: output/homer/tag_dir/MT192765.1.tags.tsv
      md5sum: e29522171ca2169b57396495f8b97485
    - path: output/homer/tag_dir/tagAutocorrelation.txt
      md5sum: 62b107c4971b94126fb89a0bc2800455
    - path: output/homer/tag_dir/tagCountDistribution.txt
      md5sum: fd4ee7ce7c5dfd7c9d739534b8180578
    - path: output/homer/tag_dir/tagInfo.txt
      md5sum: 816baa642c946f8284eaa465638e9abb
    - path: output/homer/tag_dir/tagLengthDistribution.txt
      md5sum: e5aa2b9843ca9c04ace297280aed6af4

@@ -0,0 +1,17 @@
#!/usr/bin/env nextflow

nextflow.enable.dsl = 2

include { HOMER_MAKETAGDIRECTORY } from '../../../../modules/homer/maketagdirectory/main.nf' addParams( options: [args: '-format bed'] )
include { HOMER_MAKEUCSCFILE     } from '../../../../modules/homer/makeucscfile/main.nf'     addParams( options: [:] )

workflow test_homer_makeucscfile {
    input = [[id:'test'],
            [file(params.test_data['sarscov2']['genome']['test_bed'], checkIfExists: true),
            file(params.test_data['sarscov2']['genome']['test2_bed'], checkIfExists: true)]]
    fasta = file(params.test_data['sarscov2']['genome']['genome_fasta'], checkIfExists: true)

    HOMER_MAKETAGDIRECTORY (input, fasta)
    HOMER_MAKEUCSCFILE ( HOMER_MAKETAGDIRECTORY.out.tagdir )
}

@@ -0,0 +1,7 @@
- name: homer makeucscfile
  command: nextflow run ./tests/modules/homer/makeucscfile -entry test_homer_makeucscfile -c tests/config/nextflow.config
  tags:
    - homer
    - homer/makeucscfile
  files:
    - path: output/homer/tag_dir/tag_dir.ucsc.bedGraph.gz