nf-core_modules/tests/data/README.md
Kevin Menden 4566525da2
Converge test data usage (#249)
* initial data restructuring

* fixed bedtools_complement

* fixed bedtools_genomecov

* fixed bedtools_getfasta

* fixed bedtools_intersect

* fixed bedtools maskfasta

* fixed bedtools_merge

* fixed bedtools_slop

* fixed bedtools_sort

* fixed bismark_genome_preparation

* fixed blast

* fixed bowtie data

* fixed bowtie2 data

* fixed bwa data

* fixed bwamem2 data usage

* fixed cat_fastq data

* fixed cutadapt data

* fixed dsh data

* fixed fastp data

* fixed fastqc; fixed bug with wrong fastq format

* fixed gatk

* fixed data for gffread, gunzip

* fixed ivar paths

* fixed data paths for minimap2

* fixed mosdepth

* fixed multiqc, pangolin

* fixed picard data paths

* fixed data paths for qualimap, quast

* fixed salmon data paths

* fixed samtools paths

* fixed seqwish, stringtie paths

* fixed tabix, trimgalore paths

* cleaned up data

* added first description to README

* changed test data naming again; everything up to bwa fixed

* everything up to gatk4

* fixed everything up to ivar

* fixed everything up to picard

* everything up to quast

* everything fixed up to stringtie

* switched everything to 'test' naming scheme

* fixed samtools and ivar tests

* cleaned up README a bit

* add (simulated) methylation test data

based on SARS-CoV-2 genome; simulated with Sherman --non_dir --genome sarscov2/fasta/ --paired -n 10000 -l 100 --CG 20 --CH 90

* bwameth/align: update data paths and checksums

also, build index on the go

* bwameth/index: update data paths and checksums

* methyldackel/extract: update data paths and checksums

* methyldackel/mbias: update data paths and checksums

* bismark/deduplicate: update data paths and checksums

* remove obsolete testdata

* remove empty 'dummy_file.txt'

* update data/README.md

* methyldackel: fix test

* Revert "methyldackel: fix test"

This reverts commit f175a32d144b1b0bfa0c6885da80c51e3cfe038a.

* methyldackel: fix test

for real

* move test.genome.sizes

* changed test names

* switched genomic to genome and transcriptome

* fix bedtools, blast

* fix gtf, tabix, .paf

* fix bowtie,bwa,bwameth

* fixed: bwa, bwamem, gatk, gffread, quast

* fixed bismark and blast

* fixed remaining tests

* delete bam file

Co-authored-by: phue <patrick.huether@gmail.com>
2021-03-04 10:10:57 +00:00


Modules Test Data

This directory contains all data used for the individual module tests. It is currently organised into two folders, genomics and generic. The former contains the typical data required by genomics modules, such as fasta, fastq and bam files; every folder in genomics corresponds to a single organism. All other data is stored in generic. This includes files that currently cannot be assigned to a genomics category, as well as deprecated files that will be removed in the future and replaced by files in genomics.

When adding a new module, please check carefully whether the data needed for the tests already exists in tests/data/genomics. If you can't find suitable data, please ask about it in the #modules channel on Slack.
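The check above can be scripted. This is a minimal sketch that assumes the layout described here (tests/data/genomics/<organism>/<file type>/) and the shell-style brace patterns used in the data description, such as 'test_{1,2}.fastq.gz'. It expands a pattern into concrete file names and reports which of them already exist. The function names and the command-line usage are hypothetical and are not part of the nf-core tooling.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path

# Matches the first non-nested brace group containing at least one comma,
# e.g. "{1,2}" or "{,_methylated}"; like the shell, "{foo}" is left alone.
_BRACE_GROUP = re.compile(r"\{([^{}]*,[^{}]*)\}")


def expand_braces(pattern: str) -> list[str]:
    """Expand shell-style brace groups such as 'test_{1,2}.fastq.gz'.

    Only simple, non-nested groups are supported, which covers the
    patterns used in this README.
    """
    match = _BRACE_GROUP.search(pattern)
    if match is None:
        return [pattern]
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    names: list[str] = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace groups are expanded as well.
        names.extend(expand_braces(prefix + option + suffix))
    return names


def find_test_data(root: Path, organism: str, file_type: str, pattern: str) -> dict[str, bool]:
    """Map each expanded file name to whether it exists under
    <root>/genomics/<organism>/<file_type>/ (hypothetical helper)."""
    folder = root / "genomics" / organism / file_type
    return {name: (folder / name).is_file() for name in expand_braces(pattern)}


if __name__ == "__main__":
    # Hypothetical usage: python find_test_data.py sarscov2 fastq 'test_{1,2}.fastq.gz'
    if len(sys.argv) == 4:
        for name, exists in find_test_data(Path("tests/data"), *sys.argv[1:]).items():
            print(f"{'found  ' if exists else 'missing'} {name}")
```

Quoting the pattern on the command line keeps the shell from expanding the braces itself, so the script sees the pattern exactly as written in this README.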

Data Description

genomics

  • sarscov2
    • bam
      • 'test{,_methylated}_paired_end.bam': SARS-CoV-2 sequencing reads aligned against test_genomic.fasta using minimap2
      • 'test{,_methylated}_paired_end.sorted.bam': sorted version of the above bam file
      • 'test{,_methylated}_paired_end.sorted.bam.bai': bam index for the sorted bam file
      • 'test_single_end.bam': alignment (unsorted) of the 'test_1.fastq.gz' reads against test_genomic.fasta using minimap2
    • bed
      • 'test.bed': example bed file for the MT192765.1 genome (fasta/test_genomic.fasta)
      • 'test.2.bed': slightly modified copy of the above file
      • 'test.bed.gz': gzipped version of 'test.bed'
      • 'test.genome.sizes': genome size for the MT192765.1 genome
    • fasta
      • 'test_genomic.fasta': MT192765.1 genome (assembly GCA_011545545.1_ASM1154554v1)
      • 'test_genomic.dict': GATK dict for 'test_genomic.fasta'
      • 'test_genomic.fasta.fai': fasta index for 'test_genomic.fasta'
      • 'test_cds_from_genomic.fasta': coding sequences (transcripts) from the MT192765.1 genome
    • fastq
      • 'test_{1,2}.fastq.gz': SARS-CoV-2 paired-end sequencing reads
      • 'test_{1,2}.2.fastq.gz': copies of the above reads
      • 'test_methylated_{1,2}.fastq.gz': SARS-CoV-2 paired-end bisulfite sequencing reads (generated with Sherman)
    • gtf
      • 'test_genomic.gtf': GTF for MT192765.1 genome
      • 'test_genomic.gff3': GFF3 for MT192765.1 genome
      • 'test_genomic.gff3.gz': bgzipped version of 'test_genomic.gff3'
    • paf
      • 'test_cds_from_genomic.paf': PAF file for MT192765.1 genome
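The bed folder holds 'test.genome.sizes' with the genome size for MT192765.1, and the fasta folder holds the FASTA index 'test_genomic.fasta.fai'. Assuming the sizes file follows the usual bedtools genome-file layout (one tab-separated sequence name and length per line), it carries the same information as the first two columns of a samtools FASTA index. The minimal sketch below derives one from the other; it illustrates that relationship and is not necessarily how the file in this directory was produced. The script name in the usage comment is hypothetical.

```python
from __future__ import annotations

import sys


def parse_fai(text: str) -> list[tuple[str, int]]:
    """Return (sequence name, length) pairs from the contents of a .fai file.

    A samtools FASTA index has five tab-separated columns per sequence:
    NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH. Only the first two are kept.
    """
    entries: list[tuple[str, int]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. at the end of the file
        fields = line.split("\t")
        entries.append((fields[0], int(fields[1])))
    return entries


def genome_sizes(fai_text: str) -> str:
    """Format the pairs as a '<name>\\t<length>' genome file, one line per sequence."""
    return "".join(f"{name}\t{length}\n" for name, length in parse_fai(fai_text))


if __name__ == "__main__":
    # Hypothetical usage: python fai_to_sizes.py test_genomic.fasta.fai > test.genome.sizes
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            sys.stdout.write(genome_sizes(handle.read()))
```

On the command line, `cut -f1,2 test_genomic.fasta.fai` prints the same two columns.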

generic

  • 'a.gff3.gz': bgzipped gff3 file currently needed for the tabix test
  • bedgraph: bedgraph files for SEACR
  • fasta: additional fasta file currently necessary for STAR
  • fastq: additional fastq files currently necessary for STAR
  • gtf: additional gtf file for STAR
  • vcf: several VCF files for tools using those, will be removed in the future
  • 'test.txt.tar.gz': example tar archive for the untar module
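The generic folder also holds an example tar archive for the untar module. When writing or debugging that module's test, it can help to know which files the archive should yield. This minimal sketch lists the members of a tar archive using Python's standard tarfile module. Here, make_archive is a hypothetical helper that only exists so the sketch can be tried without the real test file, and the script name in the usage comment is also hypothetical.

```python
from __future__ import annotations

import io
import sys
import tarfile


def list_archive(data: bytes) -> list[str]:
    """Return the member names of a tar archive given as raw bytes.

    Mode "r:*" lets tarfile detect gzip (or other) compression on its own.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as archive:
        return archive.getnames()


def make_archive(files: dict[str, bytes]) -> bytes:
    """Build a small gzipped tar archive in memory (hypothetical helper for trying this out)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical usage: python list_archive.py <path to the tar.gz test file>
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as handle:
            print("\n".join(list_archive(handle.read())))
```

Reading the archive from bytes keeps list_archive independent of where the file sits in the repository.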