Modules Test Data
This directory contains all data used for the individual module tests. It is currently organised into genomics and generic. The former contains all typical data required for genomics modules, such as fasta, fastq and bam files. Every folder in genomics corresponds to a single organism. Any other data is stored in generic. This contains files that currently cannot be associated with a genomics category, as well as deprecated files that will be removed in the future and replaced by files in genomics.
When adding a new module, please check carefully whether the data necessary for the tests already exists in tests/data/genomics. If you can't find the data, please ask about it in the #modules channel on Slack.
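One way to run that check is to search the directory for files of the type you need. The sketch below is only an illustration: the helper name find_test_data is hypothetical and not part of this repository, and the script assumes it is run from the repository root so that tests/data/genomics resolves.

```python
from pathlib import Path


def find_test_data(root, suffix):
    """Return sorted paths (relative to `root`) of files whose name ends with `suffix`."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to search, e.g. when run from outside the repository root.
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith(suffix)
    )


if __name__ == "__main__":
    # Assumed layout, taken from this README: tests/data/genomics/<organism>/<type>/<file>
    for path in find_test_data("tests/data/genomics", ".fastq.gz"):
        print(path)
```

Changing the suffix (for example to ".bam" or ".gtf") lists the existing files of that type, which makes it easier to spot a reusable file before adding a new one.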
Data Description
genomics
- sarscov2
  - bam
    - 'test{,_methylated}_paired_end.bam': sarscov2 sequencing reads aligned against test_genomic.fasta using minimap2
    - 'test{,_methylated}_paired_end.sorted.bam': sorted version of the above bam file
    - 'test{,_methylated}_paired_end.bam.sorted.bam.bai': bam index for the sorted bam file
    - 'test_single_end.bam': alignment (unsorted) of the 'test_1.fastq.gz' reads against test_genomic.fasta using minimap2
  - bed
    - 'test.bed': example bed file for the MT192765.1 genome (fasta/test_genomic.fasta)
    - 'test.2.bed': slightly modified copy of the above file
    - 'test.bed.gz': gzipped version
    - 'test.genome.sizes': genome size for the MT192765.1 genome
  - fasta
    - 'test_genomic.fasta': MT192765.1 genome (GCA_011545545.1_ASM1154554v1)
    - 'test_genomic.dict': GATK dict for 'test_genomic.fasta'
    - 'test_genomic.fasta.fai': fasta index for 'test_genomic.fasta'
    - 'test_cds_from_genomic.fasta': coding sequences from the MT192765.1 genome (transcripts)
  - fastq
    - 'test_{1,2}.fastq.gz': sarscov2 paired-end sequencing reads
    - 'test_{1,2}.2.fastq.gz': copies of the above reads
    - 'test_methylated_{1,2}.fastq.gz': sarscov2 paired-end bisulfite sequencing reads (generated with Sherman)
  - gtf
    - 'test_genomic.gtf': GTF for the MT192765.1 genome
    - 'test_genomic.gff3': GFF for the MT192765.1 genome
    - 'test_genomic.gff3.gz': bgzipped version
  - paf
    - 'test_cds_from_genomic.paf': PAF file for the MT192765.1 genome
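The braces in the file names above are shorthand for several files, in the style of shell brace expansion: 'test_{1,2}.fastq.gz' stands for 'test_1.fastq.gz' and 'test_2.fastq.gz'. As a minimal sketch, the hypothetical helper expand_braces below (not part of this repository) spells out the names a pattern covers. It handles simple comma-separated groups only, not nested braces or ranges.

```python
import re


def expand_braces(pattern):
    """Expand {a,b,...} groups in `pattern` into the list of concrete names."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left: the pattern is already a concrete name.
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    names = []
    for option in match.group(1).split(","):
        # Substitute one option, then expand any remaining groups.
        names.extend(expand_braces(head + option + tail))
    return names


if __name__ == "__main__":
    print(expand_braces("test_{1,2}.fastq.gz"))
    # prints ['test_1.fastq.gz', 'test_2.fastq.gz']
```

An empty option expands to nothing, so a group such as {,x} yields one name without x and one with it.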
generic
- 'a.gff3.gz': bgzipped gff3 file currently necessary for TABIX test
- bedgraph: bedgraph files for seacr
- fasta: additional fasta file currently necessary for STAR
- fastq: additional fastq files currently necessary for STAR
- gtf: additional gtf file for STAR
- vcf: several VCF files for tools that use them; will be removed in the future
- 'test.txt.gar.gz': example tar file for the untar module