\ No newline at end of file
diff --git a/software/fastqc/test/output/test_R1_fastqc.zip b/software/fastqc/test/output/test_R1_fastqc.zip
new file mode 100644
index 00000000..05358336
Binary files /dev/null and b/software/fastqc/test/output/test_R1_fastqc.zip differ
diff --git a/software/fastqc/test/output/test_R1_val_1_fastqc.html b/software/fastqc/test/output/test_R1_val_1_fastqc.html
new file mode 100644
index 00000000..45c60031
--- /dev/null
+++ b/software/fastqc/test/output/test_R1_val_1_fastqc.html
@@ -0,0 +1,187 @@
+test_R1_val_1.fq.gz FastQC Report
\ No newline at end of file
diff --git a/software/fastqc/test/output/test_R1_val_1_fastqc.zip b/software/fastqc/test/output/test_R1_val_1_fastqc.zip
new file mode 100644
index 00000000..f59827d2
Binary files /dev/null and b/software/fastqc/test/output/test_R1_val_1_fastqc.zip differ
diff --git a/software/fastqc/test/output/test_R2_fastqc.html b/software/fastqc/test/output/test_R2_fastqc.html
new file mode 100644
index 00000000..ff3435d8
--- /dev/null
+++ b/software/fastqc/test/output/test_R2_fastqc.html
@@ -0,0 +1,187 @@
+test_R2.fastq.gz FastQC Report
\ No newline at end of file
diff --git a/software/fastqc/test/output/test_R2_fastqc.zip b/software/fastqc/test/output/test_R2_fastqc.zip
new file mode 100644
index 00000000..dcfa2eab
Binary files /dev/null and b/software/fastqc/test/output/test_R2_fastqc.zip differ
diff --git a/software/fastqc/test/output/test_R2_val_2_fastqc.html b/software/fastqc/test/output/test_R2_val_2_fastqc.html
new file mode 100644
index 00000000..1dc83b81
--- /dev/null
+++ b/software/fastqc/test/output/test_R2_val_2_fastqc.html
@@ -0,0 +1,187 @@
+test_R2_val_2.fq.gz FastQC Report
+ MultiQC: Summarize analysis results for multiple tools and samples in a single report
+ Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
+ Bioinformatics (2016)
+ doi: 10.1093/bioinformatics/btw354
+ PMID: 27312411
+
+ A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
+
+
+
+
+
+
+
+
+
+
Loading report..
+
+
+
Report generated on 2020-03-18, 10:46 based on data in:
+ /bi/home/fkrueger/VersionControl/nf-core-modules/tools/multiqc/test/work/52/07836c4fe43e822e375798bf42c0e4
+
+
Uncheck the tick box to hide columns. Click and drag the handle on the left to change order.
+
+
+
+
+
+
+
+
Sort
+
Visible
+
Group
+
Column
+
Description
+
ID
+
Scale
+
+
+
+
+
+
||
+
+
+
+
Bowtie 2
+
% Aligned
+
overall alignment rate
+
overall_alignment_rate
+
None
+
+
+
||
+
+
+
+
Cutadapt
+
% Trimmed
+
% Total Base Pairs trimmed
+
percent_trimmed
+
None
+
+
+
||
+
+
+
+
FastQC
+
% Dups
+
% Duplicate Reads
+
percent_duplicates
+
None
+
+
+
||
+
+
+
+
FastQC
+
% GC
+
Average % GC Content
+
percent_gc
+
None
+
+
+
||
+
+
+
+
FastQC
+
Length
+
Average Sequence Length (bp)
+
avg_sequence_length
+
None
+
+
+
||
+
+
+
+
FastQC
+
% Failed
+
Percentage of modules failed in FastQC report (includes those not plotted here)
+
percent_fails
+
None
+
+
+
||
+
+
+
+
FastQC
+
M Seqs
+
Total Sequences (millions)
+
total_sequences
+
read_count
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Bowtie 2
+
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
+
+
+
+
+
+
+
+
+
This plot shows the number of reads aligning to the reference in different ways. Please note that single mate alignment counts are halved to tally with pair counts properly.
+
+
+
+
There are 6 possible types of alignment:
+* PE mapped uniquely: Pair has only one occurrence in the reference genome.
+* PE mapped discordantly uniquely: Pair has only one occurrence but not as a proper pair.
+* PE one mate mapped uniquely: One read of a pair has one occurrence.
+* PE multimapped: Pair has multiple occurrences.
+* PE one mate multimapped: One read of a pair has multiple occurrences.
+* PE neither mate aligned: Pair has no occurrence.
+
+
+
+
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+
+
Cutadapt
+
+Cutadapt is a tool to find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
+
+
+
+
+
+
+
This plot shows the number of reads with certain lengths of adapter trimmed.
+ Obs/Exp shows the raw counts divided by the number expected due to sequencing errors. A defined peak
+ may be related to adapter length. See the
+ cutadapt documentation
+ for more information on how these numbers are generated.
+
+
+
+
+
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+
+
FastQ Screen
+
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
FastQC
+
FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.
+
+
+
+
+
+
+
+ Sequence Counts
+
+
+
+
+
+
Sequence counts for each sample. Duplicate read counts are an estimate only.
+
+
+
+
This plot shows the total number of reads, broken down into unique and duplicate
+if possible (only more recent versions of FastQC give duplicate info).
+
You can read more about duplicate calculation in the
+FastQC documentation.
+A small part has been copied here for convenience:
+
Only sequences which first appear in the first 100,000 sequences
+in each file are analysed. This should be enough to get a good impression
+for the duplication levels in the whole file. Each sequence is tracked to
+the end of the file to give a representative count of the overall duplication level.
+
The duplication detection requires an exact sequence match over the whole length of
+the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
+
+
+
+
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Sequence Quality Histograms
+
+
+
+
+
+
The mean quality value across each base position in the read.
+
+
+
+
To enable multiple samples to be plotted on the same graph, only the mean quality
+scores are plotted (unlike the box plots seen in FastQC reports).
The y-axis on the graph shows the quality scores. The higher the score, the better
+the base call. The background of the graph divides the y-axis into very good quality
+calls (green), calls of reasonable quality (orange), and calls of poor quality (red).
+The quality of calls on most platforms will degrade as the run progresses, so it is
+common to see base calls falling into the orange area towards the end of a read.
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Per Sequence Quality Scores
+
+
+
+
+
+
The number of reads with average quality scores. Shows if a subset of reads has poor quality.
The per sequence quality score report allows you to see if a subset of your
+sequences have universally low quality values. It is often the case that a
+subset of sequences will have universally poor quality, however these should
+represent only a small percentage of the total sequences.
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Per Base Sequence Content
+
+
+
+
+
+
The proportion of each base position for which each of the four normal DNA bases has been called.
+
+
+
+
To enable multiple samples to be shown in a single plot, the base composition data
+is shown as a heatmap. The colours represent the balance between the four bases:
+an even distribution should give an even muddy brown colour. Hover over the plot
+to see the percentage of the four bases under the cursor.
+
To see the data as a line plot, as in the original FastQC graph, click on a sample track.
Per Base Sequence Content plots out the proportion of each base position in a
+file for which each of the four normal DNA bases has been called.
+
In a random library you would expect that there would be little to no difference
+between the different bases of a sequence run, so the lines in this plot should
+run parallel with each other. The relative amount of each base should reflect
+the overall amount of these bases in your genome, but in any case they should
+not be hugely imbalanced from each other.
+
It's worth noting that some types of library will always produce biased sequence
+composition, normally at the start of the read. Libraries produced by priming
+using random hexamers (including nearly all RNA-Seq libraries) and those which
+were fragmented using transposases inherit an intrinsic bias in the positions
+at which reads start. This bias does not concern an absolute sequence, but instead
+provides enrichment of a number of different K-mers at the 5' end of the reads.
+Whilst this is a true technical bias, it isn't something which can be corrected
+by trimming and in most cases doesn't seem to adversely affect the downstream
+analysis.
+
+
+
+
+
+
+ Click a sample row to see a line plot for that dataset.
+
+
Rollover for sample name
+
+
+ Position: -
+
%T: -
+
%C: -
+
%A: -
+
%G: -
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Per Sequence GC Content
+
+
+
+
+
+
The average GC content of reads. Normal random libraries typically have a
+ roughly normal distribution of GC content.
This module measures the GC content across the whole length of each sequence
+in a file and compares it to a modelled normal distribution of GC content.
+
In a normal random library you would expect to see a roughly normal distribution
+of GC content where the central peak corresponds to the overall GC content of
+the underlying genome. Since we don't know the GC content of the genome the
+modal GC content is calculated from the observed data and used to build a
+reference distribution.
+
An unusually shaped distribution could indicate a contaminated library or
+some other kind of biased subset. A normal distribution which is shifted
+indicates some systematic bias which is independent of base position. If there
+is a systematic bias which creates a shifted normal distribution then this won't
+be flagged as an error by the module since it doesn't know what your genome's
+GC content should be.
+
+
+
+
+
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Per Base N Content
+
+
+
+
+
+
The percentage of base calls at each position for which an N was called.
If a sequencer is unable to make a base call with sufficient confidence then it will
+normally substitute an N rather than a conventional base call. This graph shows the
+percentage of base calls at each position for which an N was called.
+
It's not unusual to see a very low proportion of Ns appearing in a sequence, especially
+nearer the end of a sequence. However, if this proportion rises above a few percent
+it suggests that the analysis pipeline was unable to interpret the data well enough to
+make valid base calls.
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Sequence Length Distribution
+
+
+
+
The distribution of fragment sizes (read lengths) found.
+ See the FastQC help
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Sequence Duplication Levels
+
+
+
+
+
+
The relative level of duplication found for every sequence.
In a diverse library most sequences will occur only once in the final set.
+A low level of duplication may indicate a very high level of coverage of the
+target sequence, but a high level of duplication is more likely to indicate
+some kind of enrichment bias (e.g. PCR overamplification). This graph shows
+the degree of duplication for every sequence in a library: the relative
+number of sequences with different degrees of duplication.
+
Only sequences which first appear in the first 100,000 sequences
+in each file are analysed. This should be enough to get a good impression
+for the duplication levels in the whole file. Each sequence is tracked to
+the end of the file to give a representative count of the overall duplication level.
+
The duplication detection requires an exact sequence match over the whole length of
+the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
+
In a properly diverse library most sequences should fall into the far left of the
+plot in both the red and blue lines. A general level of enrichment, indicating broad
+oversequencing in the library, will tend to flatten the lines, lowering the low end
+and generally raising other categories. More specific enrichments of subsets, or
+the presence of low complexity contaminants will tend to produce spikes towards the
+right of the plot.
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+ Overrepresented sequences
+
+
+
+
+
+
The total amount of overrepresented sequences found in each library.
+
+
+
+
FastQC calculates and lists overrepresented sequences in FastQ files. It would not be
+possible to show this for all samples in a MultiQC report, so instead this plot shows
+the number of sequences categorized as overrepresented.
+
Sometimes, a single sequence may account for a large number of reads in a dataset.
+To show this, the bars are split into two: the first shows the overrepresented reads
+that come from the single most common sequence. The second shows the total count
+from all remaining overrepresented sequences.
A normal high-throughput library will contain a diverse set of sequences, with no
+individual sequence making up more than a tiny fraction of the whole. Finding that a single
+sequence is very overrepresented in the set either means that it is highly biologically
+significant, or indicates that the library is contaminated, or not as diverse as you expected.
+
FastQC lists all of the sequences which make up more than 0.1% of the total.
+To conserve memory only sequences which appear in the first 100,000 sequences are tracked
+to the end of the file. It is therefore possible that a sequence which is overrepresented
+but doesn't appear at the start of the file for some reason could be missed by this module.
+
+
+
4 samples had less than 1% of reads made up of overrepresented sequences
+
+
+
+
+
+
+
+
+
+
+ Adapter Content
+
+
+
+
+
+
The cumulative percentage count of the proportion of your
+ library which has seen each of the adapter sequences at each position.
+
+
+
+
Note that only samples with ≥ 0.1% adapter contamination are shown.
+
There may be several lines per sample, as one is shown for each adapter
+detected in the file.
The plot shows a cumulative percentage count of the proportion
+of your library which has seen each of the adapter sequences at each position.
+Once a sequence has been seen in a read it is counted as being present
+right through to the end of the read so the percentages you see will only
+increase as the read length goes on.
+
+
+
loading..
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Plot Table Data
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
Regex Help
+
+
+
Toolbox search strings can behave as regular expressions (regexes). Click a button below to see an example of it in action. Try modifying them yourself in the text box.