millironx/cowcalf-rumen-metagenomic-pipeline

mirror of https://github.com/MillironX/cowcalf-rumen-metagenomic-pipeline.git synced 2024-12-21 08:58:17 +00:00

beef-cattle metagenomics

Thomas A. Christensen II 0c1ba0470b Basic cleanup		2021-06-02 08:29:01 -05:00
.vscode	Fixed all shellcheck issues	2019-05-21 10:28:18 -06:00
.gitignore	Fixed gitignore	2019-05-16 23:31:51 -06:00
fastq-to-taxonomy.sh	Basic cleanup	2021-06-02 08:29:01 -05:00
fetchmetadata.R	Initial commit	2019-05-13 10:09:48 -06:00
LICENSE	Create LICENSE	2021-04-21 11:34:41 -06:00
main.sh	Basic cleanup	2021-06-02 08:29:01 -05:00
manipulatefeaturetable.R	Basic cleanup	2021-06-02 08:29:01 -05:00
README.md	Correct rarefaction instructions in README	2021-04-21 12:34:59 -06:00
sample-classifier.sh	Basic cleanup	2021-06-02 08:29:01 -05:00
sample-regression.sh	Basic cleanup	2021-06-02 08:29:01 -05:00

Cow/calf Rumen Metagenomics Pipeline

An end-to-end script to convert Illumina shotgun sequences and metadata into full-blown diversity tables and visualizations. Of course, it's focused on the rumen and dam/calf relationships, but is widely applicable to other systems.

This pipeline was written entirely during the Spring 2019 semester for work done in Dr. Hannah Cunningham-Hollinger's lab at the University of Wyoming. It was run on UW's ARCC high-performance computing servers and presented as a poster at the Western Section American Society of Animal Science annual meeting.

Prerequisites

You will need access to the following commands/programs:

metaxa2, metaxa2_ttt, metaxa2_dc (Metaxa2)
Rscript (R)
source activate (Miniconda)
qiime, biom (installed within a conda environment named qiime2)

If you are working on an HPC system, contact your department to find out how to get access to these commands.
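A quick way to confirm the prerequisites before starting is to ask the shell whether each command is on the PATH. The following is a minimal sketch, not part of the pipeline: the function name missing_commands is hypothetical, and it assumes the qiime2 conda environment has already been activated so that qiime and biom are visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Prints each command that is not on the PATH, one per line,
# and returns 1 if any command is missing.
missing_commands() {
  local status=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Commands named in the prerequisites list above.
if missing=$(missing_commands metaxa2 metaxa2_ttt metaxa2_dc Rscript qiime biom); then
  echo "All prerequisite commands were found."
else
  echo "Missing commands:"
  echo "$missing"
fi
```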

Usage

Clone the script files

git clone https://github.com/MillironX/cowcalf-rumen-metagenomic-pipeline.git

Create a directory containing all forward- and reverse-read files, named <SAMPLEID>_R1_001.fastq.gz for forward reads and <SAMPLEID>_R2_001.fastq.gz for reverse reads. Add a QIIME2-compatible metadata file named metadata.tsv, and copy all of the code files into the same directory. It should look like this:

.
├── sample1_R1_001.fastq.gz
├── sample1_R2_001.fastq.gz
├── sample2_R1_001.fastq.gz
├── sample2_R2_001.fastq.gz
├── ...
├── sampleN_R1_001.fastq.gz
├── sampleN_R2_001.fastq.gz
├── metadata.tsv
├── main.sh
├── fastq-to-taxonomy.sh
├── manipulatefeaturetable.R
├── fetchmetadata.R
├── sample-classifier.sh
└── sample-regression.sh
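Because the layout above depends on exact file names, it can help to check it before submitting a job. The sketch below is an illustration only: the function name check_inputs is hypothetical, and it checks nothing beyond the presence of metadata.tsv and a reverse-read file for every forward-read file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Usage: check_inputs <directory>
# Prints a line for each missing input and returns 1 if anything is missing.
check_inputs() {
  local dir="${1:-.}"
  local status=0
  local forward
  local reverse

  if [ ! -f "$dir/metadata.tsv" ]; then
    echo "missing: metadata.tsv"
    status=1
  fi

  for forward in "$dir"/*_R1_001.fastq.gz; do
    # If the glob matched nothing, the literal pattern is left behind.
    if [ ! -e "$forward" ]; then
      echo "missing: no forward-read files"
      status=1
      break
    fi
    # Derive the expected reverse-read name from the forward-read name.
    reverse="${forward%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -f "$reverse" ]; then
      echo "missing: $(basename "$reverse")"
      status=1
    fi
  done
  return "$status"
}
```

Running `check_inputs .` from inside the data directory prints nothing and returns 0 when every forward-read file has a partner and metadata.tsv is present.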

With Slurm

These scripts are preconfigured for use with Slurm and Lmod. The configuration is very basic and should work on any Slurm installation. Before use, be sure to replace the provided credentials with your own in main.sh, fastq-to-taxonomy.sh, sample-classifier.sh, and sample-regression.sh, then run:

sbatch main.sh

Without Slurm

Edit main.sh and remove every call to srun (including its CLI options), replace every instance of $SLURM_NTASKS with the number of parallel threads you wish to run, and comment out every line that starts with module load. Then run:

./main.sh
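Two of the edits described above are mechanical enough to script with sed. The following is a rough sketch under stated assumptions, not a tested conversion of the real main.sh: the function name strip_slurm and the output file name main-local.sh are hypothetical, and calls to srun are deliberately left for manual editing because their options differ from line to line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Reads a script on stdin and writes an edited copy to stdout:
#   - replaces $SLURM_NTASKS (or ${SLURM_NTASKS}) with a fixed thread count
#   - comments out lines that begin with "module load"
# Calls to srun are NOT touched and still need to be removed by hand.
strip_slurm() {
  local threads="$1"
  sed -E \
    -e 's/\$\{?SLURM_NTASKS\}?/'"$threads"'/g' \
    -e 's/^([[:space:]]*)(module load)/\1# \2/'
}

# Example (file names are illustrative):
#   strip_slurm 4 < main.sh > main-local.sh
```

Writing to a separate file keeps the original script intact, so the result can be compared with `diff` before it is run.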

Future Work

This project is finished. It is meant to be a reference and an inspiration, but nothing more. I do not intend to update the code now (as embarrassing as it might be).

Known Issues

Miniconda now uses the conda activate command instead of source activate, so the source activate calls in these scripts may need to be updated for newer installations.
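One possible workaround, shown here as a sketch rather than a change the author has made, is a small wrapper that prefers conda activate and falls back to source activate. The function name activate_env is hypothetical. The `conda shell.bash hook` call is what recent conda releases provide to make conda activate usable in non-interactive bash scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Activates the named conda environment from a non-interactive script.
activate_env() {
  local env_name="$1"
  if command -v conda >/dev/null 2>&1; then
    # Load conda's shell functions into the current shell first.
    eval "$(conda shell.bash hook)"
    conda activate "$env_name"
  else
    # Older Miniconda installs put an "activate" script on the PATH.
    source activate "$env_name"
  fi
}

# Example: activate_env qiime2
```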

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Thomas A. Christensen II - @MillironX

Project Link: https://github.com/MillironX/cowcalf-rumen-metagenomic-pipeline