# nf-core/configs: Eddie Configuration

nf-core pipelines sarek, rnaseq, atacseq, and viralrecon have all been tested on the University of Edinburgh Eddie HPC. All except atacseq have pipeline-specific config files; atacseq does not yet support this.

## Getting help

There is a Slack channel dedicated to Eddie users on the MRC IGC Slack: [https://igmm.slack.com/channels/eddie3](https://igmm.slack.com/channels/eddie3)

## Using the Eddie config profile

To use, run the pipeline with `-profile eddie` (one hyphen).
This will download and launch the [`eddie.config`](../conf/eddie.config) which has been pre-configured with a setup suitable for the [University of Edinburgh Eddie HPC](https://www.ed.ac.uk/information-services/research-support/research-computing/ecdf/high-performance-computing).

The configuration file supports running nf-core pipelines with Docker containers running under Singularity by default. Conda is not currently supported.

```bash
nextflow run nf-core/PIPELINE -profile eddie  # ...rest of pipeline flags
```

Before running the pipeline you will need to install Nextflow or load it from the module system. Generally the most recent version will be the one you want. If you want to run a Nextflow pipeline that is based on [DSL2](https://www.nextflow.io/docs/latest/dsl2.html), you will need a version that ends with '-edge'.

To list versions:

```bash
module avail igmm/apps/nextflow
```

To load the most recent version:

```bash
module load igmm/apps/nextflow
```

This config enables Nextflow to manage the pipeline jobs via the SGE job scheduler and to use Singularity for software management.

## Singularity set-up

Load Singularity from the module system and, if you have access to `/exports/igmm/eddie/BioinformaticsResources`, set the Singularity cache directory to the BioinformaticsResources path below. If containers needed for your pipeline run are missing from it, please contact the [IGC Data Manager](mailto:data.manager@igc.ed.ac.uk) to have them added. You can add these lines to your `$HOME/.bashrc`, or run them before you launch an nf-core pipeline.

If you do not have access to `/exports/igmm/eddie/BioinformaticsResources`, set the Singularity cache directory to a location outside your `$HOME` area (which has limited space). Downloading all the Singularity containers takes time on the first run, but the cached images are reused on later runs.

```bash
module load singularity
export NXF_SINGULARITY_CACHEDIR="/exports/igmm/eddie/BioinformaticsResources/nf-core/singularity-images"
```
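The choice between the shared cache and a personal one can also be made automatically. The following is a minimal sketch, not part of the Eddie config: the helper name `choose_singularity_cache` and the fallback path `/exports/eddie/path/to/my/area/singularity-images` are hypothetical placeholders that you would replace with your own.

```bash
# Hypothetical helper: print the shared cache path if it exists and is
# readable, otherwise print the personal fallback path.
choose_singularity_cache() {
    shared="$1"
    fallback="$2"
    if [ -d "$shared" ] && [ -r "$shared" ]; then
        printf '%s\n' "$shared"
    else
        printf '%s\n' "$fallback"
    fi
}

# The second argument is a placeholder; point it at an area outside $HOME.
NXF_SINGULARITY_CACHEDIR="$(choose_singularity_cache \
    "/exports/igmm/eddie/BioinformaticsResources/nf-core/singularity-images" \
    "/exports/eddie/path/to/my/area/singularity-images")"
export NXF_SINGULARITY_CACHEDIR
```

Placed in `$HOME/.bashrc`, this would keep the same setup working whether or not you have access to the BioinformaticsResources area.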

Singularity will create a directory `.singularity` in your `$HOME` directory on Eddie. Space on `$HOME` is very limited, so it is a good idea to create the directory somewhere with more room and symlink it into `$HOME`.

```bash
cd $HOME
mkdir /exports/eddie/path/to/my/area/.singularity
ln -s /exports/eddie/path/to/my/area/.singularity .singularity
```
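If `$HOME/.singularity` already exists as a real directory, the `ln -s` command above would create the link inside it instead of replacing it. A sketch of one way to handle that case follows; the function name `relocate_singularity_dir` is hypothetical, and the target path is the same placeholder used above.

```bash
# Hypothetical helper: move an existing .singularity directory to a roomier
# location and leave a symlink behind in its place.
relocate_singularity_dir() {
    local target="$1"
    local link="${2:-$HOME/.singularity}"

    mkdir -p "$target"
    # If a real directory (not a symlink) is already there, copy its
    # contents across and remove it only when the copy succeeded.
    if [ -d "$link" ] && [ ! -L "$link" ]; then
        cp -a "$link/." "$target/" && rm -rf "${link:?}"
    fi
    # -n replaces an existing symlink instead of following it.
    ln -sfn "$target" "$link"
}

# Example call (placeholder path, left commented out):
# relocate_singularity_dir /exports/eddie/path/to/my/area/.singularity
```

The function is safe to run again: a second call just recreates the same symlink.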

## Running Nextflow

### On a login node

You can run Nextflow from a `qlogin` session, provided you request more than the default 2GB of memory. The initial Nextflow process cannot itself be submitted as a job, because you cannot `qsub` from within a `qsub` job.

```bash
qlogin -l h_vmem=8G
```

If your Eddie terminal disconnects, your Nextflow job will stop. To prevent this, put the Nextflow command in a bash script and launch it with `nohup`.

```bash
nohup ./nextflow_run.sh &
```
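The contents of `nextflow_run.sh` are not prescribed by this config. As a minimal sketch, the script could be created as follows. It assumes the `module` command is available inside non-interactive scripts on your node (if it is not, load the modules before calling `nohup`), and `nf-core/PIPELINE` is a placeholder as in the example earlier in this document.

```bash
# Write a small wrapper script that loads the tools and starts the pipeline.
cat > nextflow_run.sh <<'EOF'
#!/bin/bash
# Load Nextflow and Singularity from the module system
module load igmm/apps/nextflow
module load singularity

# Launch the pipeline; -resume reuses cached results from an earlier run
nextflow run nf-core/PIPELINE -profile eddie -resume  # ...rest of pipeline flags
EOF

# Make the wrapper executable so that `nohup ./nextflow_run.sh &` can run it
chmod +x nextflow_run.sh
```

When started with `nohup` from a terminal, the script's output is appended to `nohup.out` in the current directory unless you redirect it elsewhere.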

### On a wild west node - IGC only

Wild west nodes on Eddie can be accessed via ssh (node2c15, node2c16, node3g22). To run Nextflow on one of these nodes, start it inside a [screen session](https://linuxize.com/post/how-to-use-linux-screen/) so that it keeps running if your connection drops.

Start a new screen session:

```bash
screen -S <session_name>
```

List existing screen sessions:

```bash
screen -ls
```

Reconnect to an existing screen session:

```bash
screen -r <session_name>
```

## Using iGenomes references

A local copy of the iGenomes resource has been made available on the Eddie HPC for those with access to `/exports/igmm/eddie/BioinformaticsResources` so you should be able to run the pipeline against any reference available in the `igenomes.config`.
You can do this by simply using the `--genome <GENOME_ID>` parameter.

## Adjusting maximum resources

This config is set for IGC standard nodes, which have 32 cores and 384GB memory. If you are a non-IGC user, please see the [ECDF specification](https://www.wiki.ed.ac.uk/display/ResearchServices/Memory+Specification) and adjust the `--clusterOptions` flag appropriately, e.g.

```bash
--clusterOptions "-C mem256GB" --max_memory "256GB"
```
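
To show where these flags sit in a full command, here is a sketch that assembles the command line from a single memory value so that the two flags cannot drift apart. The variable names `node_mem` and `cmd` are hypothetical, `nf-core/PIPELINE` is a placeholder, and the command is only printed, not executed.

```bash
# Memory size of the node type you are targeting (see the ECDF specification)
node_mem="256GB"

# Build the command as an array so the quoted "-C mem..." option stays
# a single argument.
cmd=(
    nextflow run nf-core/PIPELINE
    -profile eddie
    --clusterOptions "-C mem${node_mem}"
    --max_memory "${node_mem}"
)

# Print the command for inspection; run it with "${cmd[@]}" once it looks right.
printf '%q ' "${cmd[@]}"
echo
```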