mirror of https://github.com/MillironX/nf-configs.git synced 2024-11-14 13:43:09 +00:00
nf-configs/docs/sage.md
2022-08-31 09:10:30 -07:00


nf-core/configs: Sage Bionetworks Global Configuration

To use this custom configuration, run the pipeline with -profile sage. This will download and load the sage.config, which contains a number of optimizations relevant to Sage employees running workflows on AWS (e.g. using Nextflow Tower). This profile will also load any applicable pipeline-specific configuration.

This global configuration includes the following tweaks:

  • Update the default value for igenomes_base to s3://sage-igenomes
  • Enable retries by default when exit codes relate to insufficient memory
  • Allow pending jobs to finish if the number of retries is exhausted
  • Increase the amount of time allowed for file transfers
  • Increase the default chunk size for multipart uploads to S3
  • Slow down job submission rate to avoid overwhelming any APIs
  • Define the check_max() function, which is missing in Sarek v2
  • Slow the increase in the number of allocated CPU cores on retries
  • Increase the default time limits because we run pipelines on AWS
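
As a rough illustration of how tweaks like these can be expressed in a Nextflow configuration file, here is a minimal sketch. It is not a copy of sage.config: the exit codes, retry count, resource ceilings, chunk size, rate limit and timeout values are all assumptions chosen for the example, and the slow() helper is a hypothetical name. The threadPool.FileTransfer.maxAwait setting is included on the assumption that the Nextflow version in use supports it.

```groovy
// Illustrative sketch only: every numeric value below is an assumption.

params {
    // Point iGenomes lookups at the us-east-1 mirror
    igenomes_base = 's3://sage-igenomes'
    // Assumed ceilings, consumed by check_max() below
    max_cpus   = 16
    max_memory = '128.GB'
    max_time   = '240.h'
}

process {
    // Retry when the exit code suggests the task was killed for exceeding
    // its memory allocation (137 = SIGKILL, 143 = SIGTERM). For any other
    // exit code, or once the retries are used up, return 'finish' so that
    // already-submitted jobs can complete before the run stops.
    errorStrategy = { (task.exitStatus in [137, 143] && task.attempt <= 3) ? 'retry' : 'finish' }
    maxRetries    = 3

    // CPUs grow every second attempt; memory and time grow on every attempt
    cpus   = { check_max(1 * slow(task.attempt), 'cpus') }
    memory = { check_max(6.GB * task.attempt, 'memory') }
    time   = { check_max(24.h * task.attempt, 'time') }
}

// Allow more time for file transfers to complete (assumed setting and value)
threadPool.FileTransfer.maxAwait = '24 hour'

aws {
    client {
        // Larger parts for multipart uploads to S3, in bytes (200 MiB here)
        uploadChunkSize = 209715200
    }
}

executor {
    // Submit at most 5 jobs per second
    submitRateLimit = '5/1sec'
}

// Hypothetical helper: maps attempts 1, 2, 3, 4 to 1, 1, 2, 2 when factor is 2
def slow(attempt, factor = 2) {
    return Math.ceil(attempt / factor) as int
}

// Simplified variant of the check_max() helper found in nf-core pipelines:
// clamps a requested resource to the matching params.max_* ceiling.
def check_max(obj, type) {
    if (type == 'memory') {
        def limit = params.max_memory as nextflow.util.MemoryUnit
        return obj.compareTo(limit) == 1 ? limit : obj
    } else if (type == 'time') {
        def limit = params.max_time as nextflow.util.Duration
        return obj.compareTo(limit) == 1 ? limit : obj
    } else if (type == 'cpus') {
        return Math.min(obj, params.max_cpus as int)
    }
    return obj
}
```

Defining check_max() in the profile itself means that pipelines which reference the function without shipping it, such as Sarek v2, can still evaluate resource closures like the ones above.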

Additional information about iGenomes

The following iGenomes prefixes have been copied from s3://ngi-igenomes/ (eu-west-1) to s3://sage-igenomes (us-east-1). See this script for more information. The sage-igenomes S3 bucket has been configured to be openly available, but files cannot be downloaded outside of us-east-1, which avoids egress charges. You can check the conf/igenomes.config file in each nf-core pipeline to figure out the mapping between genome IDs (i.e. for --genome) and iGenomes prefixes (example).

  • Human Genome Builds
    • Homo_sapiens/Ensembl/GRCh37
    • Homo_sapiens/GATK/GRCh37
    • Homo_sapiens/UCSC/hg19
    • Homo_sapiens/GATK/GRCh38
    • Homo_sapiens/NCBI/GRCh38
    • Homo_sapiens/UCSC/hg38
  • Mouse Genome Builds
    • Mus_musculus/Ensembl/GRCm38
    • Mus_musculus/UCSC/mm10
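
To make the mapping between genome IDs and prefixes more concrete, the fragment below sketches the general layout of a conf/igenomes.config file. It is abridged and illustrative: the keys and file paths vary between pipelines, and the same genome ID can resolve to a different prefix depending on the pipeline, so check the file shipped with the pipeline you are running.

```groovy
// Abridged, illustrative fragment in the style of conf/igenomes.config.
// params.igenomes_base is set elsewhere (s3://sage-igenomes under this profile).
params {
    genomes {
        'GRCh38' {
            // --genome GRCh38 resolves to the Homo_sapiens/NCBI/GRCh38 prefix here
            fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa"
        }
        'GRCm38' {
            // --genome GRCm38 resolves to the Mus_musculus/Ensembl/GRCm38 prefix here
            fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa"
        }
    }
}
```

The part of each path between the base and Sequence/ is the iGenomes prefix, so a genome ID is usable with this profile only if its prefix appears in the list above.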