4. Singularity

On NeSI’s Mahuika cluster, you can load the Singularity module with the command: module load Singularity/.3.2.1. Note the “.” in front of the version number: it marks a hidden module, because Singularity is still in its test phase on the cluster. You can add this line to the end of your .bashrc file so that the module is loaded whenever a shell is opened.
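If you run an append command by hand more than once, the line ends up duplicated in .bashrc. The snippet below is a minimal sketch that adds the module line only if it is not already present. It assumes your login shell is bash and reads ~/.bashrc; MODULE_LINE and BASHRC are just illustrative variable names. Open a new shell (or run source ~/.bashrc) for the change to take effect.

```shell
# The module line from above; the "." marks Singularity as a hidden (test-phase) module.
MODULE_LINE='module load Singularity/.3.2.1'
BASHRC="$HOME/.bashrc"

touch "$BASHRC"
# Append only if no line matches exactly, so re-running the snippet adds no duplicate.
if ! grep -qxF "$MODULE_LINE" "$BASHRC"; then
  # If the file does not end with a newline, add one first.
  if [ -s "$BASHRC" ] && [ -n "$(tail -c 1 "$BASHRC")" ]; then
    printf '\n' >> "$BASHRC"
  fi
  printf '%s\n' "$MODULE_LINE" >> "$BASHRC"
fi
```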

In order to use Snakemake with Singularity, we need an image that contains the required software. We can either download ready-made containers or create our own. Let’s look at both cases.

4.1. Using existing Docker/Singularity images

Singularity can convert any Docker image, e.g. from the Docker Hub registry, or use native Singularity images, e.g. from Singularity Hub. This immediately opens up a large number of bioinformatics tools, as many of them have already been packaged in containers, e.g. through the Biocontainers project.

Within Snakemake, these images can be used on a per-rule basis.

For example, we can use the pre-made bwa container in the indexing rule:

Listing 4.1: Excerpt from the Snakefile showing the bwa indexing rule with a Singularity image.

...
rule makeidx:
     input:
         fasta = "data/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa"
     output:
         touch("data/makeidx.done")
     log:
         "analyses/logs/makeidx.log"
     benchmark:
         "analyses/benchmarks/makeidx.txt"
     singularity:
         "docker://biocontainers/bwa:v0.7.17_cv1"
     shell:
         "bwa index {input.fasta} 2> {log}"
...

Here, we replaced the conda directive with a singularity directive that names the image to use. Snakemake pulls the image the first time it encounters it in a rule, stores it locally in the .snakemake/singularity folder, and reuses it whenever it is needed again. Make sure the tag you request actually exists in the registry; otherwise the pull fails.

See section 4.4 below for how to execute Snakemake so that it makes use of Singularity.

This works well. However, sometimes we would like to bundle all the required software into one image and use that image for all rules of the workflow. This way we control the versions of the tools and of the image in one place.

4.2. Creating a Singularity image

You can create a Singularity image only on a machine where Singularity is installed and where you have root privileges. This means that you will not be able to build images on the NeSI cluster. Instead, install Singularity on your own machine, where you do have root privileges.

Note

There is a workaround for this: as described below, the image is built and stored on Singularity Hub in any case, so you can skip building and testing it locally. However, this is generally not advised; do so at your own risk.

An example Singularity image configuration file is shown below. It uses a Miniconda base image (continuumio/miniconda3) pulled from Docker Hub and installs several bioinformatics tools into it from the Bioconda channel.

Listing 4.2: Singularity config-file for the biotools container used in the tutorial.

# Filename: Singularity
Bootstrap: docker
From: continuumio/miniconda3

%labels
   AUTHOR [email protected]

# This sets global environment variables for anything run within the container
%environment
  # No trailing ":" here: an empty PATH entry would add the current directory to the search path.
  export PATH="/opt/conda/bin:/usr/local/bin:/usr/bin:/bin"
  unset CONDA_DEFAULT_ENV
  export ANACONDA_HOME=/opt/conda

%post
   export PATH=/opt/conda/bin:$PATH
   conda config --add channels defaults
   conda config --add channels bioconda
   conda config --add channels conda-forge
   conda install --yes bwa=0.7.15 sickle-trim=1.33 subread=1.6.1 samtools=1.8
   conda clean --index-cache --tarballs --packages --yes

Once you are confident that the image builds correctly and that the tools you need are working, you can create a GitHub repository and upload a container config-file like the one above (named Singularity, as in the listing). You can then connect the repository to Singularity Hub, which builds the container on its own systems and makes the image available for download wherever you want to use it. The GitHub repository associated with the Singularity Hub image in this example is accessible at: https://github.com/sschmeier/biotools

Note

You can, of course, also create a Docker image instead of a Singularity image. The process is similar, with a slightly different syntax.

4.3. Using one image for the whole workflow

We need to add a directive to the Snakefile that tells Snakemake which Singularity image to use to run the workflow. Here, we specify the URL of the image on Singularity Hub (see https://www.singularity-hub.org/collections/1107). In addition, we need to supply the correct parameter to the snakemake command upon execution (see section 4.4 below, line 1).

Note

Here we place the singularity directive at the top of the Snakefile and not within a rule. This way the image is used for all rules; a singularity directive inside an individual rule still takes precedence for that rule.

Listing 4.3: Snakefile changes to include the singularity directive.

singularity: "shub://sschmeier/biotools:latest"

SAMPLES, = glob_wildcards("fastq/{sample}.fastq.gz")

rule all:
    input:
        "analyses/results/counts.txt"

rule trimse:
    input:
        "fastq/{sample}.fastq.gz"
    output:
        "analyses/results/{sample}.trimmed.fastq.gz"
    log:
        "analyses/logs/{sample}.trimse"
    benchmark:
        "analyses/benchmarks/{sample}.trimse"
    conda:
        "envs/sickle.yaml"
    params:
        qualtype="sanger"
    shell:
        "sickle se -g -t {params.qualtype} -f {input} -o {output}"
        " 2> {log}"
...

4.4. Execute Snakemake in Singularity-mode

Execute Snakemake in Singularity-only mode:

$ snakemake --use-singularity \
            --singularity-args "--bind path/to/include" \
            --jobs 999 \
            --printshellcmds \
            --rerun-incomplete \
            --cluster-config data/nesi/cluster-nesi-mahuika.yaml \
            --cluster "sbatch --account={cluster.account} \
                              --partition={cluster.partition} \
                              --mem={cluster.mem} \
                              --ntasks={cluster.ntasks} \
                              --cpus-per-task={cluster.cpus-per-task} \
                              --time={cluster.time} \
                              --hint={cluster.hint} \
                              --output={cluster.output} \
                              --error={cluster.error}"

With the parameter --use-singularity, we instruct Snakemake to use Singularity. The conda directives in the rules are ignored (as long as --use-conda is not given as well). The container image is downloaded the first time Snakemake is executed and is stored in the workflow’s .snakemake subdirectory for future runs. The big advantage of using Singularity is that all the tools, their versions and their dependencies are fixed up front, as is the operating system. This way we do not depend on the operating system or the installed tools of the cluster environment, which makes it a great way to improve the reproducibility of your work.
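The command in section 4.4 reads its SLURM settings from data/nesi/cluster-nesi-mahuika.yaml, which is not shown in this tutorial. The following is a minimal sketch of such a file, written with a here-document. The keys match the {cluster.*} placeholders in the --cluster string, and __default__ holds the settings Snakemake applies to every rule without its own entry. All values (the project code nesi99999, the partition, memory, time and the makeidx override) are assumptions; replace them with settings valid for your own NeSI project.

```shell
# SLURM does not create missing directories for --output/--error files.
mkdir -p data/nesi analyses/logs

cat > data/nesi/cluster-nesi-mahuika.yaml <<'EOF'
# Settings under __default__ apply to every rule without its own entry.
__default__:
  account: nesi99999          # hypothetical project code, use your own
  partition: large            # assumed Mahuika partition name
  mem: 4G
  ntasks: 1
  cpus-per-task: 2
  time: "01:00:00"
  hint: nomultithread
  output: analyses/logs/slurm-%j.out   # %j is replaced by the SLURM job ID
  error: analyses/logs/slurm-%j.err

# Hypothetical per-rule entry: more memory and time for the bwa indexing job.
makeidx:
  mem: 8G
  time: "02:00:00"
EOF
```

Keys that a rule entry does not set (here everything except mem and time for makeidx) are taken from __default__.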

Attention

If you want to read or store data on a disk other than the one you are executing Snakemake from, you need to supply --singularity-args "--bind path/to/include" so that Singularity can read from and write to that location. I tend to include it anyway, to be sure.
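When several locations must be reachable from inside the container, Singularity’s --bind option accepts a comma-separated list of paths. The sketch below builds that list with a small helper, join_bind_dirs (a hypothetical name, not part of Snakemake or Singularity), and shows two ways to hand it over: through Snakemake’s --singularity-args, or through the SINGULARITY_BIND environment variable, which Singularity 3.x also reads. The /nesi/... paths are placeholders for your own project and nobackup directories. Use one of the two options, not both.

```shell
# join_bind_dirs DIR...: print the directories as one comma-separated list,
# the format accepted by Singularity's --bind option.
join_bind_dirs() {
  (IFS=,; printf '%s' "$*")
}

# Placeholder paths: replace with the directories your workflow reads or writes.
BIND_LIST=$(join_bind_dirs /nesi/project/myproject /nesi/nobackup/myproject)

# Option 1: a value to pass to Snakemake via --singularity-args.
SING_ARGS="--bind $BIND_LIST"

# Option 2: export the list so Singularity picks it up from the environment.
export SINGULARITY_BIND="$BIND_LIST"
```

With option 1, the call starts with snakemake --use-singularity --singularity-args "$SING_ARGS" followed by the remaining parameters from section 4.4. Option 2 relies on the variable reaching the cluster jobs; sbatch exports the submitting shell’s environment by default.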