惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

T
The Blog of Author Tim Ferriss
S
Securelist
D
Docker
The Register - Security
The Register - Security
GbyAI
GbyAI
Recorded Future
Recorded Future
Engineering at Meta
Engineering at Meta
Stack Overflow Blog
Stack Overflow Blog
云风的 BLOG
云风的 BLOG
P
Proofpoint News Feed
罗磊的独立博客
博客园 - 【当耐特】
F
Full Disclosure
WordPress大学
WordPress大学
腾讯CDC
小众软件
小众软件
大猫的无限游戏
大猫的无限游戏
D
DataBreaches.Net
SecWiki News
SecWiki News
L
Lohrmann on Cybersecurity
I
InfoQ
MyScale Blog
MyScale Blog
量子位
Cyberwarzone
Cyberwarzone
博客园 - 三生石上(FineUI控件)
The Hacker News
The Hacker News
F
Fortinet All Blogs
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
Jina AI
Jina AI
博客园_首页
H
Help Net Security
K
Kaspersky official blog
酷 壳 – CoolShell
酷 壳 – CoolShell
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
www.infosecurity-magazine.com
www.infosecurity-magazine.com
Webroot Blog
Webroot Blog
Blog — PlanetScale
Blog — PlanetScale
V
Vulnerabilities – Threatpost
Y
Y Combinator Blog
The Cloudflare Blog
P
Proofpoint News Feed
V
Visual Studio Blog
C
Cyber Attacks, Cyber Crime and Cyber Security
T
Tailwind CSS Blog
爱范儿
爱范儿
P
Privacy International News Feed
Security Archives - TechRepublic
Security Archives - TechRepublic
The GitHub Blog
The GitHub Blog
C
Cybersecurity and Infrastructure Security Agency CISA
B
Blog RSS Feed

Jiayi Liu on Jiayi Liu

Differential variability analysis of single-cell gene expression data | Jiayi Liu Phylogenetic inference from single-cell RNA-seq data | Jiayi Liu 2021这一年 | Jiayi Liu A 'one-two punch' therapy strategy to target chemoresistance in estrogen receptor positive breast cancer | Jiayi Liu Activation of EPHA2-ROBO1 heterodimer by SLIT2 attenuates non-canonical signaling and proliferation in squamous cell carcinomas | Jiayi Liu Deploying Huginn with Docker on Google Cloud Platform | Jiayi Liu 强迫症二三事 | Jiayi Liu 来Houston一周年 | Jiayi Liu 与母亲对话若干则 | Jiayi Liu 烧饭经验 | Jiayi Liu 乘车记 by 我妈 | Jiayi Liu 2019 | Jiayi Liu Patterned progression of gut microbiota associated with necrotizing enterocolitis and late onset sepsis in preterm infants: a prospective study in a Chinese neonatal intensive care unit | Jiayi Liu 【转】食人族留学生纪事本末 | Jiayi Liu
Running singularity containers in a High Performance Computing (HPC) environment | Jiayi Liu
Jiayi Liu · 2023-07-18 · via Jiayi Liu on Jiayi Liu

Preface

Working in an HPC environment, the “dependency hell” is the most bothersome nuisance. (Yep, I personally regard it “the most”, not “one of the most”.)

For instance, in R, users often receive messages such as:

"Error: package or namespace load failed for XXX:package YYY was installed before R 4.0.0: please re-install it"

The solution could be relatively easy, you may upgrade to the latest version of R or trying re-installing the YYY package.

In Linux, however, things could be much more painful.

linuxerror

Especially when working in an HPC, it is nearly impossible to install/update pacakges that requires sudo privilege.

Luckily, my ex-boss, Dr. Chang, introduced me the concept of app containerization using tools such as Docker and Singularity. I tried, and realized that I cannot live without them. They are truly my life savers.

In this post, I would like to provide a step by step tutorial on how to run singularity containers in an HPC environment.

Table of Contents

  1. What is a container?
  2. Where can I get containers?
    2.1 Public resources
    2.2 Build my own container
  3. How to run a container on HPC

What is a container?

To begin with, a disclamer: I’m not a CS person. That means I won’t be using terminologies. Instead I’ll use just plain words.

Think of this scenario: you got a message when running a software X on your HPC:

The software X is update, however, the software Y is not. Since software X depends on some functions wrapped in Y, so you are not able to run X now.

– bang, ERROR.

You might think, ok, let me update Y. You tried, and got another message:

You are not allowed to update Y since you are not the super user of this computer.

– bang, ERROR again.

This is the simplest example of dependencies hell. In reality, it could be much more complex since the software dependency manner is not one-to-one; sometimes you may need to install more softwares that requires “sudo” privilege.

One way of solving this is to build your own Utopia, where you take fully control:

  1. You can keep your softwares at your favourite version, and they don’t auto-update themselves.
  2. Your softwares has the appropriate dependency softwares, and they always work well with each other.
  3. Your are the super user of this Utopia, so you can install anything that requires sudo privilege.
  4. You can still read and write your data (files, images, etc.) to the outside world.

Another advantage of this Utopia is that you can reproduce your analyses – the package version are controlled, so just run your scripts and it’ll always give you the same results – nothing’s gonna change.

This kind of Utopia world is the so called “container”.

As a bioinfo hobbyist and a singularity lover, I’ll be using the bioinformatics related tools to illustrate how to build and use singularity containers. As for the other containerization tools, you can simply check their documentations.

Where can I get containers?

There are two ways of acquiring a container, downloading one from the public, or build one from scratch:

Public resources

Public resources are the easiest way to get a container. Repositories such as Docker Hub and Singularity Hub are the most popular ones. You can search for pre-built containers, such as conda, jupyter notebook, Rstudio, etc. Thesea containers are open source, so you can download them and use them for free.

Build my own container

If you cannot find a container that suits your need, you can build one from scratch. This is a bit more complicated, but it’s not that hard. In this post, I’ll be using the Singularity to illustrate how to build a container.

Basically, what you need is simply a “recipe”. A recipe is a file that contains instructions on how to build a container. Here’s an example of a recipe:

Bootstrap: docker
From: debian:stable-slim

%labels
    Version v0.0.1

%help
    Please contact Jiayi for help. 

%post
## basic stuffs
    apt-get update
    apt-get install -y --no-install-recommends \
    bash-completion \
    ca-certificates \
    locales
    echo 'en_US.UTF-8 UTF-8' >> /etc/locale.gen && \
    /usr/sbin/locale-gen en_US.UTF-8 && \
    /usr/sbin/update-locale LANG=en_US.UTF-8

## misc stuffs
    apt-get update && apt-get install -y --no-install-recommends \
    file \
    less \
    openssl \
    libssl-dev \
    curl \
    coreutils \
    gdebi-core

## devel tools
    apt-get update && apt-get install -y --no-install-recommends \
    g++ \
    gfortran \
    make \
    cmake 

## dependencies
    apt-get update && apt-get install -y --no-install-recommends \
    gettext \
    pkg-config \
    autoconf-archive \
    autoconf \
    automake \
    autopoint \
    txt2man \
    build-essential \
    autoconf-archive \
    automake \
    autopoint \
    pkg-config \
    txt2man

## axel
    apt-get -y install aria2

After you’ve done with the recipe, you can paste it into the Singularity Container Builder to generate your own container.

How to run a container on HPC?

In the following code blocks in this section, I’ll add my comments/recommendations starting with ## Jiayi:.

Here, I’ll be using one of my Singularity containers as an example. I’ve built several singularity containers that could boost productivity in bioinformatics research. You can check them from here.

First, in your HPC, you’ll need to have the Singularity installed and loaded. If you don’t have it, you can follow the instructions here. If you have it, you can simply load it:

module load singularity/3.8.3 
## Jiayi: check with your HPC admin for the version, or check with `module avail singularity`. 

Second, you’ll need to pull the container from the cloud. This above command will pull the container from the cloud and save it as a file named “rserver_4-1-1bioinfo.sif”.

singularity pull --arch amd64 library://jiayiliujiayi/rserver/rserver:4-1-1bioinfo
## Jiayi: This container is a Rstudio server container, along with common R packages including Seurat, ggplot2, tidyverse, dplyr, etc.  

Third, you can use this container in your HPC. I’ll use SLURM environment as an example. In your workign directory, you can create a script named “run_rstudio.sh” with the following content:

#!/bin/bash

## Jiayi: This is a SLURM script.  You can check with your HPC admin for the details (marked with XXX).  You can also check the documentations of SLURM: https://slurm.schedmd.com/

#SBATCH --job-name=rstudio
#SBATCH --output=rstudio_%j.out
#SBATCH --error=rstudio_%j.err
#SBATCH --time=3-00:00:00 ## Jiayi: check with your HPC admin for the time limit, or change it to the time length you need.
#SBATCH --mem=100G
#SBATCH --cpus-per-task=10
#SBATCH --partition=general
#SBATCH --qos=general
#SBATCH --mail-type=ALL
#SBATCH --mail-user=XXX ## Jiayi: change it to your email address if you want to get updated with the job status. 

# load the singularity module
module purge
module load singularity/3.8.3

# prepare for the Rstudio server
## Do not suspend idle sessions.
export SINGULARITYENV_RSTUDIO_SESSION_TIMEOUT=0

## Container user set ups
export SINGULARITYENV_USER=XXX ## Jiayi: change it to your username on the HPC
export SINGULARITYENV_PASSWORD=XXXX ## Jiayi: define your own password, and it doesn't necessarily need to be the same as your HPC password.

## Get IP address
IP_ADDR=`hostname -i`
echo $IP_ADDR

## write out the IP address and port number to a file
cat 1>&2 <<END
1. SSH tunnel from your workstation using the following command:

   ssh -N -L 8787:${IP_ADDR}:8787 ${SINGULARITYENV_USER}@XXX ## Jiayi: change it to your HPC IP address. 

   and point your web browser to http://localhost:8787

2. log in to RStudio Server using the following credentials:

   user: ${SINGULARITYENV_USER}
   password: ${SINGULARITYENV_PASSWORD}

When done using RStudio Server, terminate the job by:

1. Exit the RStudio Session ("power" button in the top right corner of the RStudio window)
2. Issue the following command on the login node:

      scancel -f ${SLURM_JOB_ID}

END

# start the Rstudio server
singularity exec --cleanenv \
    --bind run:/run,var-lib-rstudio-server:/var/lib/rstudio-server,database.conf:/etc/rstudio/database.conf, $PWD:/XXX/,/tmp \ ## Jiayi: change it to your working directory, and the directory where you want to save the Rstudio session. 
    ~/rserver_4.1.1bioinfo.sif \
    --server-user ${USER} \
    --auth-none=0 \
    --auth-pam-helper-path=pam-helper

printf 'rserver exited' 1>&2

Fourth, you can submit the job to the HPC using the following command:

sbatch run_rstudio.sh

In your working directory, you’ll see two files: “rstudio_XXXX.out” and “rstudio_XXXX.err”. The former one is the output of the job, and the latter one is the error log. You can check the error log if you encounter any problems. If you don’t see any error, you can check the output file. In the output file, you’ll see a line as such:

ssh -N -L 8787:${IP_ADDR}:8787 ${SINGULARITYENV_USER}@XXX

Copy the above command and paste it into your local terminal.

Finally, you can open your web browser and type in “localhost:8787” in the address bar. You’ll see the Rstudio server page. You can log in with your username and password. And Voila. You are in the Rstudio server environment of your Utopia.

Also, you may wanna take a look at the other information in the output file regarding how to terminate the SLURM job.