Blog as GitHub Gist

01 Jan, 0001

I really dislike the Gist interface, but I often find myself searching the web or prompting LLMs for the same code snippets over and over.

I think storing my code snippets and comments here is much cleaner and more searchable. I can also decide how to organize them: all on one page or split across separate pages.

Slurm

Parallelizing Jobs

Here is a simple Slurm script that uses 8 GPUs, with each GPU processing its own subset of the dataset (selected by rank and worldsize).

#!/bin/bash

# Request just under two days of runtime (format: days-hours:minutes:seconds):
#SBATCH --time=1-23:59:00

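# Ask for the GPU partition; each array task gets 1 node, 1 task, 1 GPU and 10 CPUs
#SBATCH --partition=...
#SBATCH --nodes=1 --ntasks=1 --gpus-per-task=1 --cpus-per-task=10
# Launch 8 independent array tasks (indices 0-7), i.e. 8 GPUs in total
#SBATCH --array=0-7

# Use more CPU RAM: 10GB per CPU (100GB per task with 10 CPUs):
#SBATCH --mem-per-cpu=10g

# Specify a job name:
#SBATCH -J exp-parallel

# Specify the stdout/stderr files (%A = array job ID, %a = array task index).
# The directory must already exist; Slurm does not create it.
#SBATCH -o /log/parallel/parallel-%A_%a.out
#SBATCH -e /log/parallel/parallel-%A_%a.err

nvidia-smi
eval "$(conda shell.bash hook)"
conda activate env_XXX
# rank = this task's array index, worldsize = number of tasks in the array
srun python something.py --rank=$SLURM_ARRAY_TASK_ID --worldsize=$SLURM_ARRAY_TASK_COUNT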
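For array jobs, Slurm exports SLURM_ARRAY_TASK_ID (this task's array index) and SLURM_ARRAY_TASK_COUNT (the number of tasks in the array) to every task, so the worker could also read its rank and worldsize directly instead of taking them as arguments. A minimal sketch, assuming the array range starts at 0 with step 1 (as in --array=0-7); get_rank_worldsize is a hypothetical helper name, and it falls back to a single worker when run outside Slurm:

```python
import os


def get_rank_worldsize(default_rank=0, default_worldsize=1):
    """Return (rank, worldsize) for this worker.

    Hypothetical helper: reads the variables Slurm sets for job-array tasks
    and falls back to the defaults when they are absent (e.g. local runs).
    Assumes the array indices are 0..N-1, so the index can be used as rank.
    """
    rank = int(os.environ.get("SLURM_ARRAY_TASK_ID", default_rank))
    worldsize = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", default_worldsize))
    # Guard against arrays like --array=1-8 where the index is not a valid rank.
    if not 0 <= rank < worldsize:
        raise ValueError(f"rank {rank} is outside [0, {worldsize})")
    return rank, worldsize
```

In something.py you could then call rank, worldsize = get_rank_worldsize() in place of the argparse setup.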

The Python script (something.py) partitions the dataset using rank and worldsize:

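import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--rank', type=int, default=0)       # index of this worker, 0-based
parser.add_argument('--worldsize', type=int, default=1)  # total number of workers
args = parser.parse_args()

data = [...]  # your dataset

# Contiguous split: every rank gets _BLOCKSIZE items, and the last rank
# also takes the remainder (len(data) % worldsize items).
_BLOCKSIZE = len(data) // args.worldsize
START = args.rank * _BLOCKSIZE
END = (args.rank + 1) * _BLOCKSIZE if args.rank != args.worldsize - 1 else len(data)
data_subset = data[START:END]

# process(data_subset)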
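The split above hands all len(data) % worldsize leftover items to the last rank, so that rank can end up with up to worldsize - 1 more items than the others. If you want shards whose sizes differ by at most one, a common alternative spreads the remainder over the first ranks instead. A sketch of that balanced contiguous split; shard_bounds is a hypothetical helper name, not part of the script above:

```python
def shard_bounds(n, rank, worldsize):
    """Return (start, end) so that data[start:end] is this rank's shard.

    Balanced contiguous split: the first n % worldsize ranks get one extra
    item, so shard sizes differ by at most one.
    """
    if worldsize <= 0 or not 0 <= rank < worldsize:
        raise ValueError("need worldsize > 0 and 0 <= rank < worldsize")
    base, extra = divmod(n, worldsize)
    # Every earlier rank contributes `base` items, plus one more for each
    # earlier rank that received an extra item: min(rank, extra) of them.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# Example: 10 items over 4 workers -> sizes 3, 3, 2, 2
data = list(range(10))
shards = [data[slice(*shard_bounds(len(data), r, 4))] for r in range(4)]
print(shards)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

With the original scheme the same example gives sizes 2, 2, 2, 4. Another option is strided sharding, data[args.rank::args.worldsize], which also balances sizes but interleaves items rather than keeping each shard contiguous.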

Tmux

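If tmux complains

no server running on /tmp/tmux-1713701081/default

even though your session is still running, the server's socket file has probably been removed. Sending SIGUSR1 to the tmux server makes it recreate the socket: pkill -USR1 tmux (https://unix.stackexchange.com/questions/582423/recover-a-tmux-session-that-tmux-insists-isnt-running)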

#Tech-Tips