Blog as GitHub Gist
I very much dislike the interface of Gist, but at the same time, I often find myself searching the web / prompting LLMs for the same coding snippets.
Now I think storing my code snippets and comments here is much cleaner and more searchable. I can also customize the organization, choosing whether they all go on one page or are split across separate pages.
Slurm
Parallelizing Jobs
Here is a simple Slurm script that launches 8 GPUs and has each GPU process a subset of the dataset (based on rank and worldsize).
#!/bin/bash
# Request just under two days of runtime (format: days-hours:minutes:seconds):
#SBATCH --time=1-23:59:00
# Ask for the GPU partition; each array task gets 1 GPU and 10 CPU cores:
#SBATCH --partition=...
#SBATCH --nodes=1 --gpus-per-task=1 --cpus-per-task=10
# One task per array element; the 8 array elements (ranks 0-7) use 8 GPUs in total:
#SBATCH --ntasks=1
#SBATCH --array=0-7
# Request 10 GB of CPU RAM per CPU core:
#SBATCH --mem-per-cpu=10g
# Specify a job name:
#SBATCH -J exp-parallel
# Specify output and error files (%A = array job ID, %a = array task ID):
#SBATCH -o /log/parallel/parallel-%A_%a.out
#SBATCH -e /log/parallel/parallel-%A_%a.err
nvidia-smi
eval "$(conda shell.bash hook)"
conda activate env_XXX
# The array index is the rank; the number of array tasks is the world size.
srun --gres=gpu:1 --ntasks=1 --exclusive python something.py --rank=$SLURM_ARRAY_TASK_ID --worldsize=$SLURM_ARRAY_TASK_COUNT
The Python script partitions the dataset using rank and worldsize.
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--rank', type=int, default=0)
parser.add_argument('--worldsize', type=int, default=1)
args = parser.parse_args()
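data = [...]  # placeholder: load the full dataset here
# Each rank takes one contiguous block; the last rank also takes the remainder.
block_size = len(data) // args.worldsize
start = args.rank * block_size
end = (args.rank + 1) * block_size if args.rank != args.worldsize - 1 else len(data)
data_subset = data[start:end]
# process(data_subset)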
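To sanity-check the partitioning without submitting a job, the block arithmetic can be pulled into a small function and run over every rank locally. The sketch below is only an illustration: the function names block_bounds and balanced_bounds and the 10-item toy dataset are hypothetical. block_bounds mirrors the scheme above, where the last rank absorbs the remainder; balanced_bounds is an alternative that is not part of the original script and spreads the remainder over the first few ranks instead.

```python
def block_bounds(n_items, rank, worldsize):
    """Same scheme as the script above: equal blocks, last rank takes the remainder."""
    block = n_items // worldsize
    start = rank * block
    end = (rank + 1) * block if rank != worldsize - 1 else n_items
    return start, end


def balanced_bounds(n_items, rank, worldsize):
    """Alternative (an assumption, not from the script): the first
    `n_items % worldsize` ranks each take one extra item."""
    base, extra = divmod(n_items, worldsize)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


# Toy dataset (hypothetical): 10 items split across 8 ranks.
data = list(range(10))
worldsize = 8
for rank in range(worldsize):
    s, e = block_bounds(len(data), rank, worldsize)
    bs, be = balanced_bounds(len(data), rank, worldsize)
    print(rank, data[s:e], data[bs:be])
```

With the original scheme, the last rank can end up with as many as worldsize - 1 extra items, which is usually negligible for large datasets. If per-item processing is expensive, the balanced variant keeps the job runtimes closer together.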
Tmux
no server running on /tmp/tmux-1713701081/default
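If tmux prints this error even though your sessions should still be running, run pkill -USR1 tmux. The SIGUSR1 signal tells the running tmux server to recreate its socket, after which you can attach again.
(https://unix.stackexchange.com/questions/582423/recover-a-tmux-session-that-tmux-insists-isnt-running)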