HPC High Performance Computing: 5.6. Submitting CUDA jobs

Submitting CUDA Jobs

CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing architecture developed by NVIDIA. It consists of the CUDA Instruction Set Architecture (ISA) and the parallel compute engine in the NVIDIA GPU (Graphics Processing Unit). The GPU has hundreds of cores that can collectively run thousands of computing threads. This capability complements a conventional CPU's ability to run serial tasks: the CPU runs the serial portions of an application, hands off parallel subtasks to the GPU, and manages the complete set of tasks that make up the overall algorithm. Generally, in this model of computing, the best results are obtained by minimizing communication between the CPU (host) and the GPU (device).

In this section, we submit a basic job using the "--gres" parameter, which tells Slurm that we want to reserve a GPU resource.

First of all, we create a batch script that uses the --gres parameter to reserve a GPU resource from the cluster:

#!/bin/bash
#SBATCH -J prova_dani_uname10
#SBATCH -p short
#SBATCH --chdir=/homedtic/test/gpu_maxwell
#SBATCH --gres=gpu:1
#SBATCH --time=2:00
#SBATCH -o slurm.%N.%J.%u.out # STDOUT
#SBATCH -e slurm.%N.%J.%u.err # STDERR
module load CUDA/11.4.3
./gpu_burn
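The submit-and-inspect workflow can be sketched as below. The script name gpu_job.sh is a hypothetical placeholder (the source does not name the file); on a real login node you would run the sbatch and scontrol commands shown in the comments. Here we only parse the "Submitted batch job" confirmation line that sbatch prints, using the job ID from this guide as sample data.

```shell
# On the cluster you would run:
#   jobid=$(sbatch --parsable gpu_job.sh)   # --parsable prints just the job ID
#   scontrol show job "$jobid"
# Without --parsable, sbatch prints "Submitted batch job <id>"; we can strip
# the ID off the end of that message with plain parameter expansion.
out="Submitted batch job 945"   # sample output, matching the job in this guide
jobid=${out##* }                # drop everything up to the last space
echo "$jobid"
```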

If we execute "scontrol show job" on our job, we can see which node it is running on and whether it is using a GPU resource:

test@login01:/homedtic/test/gpu_maxwell# scontrol show job 945
JobId=945 JobName=prova_dani_uname10
   UserId=root(0) GroupId=root(0) MCS_label=N/A
   Priority=4670 Nice=0 Account=root QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:07 TimeLimit=00:02:00 TimeMin=N/A
   SubmitTime=2017-12-11T21:04:37 EligibleTime=2017-12-11T21:04:37
   StartTime=2017-12-11T21:04:38 EndTime=2017-12-11T21:06:38 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=short AllocNode:Sid=node020:10105
   ReqNodeList=(null) ExcNodeList=(null)
   NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:2:2
   Socks/Node=1 NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=1024M MinTmpDiskNode=0
   Features=(null) Gres=gpu:1 Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
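To confirm the GPU allocation without reading the whole listing, the Gres field can be extracted from the scontrol output. This is a minimal sketch that operates on a sample line copied from the listing above; on a live cluster you would pipe `scontrol show job <id>` into the same grep.

```shell
# Sample line taken from the scontrol output above; on a real cluster:
#   scontrol show job 945 | grep -o 'Gres=[^ ]*'
line="   Features=(null) Gres=gpu:1 Reservation=(null)"
gres=$(printf '%s\n' "$line" | grep -o 'Gres=[^ ]*')
echo "$gres"
```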

If we go to node020 and execute nvidia-smi, we can check whether the GPU is running a process:

test@node020:/homedtic/test/gpu_maxwell# nvidia-smi
Mon Dec 11 21:07:23 2017       
| NVIDIA-SMI 384.90                 Driver Version: 384.90                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX TIT...  On   | 00000000:41:00.0 Off |                  N/A |
| 22%   47C    P2   219W / 250W |  11001MiB / 12207MiB |     99%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0     11078      C   ./gpu_burn                                 10988MiB |
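As a quick check that the job is actually keeping the GPU busy, the utilization column of the nvidia-smi table can be pulled out. This sketch parses the sample row from the table above; on the node itself, `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader` reports the same figure directly.

```shell
# Sample GPU row from the nvidia-smi table above; on the node you could run:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader
line="| 22%   47C    P2   219W / 250W |  11001MiB / 12207MiB |     99%      Default |"
# Two percentages appear in the row (fan, utilization); utilization is the last.
util=$(printf '%s\n' "$line" | grep -o '[0-9]*%' | tail -1)
echo "$util"
```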

In the previous example we executed a GPU stress test (gpu_burn). Remember to run "module load CUDA/11.4.3" to use the CUDA software.

test@login01:/homedtic/test/gpu_maxwell# cat slurm.node020.947.root.out
GPU 0: GeForce GTX TITAN X (UUID: GPU-61bef67d-703c-f4da-60ad-04430a92f69e)
Run length not specified in the command line.  Burning for 10 secs
20.0%  proc'd: 2711   errors: 0   temps: -- 
30.0%  proc'd: 5422   errors: 0   temps: -- 
50.0%  proc'd: 8133   errors: 0   temps: -- 
60.0%  proc'd: 10844   errors: 0   temps: -- 
90.0%  proc'd: 16266   errors: 0   temps: -- 
100.0%  proc'd: 21688   errors: 0   temps: -- 
Killing processes.. done
Tested 1 GPUs:
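When checking the output file, the important figure is the error count, which should stay at zero throughout the burn. A minimal sketch of that check, using a sample line from the log above (on the cluster you would run the sed over the slurm .out file instead):

```shell
# Sample line from the gpu_burn log above; on the cluster:
#   sed -n 's/.*errors: \([0-9]*\).*/\1/p' slurm.node020.947.root.out
line="100.0%  proc'd: 21688   errors: 0   temps: --"
errors=$(printf '%s\n' "$line" | sed -n 's/.*errors: \([0-9]*\).*/\1/p')
echo "$errors"
```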

We can be more specific when requesting a GPU resource by indicating the GPU type:

#!/bin/bash
#SBATCH -J prova_dani_uname10
#SBATCH -p short
#SBATCH --chdir=/homedtic/test/gpu_maxwell
#SBATCH --gres=gpu:maxwell:1
#SBATCH --time=2:00
#SBATCH --sockets-per-node=1
#SBATCH --cores-per-socket=2
#SBATCH --threads-per-core=2
#SBATCH -o slurm.%N.%J.%u.out # STDOUT
#SBATCH -e slurm.%N.%J.%u.err # STDERR
module load CUDA/11.4.3
./gpu_burn
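After a typed request like gpu:maxwell:1, it is worth confirming which model was actually allocated. `nvidia-smi -L` lists the GPUs visible to the job; this sketch parses the sample line that appeared in the job output earlier (on the node, run `nvidia-smi -L` directly).

```shell
# Sample line from the job output earlier in this section; on the node:
#   nvidia-smi -L
line="GPU 0: GeForce GTX TITAN X (UUID: GPU-61bef67d-703c-f4da-60ad-04430a92f69e)"
# Strip the "GPU N: " prefix and the "(UUID: ...)" suffix to get the model name.
model=$(printf '%s\n' "$line" | sed 's/GPU [0-9]*: \(.*\) (UUID.*/\1/')
echo "$model"
```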