Submitting basic jobs

The basic command to submit a job is sbatch:

test@login01:~$ sbatch sleep.sh
Submitted batch job 175

Once the job is registered, the scheduler reports its job ID. Keep it handy: if anything goes wrong, you will need this value to debug what happened.

Although you can specify the options directly when calling sbatch, when submitting a job it is strongly advised that all options be provided within a job definition file (in these examples the file will be called "job.sh"). This file contains the command you wish to execute together with the SLURM resource request options that you need.

#!/bin/bash
#SBATCH -J prova_uname10
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 2 
#SBATCH --chdir=/homedtic/test/slurm_jobs
#SBATCH --time=2:00
#SBATCH -o %N.%J.out # STDOUT
#SBATCH -e %N.%j.err # STDERR
 
ps -ef | grep slurm
uname -a >> /homedtic/test/uname.txt
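All of these options can also be passed on the sbatch command line. As a sketch, the call below requests the same resources as the directives in job.sh above (command-line options override the ones in the script):

sbatch -J prova_uname10 -p short -N 1 -n 2 --time=2:00 --chdir=/homedtic/test/slurm_jobs job.sh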

The "-J" option sets the name of the job. This name is used to create the output log files for the job. We recommend using a capital letter for the job name is order to distinguish these log files from the other files in your working directory. This makes it easier to delete the log files later.

The "-p" option requests the queue in which the job should run. 

The "-N" option Request that a minimum of nodes be allocated to this job.

The "-n" option Request the number of tasks per node.

The “–time” option specify a a limit on the total run time of the job allocation.

The “–chdir” option  Set the working directory of the batch script to directory before it is executed.

The "-o" option instruct Slurm to connect the batch script's standard output directly to the file name specified in the "filename pattern".

The "-e" option instruct Slurm to connect the batch script's standard error directly to the file name specified in the "filename pattern".

We can monitor how our job is doing with the scontrol show job command: 

test@login01:~$ scontrol show job 173

   JobId=173 JobName=prova_uname10
   UserId=test(1039) GroupId=info_users(10376) MCS_label=N/A
   Priority=6501 Nice=0 Account=info QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=08:00:00 TimeMin=N/A
   SubmitTime=2017-11-27T16:37:47 EligibleTime=2017-11-27T16:37:47
   StartTime=2017-11-27T16:50:21 EndTime=Unknown Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=medium AllocNode:Sid=node009:5743
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null) SchedNodeList=node[004,020]
   NumNodes=2-2 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=4,mem=4096,node=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=1024M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/homedtic/test/sleep.sh
   WorkDir=/homedtic/test
   StdErr=/homedtic/test/slurm.%N.173.err
   StdIn=/dev/null
   StdOut=/homedtic/test/slurm.%N.173.out
   Power=

The scontrol show job output shows the ID of our job (JobId), its priority (Priority), who launched it (UserId), the state of the job (JobState), when it was submitted and to which partition (SubmitTime and Partition), and how many resources it has been allocated (NumNodes, NumCPUs and TRES).
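For a quick overview of all your jobs at once, squeue is also useful. The listing below is only illustrative (the exact columns and widths depend on the site configuration), showing the same pending job as above:

test@login01:~$ squeue -u test
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               173    medium prova_un     test PD       0:00      2 (Resources)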

Job run time

Each queue has specific policies that enforce how long jobs are allowed to execute: short* queues allow up to 2 hours, medium* queues allow up to 8 hours, and high* queues have no runtime limit. When you submit a job, you are either implicitly or explicitly indicating how long the job is expected to run. The simplest way to indicate the maximum runtime explicitly is the --time option:

#!/bin/bash
#SBATCH -J prova_uname10
#SBATCH -p short
#SBATCH --time=2:00
 
ps -ef | grep slurm
srun -n8 uname -a >> $HOME/uname.txt

Here we have an example of a time limit: this script will be stopped when it reaches 2 minutes of execution. If the requested time limit exceeds the partition's time limit, the job will be left in a PENDING state (possibly indefinitely).
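The --time option accepts the standard Slurm time formats, for example:

#SBATCH --time=30             # 30 minutes
#SBATCH --time=2:00           # 2 minutes (minutes:seconds)
#SBATCH --time=02:00:00       # 2 hours (hours:minutes:seconds)
#SBATCH --time=1-12:00:00     # 1 day and 12 hours (days-hours:minutes:seconds)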

Redirection output and error files

By default, if not told otherwise, the scheduler redirects the output (and error) of any job you launch to a file named slurm-<jobid>.out, placed in the directory from which you submitted the job. After a few executions and tests, your $HOME will probably look something like this (the listing below is illustrative; the job IDs are made up):
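test@login01:~$ ls $HOME
job.sh  sleep.sh  slurm-175.out  slurm-176.out  slurm-177.out  slurm-178.out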

As a general rule, you are advised to use the following flags to redirect the output and error files:

Flag Request Comment
-e <path>/<filename> Redirect the error file The system will create the given file at the specified path and redirect the job's standard error to it. If no name is specified, the default name applies.
-o <path>/<filename> Redirect the output file The system will create the given file at the specified path and redirect the job's standard output to it. If no name is specified, the default name applies.
--chdir=<path> Change the working directory The batch script runs in the given directory, and output and error files given with relative names are created there instead of in the directory from which 'sbatch' was called.

Of course, we can place these options in our job definition file:

#!/bin/bash
#SBATCH -J prova_dani_uname10
#SBATCH -p short
#SBATCH -N 1
#SBATCH -n 2 # number of cores
#SBATCH --chdir=/homedtic/test/slurm_jobs
#SBATCH --time=2:00
#SBATCH -o slurm.%N.%J.%u.out # STDOUT
#SBATCH -e slurm.%N.%J.%u.err # STDERR
 
ps -ef | grep slurm

When the job is launched, we will see the output files created in /homedtic/test/slurm_jobs. If no error is reported, the error file is still created but stays empty.

test@node009:~/slurm_jobs$ ls
slurm.node005.206.test.err  slurm.node005.206.test.out
test@node009:~/slurm_jobs$ 
test@node009:~/slurm_jobs$ 
test@node009:~/slurm_jobs$ cat slurm.node005.206.test.out 
root      1138     1  0 Nov24 ?        00:00:00 /usr/sbin/slurmd
root      6714     1  0 20:04 ?        00:00:00 slurmstepd: [206]   
test   6719  6714  0 20:04 ?        00:00:00 /bin/bash /var/spool/slurmd/job00206/slurm_script
test   6724  6719  0 20:04 ?        00:00:00 grep slurm
test@node009:~/slurm_jobs$ 

Deleting and modifying jobs

We can modify the requirements of a job while it is waiting to be processed, and we can delete it at any point:

Command Request Comment
scancel <job_id> Delete the job The system will remove the job and all its dependencies from the queues and the execution hosts.
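Deleting and modifying can both be done from the command line. The sketch below uses a made-up job ID; note that scontrol update works on jobs that are still pending, and that on most clusters regular users may lower a job's time limit but not raise it:

scancel 175                                      # delete the job
scontrol update JobId=175 TimeLimit=30:00        # lower the pending job's time limit to 30 minutes
scontrol update JobId=175 Partition=short        # move the pending job to another partition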

Below is a table of the job reason codes that the scheduler reports; an example of querying the reason for a specific job follows the table.

JOB REASON CODES

AssociationJobLimit

The job's association has reached its maximum job count.

AssociationResourceLimit

The job's association has reached some resource limit.

AssociationTimeLimit

The job's association has reached its time limit.

BadConstraints

The job's constraints can not be satisfied.

BeginTime

The job's earliest start time has not yet been reached.

BlockFreeAction

An IBM BlueGene block is being freed and can not allow more jobs to start.

BlockMaxError

An IBM BlueGene block has too many cnodes in error state to allow more jobs to start.

Cleaning

The job is being requeued and still cleaning up from its previous execution.

Dependency

This job is waiting for a dependent job to complete.

FrontEndDown

No front end node is available to execute this job.

InactiveLimit

The job reached the system InactiveLimit.

InvalidAccount

The job's account is invalid.

InvalidQOS

The job's QOS is invalid.

JobHeldAdmin

The job is held by a system administrator.

JobHeldUser

The job is held by the user.

JobLaunchFailure

The job could not be launched. This may be due to a file system problem, invalid program name, etc.

Licenses

The job is waiting for a license.

NodeDown

A node required by the job is down.

NonZeroExitCode

The job terminated with a non-zero exit code.

PartitionDown

The partition required by this job is in a DOWN state.

PartitionInactive

The partition required by this job is in an Inactive state and not able to start jobs.

PartitionNodeLimit

The number of nodes required by this job is outside of its partition's current limits. Can also indicate that required nodes are DOWN or DRAINED.

PartitionTimeLimit

The job's time limit exceeds its partition's current time limit.

Priority

One or more higher priority jobs exist for this partition or advanced reservation.

Prolog

Its PrologSlurmctld program is still running.

QOSJobLimit

The job's QOS has reached its maximum job count.

QOSResourceLimit

The job's QOS has reached some resource limit.

QOSTimeLimit

The job's QOS has reached its time limit.

ReqNodeNotAvail

Some node specifically required by the job is not currently available. The node may currently be in use, reserved for another job, in an advanced reservation, DOWN, DRAINED, or not responding. Nodes which are DOWN, DRAINED, or not responding will be identified as part of the job's "reason" field as "UnavailableNodes". Such nodes will typically require the intervention of a system administrator to make available.

Reservation

The job is waiting for its advanced reservation to become available.

Resources

The job is waiting for resources to become available.

SystemFailure

Failure of the Slurm system, a file system, the network, etc.

TimeLimit

The job exhausted its time limit.

QOSUsageThreshold

Required QOS threshold has been breached.

WaitingForScheduling

No reason has been set for this job yet. Waiting for the scheduler to determine the appropriate reason.
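These reason codes appear in the NODELIST(REASON) column of squeue and in the Reason field of scontrol show job. As a sketch (the job ID is just an example), the state and reason of a specific job can be queried directly with an output format string:

test@login01:~$ squeue -j 173 -o "%i %T %r"
JOBID STATE REASON
173 PENDING Resources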