HPC High Performance Computing: 5.2. Running Matlab advanced jobs over SLURM

To get better results when running simulations on the cluster, follow a few best practices. In this section we show some tips and techniques to speed up simulations and use the cluster efficiently.

 

Keep your data organized

As a general rule, avoid flat directories (single folders containing hundreds or thousands of files), as they have a performance impact on the BeeGFS filer. One strategy is to create a folder for each experiment and, inside it, the subdirectories you need.
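As a quick check, a small shell sketch like the one below can flag folders that have grown too flat. The function name `find_crowded_dirs` and the default threshold of 1000 files are illustrative assumptions, not a cluster policy.

```shell
# Hypothetical helper: print "<count> <dir>" for every directory under ROOT
# that directly contains more than THRESHOLD regular files.
find_crowded_dirs() {
  local root="${1:-.}" threshold="${2:-1000}"
  find "$root" -type d | while IFS= read -r dir; do
    # Count only the files directly inside $dir, not those in subfolders
    n=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$n" -gt "$threshold" ]; then
      echo "$n $dir"
    fi
  done
}

# Example: flag folders with more than 1000 files in an experiment tree
# find_crowded_dirs ~/MonteCarlo 1000
```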

Parallelize when possible

Whenever possible, use SMP multiprocessing, parallel loops or other techniques to get results faster.

Running in serial:

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. Because each sample is independent of the others, they are a good fit to illustrate serial vs. parallel execution:

1. Create the folder where the experiment data will be placed:

$ mkdir MonteCarlo
$ cd MonteCarlo/
$ mkdir data
$ mkdir out
$ mkdir script
$ mkdir job-out
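The same tree can be created with a single loop. This is only a convenience sketch using the folder names above: `mkdir -p` creates missing parent folders and does not fail if a folder already exists, so it is safe to re-run.

```shell
# Create the experiment folder and its four subfolders in one go
exp=MonteCarlo
for d in data out script job-out; do
  mkdir -p "$exp/$d"
done
```

In bash, `mkdir -p MonteCarlo/{data,out,script,job-out}` does the same through brace expansion.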

2. Place the MATLAB script in the script folder:

$ cd script/
$ vi montecarlo.m

%% SERIAL DEMO
 
%% Init problem
iter = 100000;
sz = 55;
a = zeros(1, iter);
 
% MonteCarlo simulations
disp('Starting ...');
tic;
for simNum = 1:iter
    a(simNum) = myFunction(sz);
end
toc;

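The code for myFunction is shown below. Save it as myFunction.m in the same folder as the script, or append it at the end of montecarlo.m (local functions in scripts are supported since R2016b):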

function out = myFunction(in)
        out=max(svd(rand(in)));
end

3. A basic submission script. With the default resources, the job gets 1 node with 1 CPU and 1 GB of RAM:

#!/bin/bash
#SBATCH -J matlab_test
#SBATCH -p short
#SBATCH -N 1
#SBATCH --workdir=/homedtic/test/Matlab/ 
#SBATCH -o job-out/slurm.%N.%J.%u.out # STDOUT
#SBATCH -e job-out/slurm.%N.%J.%u.err # STDERR
 
module load MATLAB/2017a
matlab -nosplash -nojvm -nodesktop -r "run('/homedtic/test/Matlab/montecarlo.m'); exit"

4. The elapsed time is printed in the .out file in the job-out folder. Running this simulation serially took about 23.5 seconds:

$ tail -f slurm.node00x.xxxx.test.out 
                            < M A T L A B (R) >
                  Copyright 1984-2017 The MathWorks, Inc.
                   R2017a (9.2.0.556344) 64-bit (glnxa64)
                               March 27, 2017
 
 
For online documentation, see http://www.mathworks.com/support
For product information, visit www.mathworks.com.
 
Starting ...
Elapsed time is 23.494808 seconds.
>> 
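When comparing several runs, it can be handy to pull the elapsed time out of every output file instead of opening them one by one. The sketch below assumes the files contain the `Elapsed time is N seconds.` line printed by toc, as in the log above; the function name `elapsed_times` is hypothetical.

```shell
# Print "<file>: <seconds>" for each given file that contains the
# "Elapsed time is N seconds." line written by MATLAB's toc.
elapsed_times() {
  local f
  for f in "$@"; do
    # Fields: $1=Elapsed $2=time $3=is $4=<seconds>
    awk -v file="$f" '/^Elapsed time is/ { print file ": " $4 }' "$f"
  done
}
```

For example, `elapsed_times job-out/*.out` run from the experiment folder prints one line per finished job.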

The MATLAB Parallel Computing Toolbox offers several features that simplify the development of parallel applications in MATLAB. It provides programming constructs such as parallel loops and distributed arrays that let you extend your serial code into the parallel domain, without having to learn a complex parallel language or make significant changes to your existing code. The toolbox supports interactive development, which lets you connect to your cluster from a MATLAB session to perform parallel computations interactively, as well as batch use. Currently, the Parallel Computing Toolbox license is limited to the local cores of each node.

Let's run the same Monte Carlo simulation in parallel. To do this, we must do two things:

a. Reserve N cores for the job in SLURM (where N is at most the number of physical cores of the node, excluding hyperthreading)

b. Tell MATLAB to start a local pool of N workers on those cores
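To choose N, it helps to know how many physical cores a node has. A minimal sketch, assuming the util-linux `lscpu` tool is installed on the node: `lscpu --parse` lists each online logical CPU with its core and socket IDs, so counting the unique core/socket pairs gives the number of physical cores, while `nproc` counts the logical CPUs (hyperthreads included) usable by the current process.

```shell
# Physical cores: one "core,socket" line per logical CPU; duplicate pairs
# are hyperthreads of the same core, so count unique pairs only.
physical_cores() {
  lscpu --parse=CORE,SOCKET | grep -v '^#' | sort -u | wc -l
}

echo "physical cores: $(physical_cores)"
# Logical CPUs usable by this process (hyperthreads included)
echo "logical CPUs:   $(nproc)"
```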

1. Let's modify our MATLAB script to use 'parfor' instead of 'for'. This is possible as long as the loop iterations are independent of each other (no iteration uses a result computed in another one), which holds here because each sample is drawn separately. We'll also use parpool to create a pool of 6 workers with the 'local' profile. More information on MATLAB cluster profiles here:

%% PARFOR DEMO
%% Init problem
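% Start a pool of 6 workers with the 'local' profile
parpool('local',6)

% MonteCarlo simulations
disp('Starting ...')
tic;
iter    = 100000;
sz      = 55;
a       = zeros(1, iter);
% Run the loop on at most 6 workers
parfor (simNum = 1:iter, 6)
        a(simNum) = max(svd(rand(sz)));
end
toc;
% Shut down the pool
delete(gcp)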

2. Let's also modify our submission script so that it reserves as many cores as workers requested with parpool, for example 6 cores on 2 sockets of the same node. The matlab command also changes: the -nojvm flag is dropped, because the parallel pool needs the Java Virtual Machine.

#!/bin/bash
#SBATCH -J matlab_test
#SBATCH -p short
#SBATCH -N 1
#SBATCH --workdir=/homedtic/test/Matlab/
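# --sockets-per-node and --cores-per-socket only restrict node selection to
# nodes with at least 2 sockets x 3 cores; --cpus-per-task reserves the cores
#SBATCH --sockets-per-node=2
#SBATCH --cores-per-socket=3
#SBATCH --cpus-per-task=6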
#SBATCH -o job-out/slurm.%N.%J.%u.out # STDOUT
#SBATCH -e job-out/slurm.%N.%J.%u.err # STDERR
 
module load MATLAB/2017a
matlab -nosplash -nodesktop -r "run('/homedtic/test/Matlab/montecarlo.m'); exit"
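Because the pool size is hard-coded in the MATLAB script, it can be worth checking that SLURM really granted enough CPUs before MATLAB starts. A minimal sketch to paste into the submission script above the matlab line. It relies on the `SLURM_CPUS_ON_NODE` variable that SLURM sets inside a job and falls back to `nproc` outside one; the function name `check_cpus` is hypothetical.

```shell
# Hypothetical sanity check: compare the CPUs allocated on this node with the
# number of MATLAB workers the script will request.
check_cpus() {
  wanted="$1"
  # Set by SLURM inside a job; outside a job, fall back to nproc
  got="${SLURM_CPUS_ON_NODE:-$(nproc)}"
  if [ "$got" -lt "$wanted" ]; then
    echo "warning: $got CPU(s) allocated, but $wanted workers requested" >&2
    return 1
  fi
  echo "ok: $got CPU(s) allocated for $wanted workers"
}

# In the submission script, before calling matlab:
# check_cpus 6 || exit 1
```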

3. Output

MATLAB is selecting SOFTWARE OPENGL rendering.
Opening log file:  /homedtic/test/java.log.1527

                            < M A T L A B (R) >
                  Copyright 1984-2017 The MathWorks, Inc.
                   R2017a (9.2.0.556344) 64-bit (glnxa64)
                               March 27, 2017

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ...
connected to 6 workers.

ans = 

 Pool with properties: 

            Connected: true
           NumWorkers: 6
              Cluster: local
        AttachedFiles: {}
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true

Starting ...
Elapsed time is 8.252539 seconds.
Parallel pool using the 'local' profile is shutting down.

Results:

Serial: Elapsed time is 23.626662 seconds.

Parallel: Elapsed time is 8.252539 seconds.
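The two timings can be summarised as a speedup (serial time divided by parallel time) and a parallel efficiency (speedup divided by the number of workers). A small sketch using awk; the function name `speedup_report` is hypothetical, and the values passed to it are the ones from the results above.

```shell
# speedup = serial / parallel; efficiency = speedup / workers
speedup_report() {
  awk -v ts="$1" -v tp="$2" -v n="$3" 'BEGIN {
    s = ts / tp
    printf "speedup: %.2fx, efficiency: %.0f%%\n", s, 100 * s / n
  }'
}

speedup_report 23.626662 8.252539 6
# prints: speedup: 2.86x, efficiency: 48%
```

With these timings, 6 workers give a speedup of about 2.9x, roughly half of the ideal 6x.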