We can achieve this task by using the parallelization techniques provided in Julia. For this blog, we will consider pmap. In the following code, we create 10 large matrices and then perform a singular value decomposition (SVD) on each. We will show that simply parallelizing this computation attains a significant speedup.
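To get a feel for what pmap does before looking at the full script, here is a minimal sketch (not part of the script below) that starts a few local workers and shows how each element of a collection is handed to an available worker process:
using Distributed
addprocs(3)                      # start 3 local worker processes for this sketch
@everywhere using LinearAlgebra  # make svd available on every worker

# pmap sends each element to an available worker and collects the results in order;
# myid() shows which process handled each element.
println(pmap(i -> myid(), 1:6))                        # e.g. [2, 3, 4, 2, 3, 4]
println(length(pmap(svd, [rand(4, 4) for _ in 1:6])))  # 6 SVDs, computed in parallel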
Julia Code. The code for the Julia file is given below. Please save it in a text file and name it pmap_julia.jl.
using ClusterManagers, Distributed, BenchmarkTools
# Add in the cores allocated by the scheduler as workers
addprocs(SlurmManager(parse(Int,ENV["SLURM_NTASKS"])-1))
print("Added workers: ")
println(nworkers())
using LinearAlgebra # Load the package centrally
@everywhere using LinearAlgebra # load it on each process
# DATA GENERATION: The following code will be run only on one core
# ----------------------------------------------------------------
# create the array of random initial points
X_array=[rand(100,100) for i in 1:10]
# benchmark serial implementation
b1 = @benchmark map(svd, X_array)
println("benchmark for serial code")
println("*************************")
io = IOBuffer()
show(io, "text/plain", b1)
s = String(take!(io))
println(s)
# benchmark parallel implementation
b2 = @benchmark pmap(svd, X_array)
println("benchmark for parallel code")
println("***************************")
io = IOBuffer()
show(io, "text/plain", b2)
s = String(take!(io))
println(s)
Note that in the first line, we have loaded the packages that we use. If they are not installed, you can install them by running the following commands from bash.
srun --pty -p sched_mit_sloan_interactive julia
This will start a Julia REPL in your terminal session. Run the following command in the REPL:
] add ClusterManagers Distributed BenchmarkTools
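Alternatively, the same packages can be installed non-interactively with the Pkg API, which is convenient if you prefer to script the setup; a minimal sketch:
using Pkg
# Install the three packages used by pmap_julia.jl
Pkg.add(["ClusterManagers", "Distributed", "BenchmarkTools"])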
Now we are going to create a shell script that will be used to submit the job. The code for the shell script is below. Please save it in a text file and name it run_pmap_julia.sh. In the script, SBATCH -o pmap_julia_log-%j.txt indicates the name of the file where the output is written, and SBATCH -n 14 indicates the number of cores (CPUs) allocated to the job.
#!/bin/bash
# Slurm sbatch options: the partition should be set as sloan_batch
#SBATCH --partition=sched_mit_sloan_batch
#SBATCH -o pmap_julia_log-%j.txt
#SBATCH -n 14
# Initialize the module command by sourcing the system profile
source /etc/profile
# Load the Julia module
module load julia/1.5.2
# Call your script as you would from the command line
julia pmap_julia.jl
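With #SBATCH -n 14, Slurm allocates 14 tasks; the setup above keeps one of them for the main Julia process and turns the remaining 13 into workers, which is why pmap_julia.jl subtracts 1 from SLURM_NTASKS. A minimal sketch of checking this from inside the job (assuming Slurm has set the environment variable):
using Distributed
ntasks = parse(Int, ENV["SLURM_NTASKS"])   # 14 with the #SBATCH -n 14 setting above
println("Slurm tasks:      ", ntasks)
println("Expected workers: ", ntasks - 1)  # one task is kept for the main process
# After addprocs(SlurmManager(ntasks - 1)), nworkers() should report ntasks - 1.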
Now log in to MIT Engaging, copy the files created above to your working directory, and run the following command. Ensure that the pwd command prints your working directory; otherwise, use cd directory_that_contains_the_code to change to it.
sbatch run_pmap_julia.sh
That's it! Once the computation is done, we see from the output log file that parallelization has decreased the computation time significantly:
benchmark for serial code
*************************
BenchmarkTools.Trial:
memory estimate: 4.71 MiB
allocs estimate: 102
--------------
minimum time: 52.735 ms (0.00% GC)
median time: 53.473 ms (0.00% GC)
mean time: 53.792 ms (0.15% GC)
maximum time: 56.662 ms (4.62% GC)
--------------
samples: 93
evals/sample: 1
benchmark for parallel code
***************************
BenchmarkTools.Trial:
memory estimate: 1.60 MiB
allocs estimate: 1436
--------------
minimum time: 6.624 ms (0.00% GC)
median time: 6.927 ms (0.00% GC)
mean time: 7.172 ms (0.78% GC)
maximum time: 12.139 ms (0.00% GC)
--------------
samples: 697
evals/sample: 1
where we see that the parallel code is more than 7 times faster than the serial code!
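If you prefer to compute the speedup inside the script instead of eyeballing the printed trials, BenchmarkTools exposes the timings programmatically; a minimal sketch that could be appended to pmap_julia.jl (b1 and b2 are the Trial objects from above):
using BenchmarkTools, Statistics
# Median times are in nanoseconds; their ratio is the observed speedup (about 7.7 here).
speedup = time(median(b1)) / time(median(b2))
println("parallel speedup ≈ ", round(speedup, digits=1))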
The sbatch commands are the same as on MIT SuperCloud, so we can use the same commands mentioned in the blogs about MIT SuperCloud.
To protect the data:
eo-fix-storage
eo-fix-permissions all
To see your storage quota:
eo-show-quota
Note that each user has access to four different storage areas:
/home/myusername - working space for source code, scripts, hand-edited files, etc. Each user has 100 GB by default.
/pool001/myusername - extra storage space. Each user has 1 TB by default.
/nobackup1/myusername - very fast Lustre parallel file system for parallel I/O; this should not be used for long-term storage.
/nfs/sloanlab001/projects - requestable shared project space that can be linked to your home folder, e.g., HOME/projects/myproject_proj.
To view the running jobs, we can type the command:
eo-show-myjobs
Suppose we want to cancel job number 12345. The command is:
scancel 12345
If we want to get a quick view of all the jobs completed within the last 5 days, we use:
eo-show-history
Comprehensive documentation about the Engaging platform, which requires an MIT login, is available at:
https://wikis.mit.edu/confluence/display/sloanrc/Engaging+Platform