Some (mostly superseded) functions in ADMIXTOOLS 2 can parallelize computations across multiple cores or compute nodes by making use of the packages future and furrr. In find_graphs_old(), this was used to rerun the topology optimization multiple times (possibly with different starting graphs). The more recent find_graphs() function doesn’t support this, first because it’s faster than find_graphs_old(), and second because packages like furrr and foreach make it easy to manually run find_graphs() multiple times in parallel. The examples below should therefore be considered outdated. The page is still up in case the instructions are useful in another context.
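A minimal sketch of running several searches manually in parallel with furrr. To keep it runnable without f2 data, run_one_search() is a hypothetical stand-in that returns a random score; in a real analysis its body would call find_graphs() on precomputed f2 statistics. The two workers and four repeats are arbitrary choices for illustration.

```r
library(future)
library(furrr)

# Hypothetical stand-in for one topology search. In practice the body would
# call find_graphs() and return its result; here it returns a random score
# so the sketch runs without any f2 data.
run_one_search <- function(rep_id) {
  data.frame(rep = rep_id, score = runif(1))
}

# Evaluate futures in two background R sessions on this machine
plan(multisession, workers = 2)

# One search per repeat; seed = TRUE gives each repeat its own
# parallel-safe random number stream
results <- future_map(1:4, run_one_search,
                      .options = furrr_options(seed = TRUE))

# Return to sequential evaluation
plan(sequential)

# Stack the per-repeat results and keep the lowest-scoring repeat
# (assuming, as for graph fit scores, that lower is better)
all_results <- do.call(rbind, results)
best <- all_results[which.min(all_results$score), ]
```

The same loop could be written with foreach instead; furrr is used here only because the rest of this page builds on future.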
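To parallelize computations across cores, run
future::plan('multisession')
To turn parallelization off, run
future::plan('sequential')
(Older versions of the future package also accepted 'multiprocess', which has since been deprecated in favor of 'multisession'.)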
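One way to check that plan() has taken effect is to evaluate a small future and compare process IDs: with 'multisession' the expression runs in a separate background R session, while with 'sequential' it runs in the current one. This sketch uses only the future package; the choice of two workers is arbitrary.

```r
library(future)

# Parallel: futures are evaluated in background R sessions
plan(multisession, workers = 2)
f <- future(Sys.getpid())
parallel_pid <- value(f)

# Sequential: futures are evaluated in the current R session
plan(sequential)
g <- future(Sys.getpid())
sequential_pid <- value(g)

# Compare both with the process ID of the current session
main_pid <- Sys.getpid()
cat("main:", main_pid, "multisession:", parallel_pid,
    "sequential:", sequential_pid, "\n")
```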
Sometimes it makes more sense to parallelize across compute nodes rather than across cores. This can be done either in the traditional way, by writing an R script and submitting it many times in parallel as separate jobs, or interactively from within R, again using the furrr/future framework. However, parallelization across nodes is more complicated to set up than parallelization across cores.
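For the traditional approach, one common pattern on Slurm is a job array (for example, submitted with sbatch --array=1-200): every task runs the same R script, and the SLURM_ARRAY_TASK_ID environment variable that Slurm sets for each task tells the script which repeat it is. In this sketch, run_task() and collect_results() are hypothetical helper names, the search itself is a placeholder that returns a random score, and results go to a temporary directory; a real script would call find_graphs() and write to a shared directory.

```r
# Run one repeat and save its result to a separate file
run_task <- function(task_id, out_dir) {
  set.seed(task_id)  # reproducible, different random start per repeat
  # Placeholder for a single search, e.g. a call to find_graphs()
  result <- data.frame(rep = task_id, score = runif(1))
  saveRDS(result, file.path(out_dir, sprintf("search_%03d.rds", task_id)))
  invisible(result)
}

# After all jobs have finished, read the per-task files back and stack them
collect_results <- function(out_dir) {
  files <- list.files(out_dir, pattern = "^search_[0-9]+\\.rds$",
                      full.names = TRUE)
  do.call(rbind, lapply(files, readRDS))
}

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; fall back to 1 when
# the script is run outside a job array
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
run_task(task_id, out_dir = tempdir())
```

Writing one file per task avoids several jobs writing to the same file at once; collect_results() is then run once, after the whole array has finished.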
On a cluster using the Slurm job scheduler, the following commands will set up parallelization across compute nodes:
library(future.batchtools)
future::plan(future::tweak(batchtools_slurm, workers = 50,
  resources = list(ncpus = 1, memory = 1024,
                   walltime = 10*60*60, partition = 'short')))
This specifies that up to 50 jobs should run at a time, with each one requesting one CPU, 1024 MB of memory, and 10 hours of walltime (given in seconds, hence 10*60*60) on the partition called short.
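Because walltime is given in seconds and memory in megabytes, it can help to build the resources list from friendlier units. slurm_resources() below is a hypothetical helper, not part of future.batchtools; the units it assumes follow the example above, and how each entry is actually passed to Slurm is decided by the batchtools template.

```r
# Hypothetical helper: build a batchtools resources list from hours and MB
slurm_resources <- function(hours, mem_mb, ncpus = 1, partition = "short") {
  list(ncpus = ncpus,
       memory = mem_mb,            # megabytes
       walltime = hours * 60 * 60, # seconds
       partition = partition)
}

res <- slurm_resources(hours = 10, mem_mb = 1024)
str(res)
```

The result can then be passed as resources = res inside future::tweak().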
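This requires the R package future.batchtools and a batchtools template file for Slurm in the working directory (for example, one named batchtools.slurm.tmpl). The template determines how the entries in resources are translated into Slurm options.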
With this setup, find_graphs_old() will submit each of its repeats (numrep = 200 in the script below) as a separate job. Since it will still take a while for all of them to finish, it is a good idea to submit this as one job that calls an R script. That R script will in turn spawn 200 new jobs, wait for them to finish, and collect their results.
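The R script could look like this:
library(admixtools)
library(future.batchtools)
# Run each repeat as a separate Slurm job, with at most 50 jobs at a time
future::plan(future::tweak(batchtools_slurm, workers = 50,
  resources = list(ncpus = 1, memory = 1024,
                   walltime = 10*60*60, partition = 'short')))
pops = c('popA', 'popB', 'popC', 'popD')
# 200 repeats, each with 20 generations of 100 graphs and 3 admixture nodes;
# pops[1] is used as the outgroup
opt_results = find_graphs_old('/my/f2/dir/', pops, outpop = pops[1], numrep = 200,
  numgen = 20, numgraphs = 100, numadmix = 3, verbose = FALSE)
# Save the results so they can be loaded later with readRDS()
saveRDS(opt_results, file = 'opt_results.rds')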
It could be saved in a file called opt_graphs.Rscript and run with Rscript opt_graphs.Rscript, or submitted as a job.
Larger graphs, and in particular graphs with more admixture nodes, take longer to evaluate. It’s probably a good idea to start with a small number of repeats, generations, and graphs per generation (numrep, numgen, and numgraphs) to get a sense of the runtime before scaling up.
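A rough way to act on this advice is to time one small trial repeat and extrapolate. estimate_runtime() is a hypothetical helper, and it assumes that the time per repeat grows linearly with numgen × numgraphs, that the trial uses the same populations and numadmix as the full run, and that repeats run in batches of workers jobs with no queueing delay. Real runtimes can deviate from all of these, and the trial numbers below are made up.

```r
# Rough wall-clock estimate (in seconds) for a scaled-up run.
# trial_sec: elapsed time of a single trial repeat (numrep = 1), e.g. from
#   system.time(find_graphs_old(...))[["elapsed"]]
# Assumes time per repeat is proportional to numgen * numgraphs and that
# repeats run in batches of `workers` parallel jobs.
estimate_runtime <- function(trial_sec, trial_numgen, trial_numgraphs,
                             numrep, numgen, numgraphs, workers) {
  sec_per_graph <- trial_sec / (trial_numgen * trial_numgraphs)
  sec_per_rep <- sec_per_graph * numgen * numgraphs
  n_batches <- ceiling(numrep / workers)
  sec_per_rep * n_batches
}

# Hypothetical trial: numgen = 2 and numgraphs = 10 took 60 seconds.
# Estimate the 200-repeat run from the script above on 50 workers.
est <- estimate_runtime(trial_sec = 60, trial_numgen = 2, trial_numgraphs = 10,
                        numrep = 200, numgen = 20, numgraphs = 100, workers = 50)
est / 3600  # 24000 seconds, i.e. about 6.7 hours
```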