This function generates and evaluates admixture graphs in numgen iterations across numrep independent repeats to find well-fitting admixture graphs. It uses future_map to parallelize across the independent repeats; call plan beforehand to specify the details of the parallelization, for example to run across cores or across nodes of a compute cluster. Setting numadmix to 0 restricts the search to well-fitting trees, which is much faster than searching for admixture graphs with many admixture nodes.

find_graphs_old(
  data,
  pops = NULL,
  outpop = NULL,
  numrep = 1,
  numgraphs = 50,
  numgen = 5,
  numsel = 5,
  numadmix = 0,
  numstart = 1,
  keep = c("all", "best", "last"),
  initgraphs = NULL,
  mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once,
    flipadmix_random, mutate_n),
  mutprobs = NULL,
  opt_worst_residual = FALSE,
  store_intermediate = NULL,
  parallel = TRUE,
  stop_after = NULL,
  verbose = TRUE,
  ...
)

Arguments

data

Input data in one of three forms:

  1. A 3d array of blocked f2 statistics, output of f2_from_precomp or f2_from_geno

  2. A directory which contains pre-computed f2-statistics

  3. The prefix of genotype files

pops

Populations for which to fit admixture graphs (default all)

outpop

An outgroup population which will split at the root from all other populations in all tested graphs. If one of the populations is known to be an outgroup, designating it as outpop will greatly reduce the search space compared to including it without designating it as outpop.

numrep

Number of independent repetitions (each repetition can be run in parallel)

numgraphs

Number of graphs in each generation

numgen

Number of generations

numsel

Number of graphs which are selected in each generation. Should be less than numgraphs.
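
The interplay of numgraphs, numgen and numsel can be illustrated with a toy generational search. This is only a sketch of the general select-and-mutate scheme under stated assumptions, not the package's implementation: the candidates are numeric vectors rather than graphs, and score and mutate are hypothetical stand-ins for the qpgraph fit score and the functions in mutfuns.

```r
set.seed(1)

numgraphs <- 50  # candidates per generation
numgen    <- 5   # number of generations
numsel    <- 5   # candidates kept in each generation

# Hypothetical stand-in for the fit score (lower is better)
score <- function(x) sum((x - c(1, 2, 3))^2)

# Hypothetical stand-in for a mutation function
mutate <- function(x) x + rnorm(length(x), sd = 0.3)

# Start from random candidates
population <- replicate(numgraphs, rnorm(3), simplify = FALSE)
best_per_gen <- numeric(numgen)

for (gen in seq_len(numgen)) {
  scores <- vapply(population, score, numeric(1))
  best_per_gen[gen] <- min(scores)

  # Keep the numsel best candidates
  selected <- population[order(scores)[seq_len(numsel)]]

  # Refill the generation with mutated copies of the selected candidates
  parents <- sample(selected, numgraphs - numsel, replace = TRUE)
  population <- c(selected, lapply(parents, mutate))
}

best_per_gen
```

Because the selected candidates are carried over unchanged in this sketch, the best score cannot get worse from one generation to the next. Whether find_graphs_old carries them over in the same way is not stated here.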

numadmix

Number of admixture events within each graph

numstart

Number of random initializations in each call to qpgraph. Defaults to 1, to speed up the graph optimization.

keep

Which models should be returned. One of all, best, last

  • all (default): Return all evaluated graphs

  • best: Return only the best fitting graph from each repeat and each generation

  • last: Return all graphs from the last generation

initgraphs

Optional graph or list of igraph objects to start with. If NULL, optimization will start with random graphs.

mutfuns

Functions used to modify graphs. Defaults to the following:

  • spr_leaves: Subtree prune and regraft leaves. Cuts a leaf node and attaches it to a random other edge in the graph.

  • spr_all: Subtree prune and regraft. Cuts any edge and attaches the new orphan node to a random other edge in the graph, keeping the number of admixture nodes constant.

  • swap_leaves: Swaps two leaf nodes.

  • move_admixedge_once: Moves an admixture edge to a nearby location.

  • flipadmix_random: Flips the direction of an admixture edge (if possible).

  • mutate_n: Apply n of the mutation functions in this list to a graph (defaults to 2).

See examples for how to make new mutation functions.

mutprobs

Relative frequencies of each mutation function.

  • NULL (default) means each mutation function is picked with equal probability

  • A numeric vector with one entry per function in mutfuns defines the relative frequency of each mutation function

  • A matrix of dimensions numgen x length(mutfuns) defines the relative frequency of each mutation function in each generation
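
As an illustration of the matrix form, the following base R sketch builds a numgen x length(mutfuns) matrix whose weights shift gradually from one set of relative frequencies to another. The weights themselves are made up for the example, and it is an assumption that the columns are matched to mutfuns by position; the column names are added only for readability.

```r
numgen <- 5
mutfun_names <- c("spr_leaves", "spr_all", "swap_leaves",
                  "move_admixedge_once", "flipadmix_random", "mutate_n")

# Made-up relative frequencies for the first and the last generation
early <- c(3, 3, 2, 1, 1, 2)
late  <- c(1, 1, 1, 3, 3, 1)

# Interpolate linearly between the two, one row per generation
w <- seq(0, 1, length.out = numgen)
mutprobs <- t(sapply(w, function(x) (1 - x) * early + x * late))
colnames(mutprobs) <- mutfun_names

# Optional: rescale each row to sum to 1 (the values are relative frequencies)
mutprobs <- mutprobs / rowSums(mutprobs)

dim(mutprobs)
```

The resulting matrix could then be passed as mutprobs together with numgen = 5 and the default mutfuns.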

opt_worst_residual

Optimize for lowest worst residual instead of best score. FALSE by default, because the likelihood score is generally a better indicator of the quality of the model fit. Optimizing for the lowest worst residual is also slower (because f4-statistics need to be computed).

store_intermediate

Path and prefix of the .rds files to which intermediate results are written. Can be useful if find_graphs_old doesn't finish successfully.

parallel

Parallelize over repeats (if numrep > 1) or graphs (if numrep == 1) by replacing map with future_map. Will only be effective if plan has been set.
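
A minimal sketch of how a plan affects future_map, assuming the future and furrr packages are installed. run_repeat is a hypothetical stand-in for one independent repeat; the same kind of plan() call would be made before calling find_graphs_old with parallel = TRUE.

```r
# Hypothetical stand-in for one independent repeat
run_repeat <- function(rep_id) {
  list(rep = rep_id, score = (rep_id - 2)^2)
}

if (requireNamespace("future", quietly = TRUE) &&
    requireNamespace("furrr", quietly = TRUE)) {
  # Use two background R sessions on the local machine
  future::plan(future::multisession, workers = 2)

  results <- furrr::future_map(1:4, run_repeat,
                               .options = furrr::furrr_options(seed = TRUE))

  # Return to sequential evaluation when done
  future::plan(future::sequential)
} else {
  # Fallback without parallelization if the packages are missing
  results <- lapply(1:4, run_repeat)
}

length(results)
```

Without a prior call to plan, futures are evaluated sequentially, which is why parallel = TRUE alone has no effect.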

stop_after

Stop optimization after stop_after seconds (and after finishing the current generation).
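
The stopping rule can be pictured as a time check made once per generation, so that a generation in progress is always finished. The loop below is only a sketch of that idea with a sleep standing in for the work of one generation; it is not taken from the package source.

```r
stop_after <- 0.2   # time budget in seconds
max_gen    <- 100
start      <- Sys.time()
gens_done  <- 0

for (gen in seq_len(max_gen)) {
  Sys.sleep(0.05)   # stand-in for evaluating one generation
  gens_done <- gen

  # Check the elapsed time only after the generation has finished
  elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
  if (elapsed > stop_after) break
}

gens_done
```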

verbose

Print progress updates

...

Additional arguments passed to qpgraph

Value

A nested data frame with one model per row

Examples

if (FALSE) {
find_graphs_old(example_f2_blocks, numrep = 200, numgraphs = 100,
            numgen = 20, numsel = 5, numadmix = 3)
}
if (FALSE) {
# Making new mutation functions by modifying or combining existing ones:
newfun1 = function(graph, ...) mutate_n(graph, 3, ...)
newfun2 = function(graph, ...) flipadmix_random(spr_leaves(graph, ...), ...)
find_graphs_old(example_f2_blocks, mutfuns = namedList(spr_leaves, newfun1, newfun2),
            mutprobs = c(0.2, 0.3, 0.5))
}