This function generates and evaluates admixture graphs over multiple generations (see stop_gen) to find well-fitting admixture graphs.

find_graphs(
  data,
  numadmix = 0,
  outpop = NULL,
  stop_gen = 100,
  stop_gen2 = 15,
  stop_score = 0,
  stop_sec = NULL,
  initgraph = NULL,
  numgraphs = 10,
  mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once,
    flipadmix_random, place_root_random, mutate_n),
  opt_worst_residual = FALSE,
  plusminus_generations = 5,
  return_searchtree = FALSE,
  admix_constraints = NULL,
  event_constraints = NULL,
  reject_f4z = 0,
  max_admix = numadmix,
  verbose = TRUE,
  ...
)

Arguments

data

Input data in one of three forms:

  1. A 3d array of blocked f2 statistics, output of f2_from_precomp or f2_from_geno

  2. A directory which contains pre-computed f2-statistics

  3. The prefix of genotype files

numadmix

Number of admixture events within each graph. (Only relevant if initgraph = NULL)

outpop

Name of the outgroup population

stop_gen

Total number of generations after which to stop

stop_gen2

Number of generations without improvement after which to stop

stop_score

Stop once this score has been reached

stop_sec

Number of seconds after which to stop
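
The four stop_* arguments can be read as independent stopping criteria. The base R sketch below shows one way they might combine. It assumes that the search stops as soon as any single criterion is met and that lower scores are better. The helper should_stop and its arguments gen, gens_no_improve, best_score and elapsed_sec are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): returns TRUE when any
# stopping criterion is met. Assumes lower scores are better.
should_stop = function(gen, gens_no_improve, best_score, elapsed_sec,
                       stop_gen = 100, stop_gen2 = 15,
                       stop_score = 0, stop_sec = NULL) {
  gen >= stop_gen ||                      # total generation budget used up
    gens_no_improve >= stop_gen2 ||       # too long without improvement
    best_score <= stop_score ||           # target score reached
    (!is.null(stop_sec) && elapsed_sec >= stop_sec)  # time limit hit
}

# No criterion met yet
should_stop(gen = 20, gens_no_improve = 3, best_score = 12.5, elapsed_sec = 40)
# 15 generations without improvement
should_stop(gen = 20, gens_no_improve = 15, best_score = 12.5, elapsed_sec = 40)
# Time limit of 30 seconds exceeded
should_stop(gen = 20, gens_no_improve = 3, best_score = 12.5, elapsed_sec = 40,
            stop_sec = 30)
```

With the default stop_sec = NULL, the time limit is ignored in this sketch.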

initgraph

Graph to start with. If it is specified, numadmix and outpop will be inferred from this graph.

numgraphs

Number of graphs in each generation

mutfuns

Functions used to modify graphs. Defaults to the following:

  • spr_leaves: Subtree prune and regraft leaves. Cuts a leaf node and attaches it to a random other edge in the graph.

  • spr_all: Subtree prune and regraft. Cuts any edge and attaches the new orphan node to a random other edge in the graph, keeping the number of admixture nodes constant.

  • swap_leaves: Swaps two leaf nodes.

  • move_admixedge_once: Moves an admixture edge to a nearby location.

  • flipadmix_random: Flips the direction of an admixture edge (if possible).

  • mutate_n: Apply n of the mutation functions in this list to a graph (defaults to 2).

opt_worst_residual

Optimize for lowest worst residual instead of best score. FALSE by default: the likelihood score is generally a better indicator of the quality of the model fit, and optimizing for the lowest worst residual is slower because f4-statistics need to be computed.

plusminus_generations

If the best score does not improve after plusminus_generations generations, another approach to improving the score will be attempted: A number of graphs with one additional admixture edge will be generated and evaluated. The resulting graph with the best (lowest) score will be picked, and new graphs will be created by removing any one admixture edge (bringing the number back to what it was originally). The graph with the lowest score will then be selected. This often makes it possible to break out of local optima, but is slower than regular graph modifications. If the current number of admixture events is lower than max_admix, the last step (removing an admixture edge) will be skipped.
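
The selection logic described above can be illustrated with a toy example in base R. All scores and graph labels below are made up, and the variable names are hypothetical. This is a sketch of the described steps, not the package's implementation.

```r
# Toy illustration with made-up scores; lower is better.
# Step 1: graphs with one additional admixture edge are scored,
# and the best one is picked.
plus_scores = c(g1 = 48.2, g2 = 31.7, g3 = 55.0)
best_plus = names(which.min(plus_scores))

# Step 2: each graph obtained by removing one admixture edge from the best
# "plus" graph is scored, and the lowest score wins.
minus_scores = c(g2_drop_a = 40.1, g2_drop_b = 29.4, g2_drop_c = 36.8)
best_minus = names(which.min(minus_scores))

# Step 2 is skipped when the current number of admixture events is still
# below max_admix, in which case the extra edge is kept.
current_admix = 2
max_admix = 2
selected = if (current_admix < max_admix) best_plus else best_minus
selected
```

With max_admix = 3 instead, the sketch would keep best_plus, because the removal step is skipped.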

return_searchtree

Return the search tree in addition to the models. Output will be a list with three items: models, search tree, search tree as data frame

admix_constraints

A data frame with constraints on the number of admixture events for each population. See satisfies_numadmix. As soon as one graph happens to satisfy these constraints, all subsequently generated graphs will be required to also satisfy them.
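
As a sketch, such a data frame might look as follows. The column names pop, min and max, and the use of NA for "no bound", are assumptions here and should be checked against the documentation of satisfies_numadmix. PopA is a hypothetical population name and my_f2_blocks is a hypothetical data object.

```r
# Assumed layout: one row per constrained population, with the minimum and
# maximum number of admixture events allowed (NA = no bound, by assumption).
admix_constraints = data.frame(
  pop = c("Chimp.REF", "PopA"),   # "PopA" is a hypothetical population
  min = c(NA, 1),
  max = c(0, NA)
)
admix_constraints

if (FALSE) {
# Passing the constraints to the search. Requires admixtools and f2 data
# (here the hypothetical my_f2_blocks) that contain these populations.
res = find_graphs(my_f2_blocks, numadmix = 2,
                  admix_constraints = admix_constraints)
}
```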

event_constraints

A data frame with constraints on the order of events in an admixture graph. See satisfies_eventorder. As soon as one graph happens to satisfy these constraints, all subsequently generated graphs will be required to also satisfy them.

reject_f4z

If this is a number greater than zero, all f4-statistics with abs(z) > reject_f4z will be used to constrain the search space of admixture graphs: Any graph in which an f4-statistic with abs(z) > reject_f4z is expected to be zero will not be evaluated.
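
The thresholding step can be sketched in base R. The f4-statistics, z-scores and population names below are made up for illustration, and the data frame layout is an assumption rather than the package's internal format.

```r
# Made-up f4-statistics with z-scores; population names are hypothetical.
f4stats = data.frame(
  pop1 = c("A", "A", "B"), pop2 = c("B", "C", "C"),
  pop3 = c("C", "B", "D"), pop4 = c("D", "D", "A"),
  z = c(0.8, -4.2, 3.5)
)
reject_f4z = 3

# f4-statistics that are significantly non-zero at this threshold
nonzero = f4stats[abs(f4stats$z) > reject_f4z, ]
nonzero

# A candidate graph that predicts f4 = 0 for any quartet in `nonzero`
# would be skipped without being evaluated.
```

A higher threshold leaves fewer statistics in the filter and so excludes fewer graphs.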

max_admix

Maximum number of admixture edges. By default, this number is equal to numadmix, or to the number of admixture edges in initgraph, so the number of admixture edges stays constant. Setting this to a higher number will lead to more admixture edges being added occasionally (see plusminus_generations). Graphs with additional admixture edges will only be accepted if they improve the score by 5% or more.
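
The 5% acceptance rule can be written as a one-line check. The sketch below assumes that the improvement is measured relative to the current best score and that lower scores are better. The helper accept_extra_edge and its arguments are hypothetical and not part of the package.

```r
# Hypothetical helper: accept a graph with an extra admixture edge only if its
# score is at least 5% lower than the current best score (lower is better).
accept_extra_edge = function(old_score, new_score, min_improvement = 0.05) {
  new_score <= old_score * (1 - min_improvement)
}

accept_extra_edge(100, 96)   # 4% improvement
accept_extra_edge(100, 90)   # 10% improvement
```

In this sketch the first call is rejected and the second is accepted.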

verbose

Print progress updates

...

Additional arguments passed to qpgraph

Value

A nested data frame with one model per row

Examples

if (FALSE) {
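library(admixtools)
library(dplyr)  # provides slice_min()
# Search for graphs with 2 admixture events using the example f2 data
res = find_graphs(example_f2_blocks, numadmix = 2)
# Keep the model with the lowest (best) score
res %>% slice_min(score)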
}
if (FALSE) {
# Start with a graph with 0 admixture events, increase up to 3, and stop after 10 generations of no improvement
pops = dimnames(example_f2_blocks)[[1]]
initgraph = random_admixturegraph(pops, 0, outpop = 'Chimp.REF')
res = find_graphs(example_f2_blocks, initgraph = initgraph, stop_gen2 = 10, max_admix = 3)
# Keep the model with the lowest (best) score; slice_min() comes from dplyr
res %>% slice_min(score)
}