This function generates and evaluates admixture graphs over a number of generations (iterations) to find well-fitting admixture graphs.
find_graphs(
  data,
  numadmix = 0,
  outpop = NULL,
  stop_gen = 100,
  stop_gen2 = 15,
  stop_score = 0,
  stop_sec = NULL,
  initgraph = NULL,
  numgraphs = 10,
  mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once,
    flipadmix_random, place_root_random, mutate_n),
  opt_worst_residual = FALSE,
  plusminus_generations = 5,
  return_searchtree = FALSE,
  admix_constraints = NULL,
  event_constraints = NULL,
  reject_f4z = 0,
  max_admix = numadmix,
  verbose = TRUE,
  ...
)
Input data in one of three forms:
A 3d array of blocked f2 statistics, output of f2_from_precomp or f2_from_geno
A directory which contains pre-computed f2-statistics
The prefix of genotype files
Number of admixture events within each graph. (Only relevant if initgraph = NULL)
Name of the outgroup population
Total number of generations after which to stop
Number of generations without improvement after which to stop
Stop once this score has been reached
Number of seconds after which to stop
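The four stopping arguments above can be read as independent limits. As a minimal sketch, and not the package's actual code, the hypothetical helper should_stop below assumes that the search ends as soon as any one limit is hit, and that lower scores are better:

```r
# Hypothetical helper; the name and the "any limit stops the search" rule
# are assumptions made for illustration only.
should_stop = function(gen, gens_no_improve, best_score, elapsed_sec,
                       stop_gen = 100, stop_gen2 = 15,
                       stop_score = 0, stop_sec = NULL) {
  gen >= stop_gen ||                       # total generations used up
    gens_no_improve >= stop_gen2 ||        # too long without improvement
    best_score <= stop_score ||            # target score reached
    (!is.null(stop_sec) && elapsed_sec >= stop_sec)  # time budget used up
}

# 15 generations without improvement triggers the stop_gen2 limit
should_stop(gen = 20, gens_no_improve = 15, best_score = 12.3, elapsed_sec = 40)
```

With the defaults, stop_sec = NULL means that no time limit applies.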
Graph to start with. If it is specified, numadmix and outpop will be inferred from this graph.
Number of graphs in each generation
Functions used to modify graphs. Defaults to the following:
spr_leaves: Subtree prune and regraft leaves. Cuts a leaf node and attaches it to a random other edge in the graph.
spr_all: Subtree prune and regraft. Cuts any edge and attaches the new orphan node to a random other edge in the graph, keeping the number of admixture nodes constant.
swap_leaves: Swaps two leaf nodes.
move_admixedge_once: Moves an admixture edge to a nearby location.
flipadmix_random: Flips the direction of an admixture edge (if possible).
mutate_n: Applies n of the mutation functions in this list to a graph (n defaults to 2).
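The idea behind mutate_n, applying several randomly chosen mutation functions in sequence, can be illustrated with a toy sketch. Everything here is hypothetical: the "graph" is just a vector of leaf labels, swap_two and rotate stand in for real mutation functions, and sampling with replacement is an assumption rather than documented behaviour.

```r
# Toy stand-ins for mutation functions; the real ones operate on admixture graphs.
swap_two = function(leaves) {
  i = sample(length(leaves), 2)   # pick two distinct positions
  leaves[i] = leaves[rev(i)]      # swap them
  leaves
}
rotate = function(leaves) c(leaves[-1], leaves[1])

# Apply n randomly chosen functions from `funs`, one after another.
apply_n = function(x, funs, n = 2) {
  chosen = sample(funs, n, replace = TRUE)
  for (f in chosen) x = f(x)
  x
}

set.seed(1)
leaves = c("A", "B", "C", "D")
mutated = apply_n(leaves, list(swap_two = swap_two, rotate = rotate))
```

The result is always a reordering of the original labels, since each toy mutation only permutes them.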
Optimize for lowest worst residual instead of best score. FALSE by default, because the likelihood score is generally a better indicator of the quality of the model fit, and because optimizing for the lowest worst residual is slower (because f4-statistics need to be computed).
If the best score does not improve after plusminus_generations generations, another approach to improving the score will be attempted: A number of graphs with one additional admixture edge will be generated and evaluated. The resulting graph with the best score will be picked, and new graphs will be created by removing any one admixture edge (bringing the number back to what it was originally). The graph with the lowest score will then be selected. This often makes it possible to break out of local optima, but is slower than regular graph modifications.
If the current number of admixture events is lower than max_admix, the last step (removing an admixture edge) will be skipped.
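The plus/minus step described above can be sketched with a toy model. This is an illustration under stated assumptions, not the package's implementation: a graph is reduced to a set of admixture edge ids, candidates are enumerated exhaustively instead of sampled, and toy_score is a hypothetical stand-in for the real fit score (lower is better).

```r
# Hypothetical score: number of edges that differ from a "true" edge set.
target = c(2, 5)
toy_score = function(edges) {
  length(setdiff(edges, target)) + length(setdiff(target, edges))
}

plus_minus_step = function(edges, pool, score_fun, max_admix = length(edges)) {
  # Plus: add one edge in every possible way and keep the best candidate
  plus = lapply(setdiff(pool, edges), function(e) c(edges, e))
  best_plus = plus[[which.min(sapply(plus, score_fun))]]
  # Skip the removal step if the graph started below max_admix
  if (length(edges) < max_admix) return(best_plus)
  # Minus: remove any one edge and keep the lowest-scoring candidate
  minus = lapply(seq_along(best_plus), function(i) best_plus[-i])
  minus[[which.min(sapply(minus, score_fun))]]
}

start = c(2, 7)                       # one correct edge, one wrong edge
improved = plus_minus_step(start, pool = 1:8, score_fun = toy_score)
```

In this toy case the detour through a three-edge graph replaces the wrong edge 7 with the missing edge 5, which no single removal from the starting graph could have achieved.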
Return the search tree in addition to the models. Output will be a list with three items: models, search tree, search tree as data frame
A data frame with constraints on the number of admixture events for each population. See satisfies_numadmix. As soon as one graph happens to satisfy these constraints, all subsequently generated graphs will be required to also satisfy them.
A data frame with constraints on the order of events in an admixture graph. See satisfies_eventorder. As soon as one graph happens to satisfy these constraints, all subsequently generated graphs will be required to also satisfy them.
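As a hedged illustration of an admixture constraint, the sketch below builds a data frame that forbids admixture into one population. The column names pop, min and max are an assumption recalled from the satisfies_numadmix documentation and should be checked there; whether a base data.frame is accepted in place of a tibble is likewise assumed. The call itself is wrapped in if (FALSE) because it needs the package and its example data.

```r
# Assumed layout: one row per constrained population, with the minimum and
# maximum number of admixture events (NA = no bound).
admix_constraints = data.frame(
  pop = "Chimp.REF",
  min = NA,
  max = 0,
  stringsAsFactors = FALSE
)

if (FALSE) {
  library(admixtools)
  res = find_graphs(example_f2_blocks, numadmix = 2,
                    admix_constraints = admix_constraints)
}
```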
If this is a number greater than zero, all f4-statistics with abs(z) > reject_f4z will be used to constrain the search space of admixture graphs: Any graph in which an f4-statistic with abs(z) > reject_f4z is expected to be zero will not be evaluated.
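The rejection rule can be written down as a small predicate. This is a sketch only: graph_is_rejected is a hypothetical helper, the f4 labels are placeholders, and the way a graph's expected-zero f4-statistics are derived is left out.

```r
# f4_z: named vector of observed f4 z-scores (names are placeholder labels)
# expected_zero: names of the f4-statistics a candidate graph predicts to be zero
graph_is_rejected = function(f4_z, expected_zero, reject_f4z = 0) {
  if (reject_f4z <= 0) return(FALSE)   # constraint disabled by default
  any(abs(f4_z[expected_zero]) > reject_f4z)
}

f4_z = c(f4_1 = 0.4, f4_2 = -5.2, f4_3 = 1.1)
graph_is_rejected(f4_z, c("f4_1", "f4_2"), reject_f4z = 3)  # TRUE: |-5.2| > 3
graph_is_rejected(f4_z, c("f4_1", "f4_3"), reject_f4z = 3)  # FALSE
```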
Maximum number of admixture edges. By default, this number is equal to numadmix, or to the number of admixture edges in initgraph, so the number of admixture edges stays constant. Setting this to a higher number will lead to more admixture edges being added occasionally (see plusminus_generations). Graphs with additional admixture edges will only be accepted if they improve the score by 5% or more.
Print progress updates
Additional arguments passed to qpgraph
A nested data frame with one model per line
if (FALSE) {
  # Search for graphs with 2 admixture events and show the best-scoring model
  res = find_graphs(example_f2_blocks, numadmix = 2)
  res %>% slice_min(score)
}
if (FALSE) {
  # Start with a graph with 0 admixture events, increase up to 3,
  # and stop after 10 generations of no improvement
  pops = dimnames(example_f2_blocks)[[1]]
  initgraph = random_admixturegraph(pops, 0, outpop = 'Chimp.REF')
  res = find_graphs(example_f2_blocks, initgraph = initgraph, stop_gen2 = 10, max_admix = 3)
  res %>% slice_min(score)
}