This function generates and evaluates admixture graphs in numgen iterations across numrep independent repeats to find well-fitting admixture graphs. It uses the function future_map to parallelize across the independent repeats. The function plan can be called to specify the details of the parallelization, which can run across cores or across nodes on a compute cluster. Setting numadmix to 0 will search for well-fitting trees, which is much faster than searching for admixture graphs with many admixture nodes.
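As a rough illustration of the parallel setup described above, the sketch below configures a backend with plan (from the future package) and dispatches a toy workload with future_map (from the furrr package). The helper run_repeat is hypothetical and only stands in for one independent repeat; it is not part of this package, and the worker count is an arbitrary choice.

```r
library(future)
library(furrr)

# Use two local background R sessions; adjust `workers` to the machine at hand.
plan(multisession, workers = 2)

# Hypothetical stand-in for the work done in one independent repeat.
run_repeat = function(i) i^2

# Each element is processed by one of the workers set up through plan().
res = future_map(1:4, run_repeat)

# Switch back to sequential processing when done.
plan(sequential)
```

With a plan like this in place, a call with numrep > 1 and parallel = TRUE would be expected to spread its repeats over the workers in the same way.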
find_graphs_old(
  data,
  pops = NULL,
  outpop = NULL,
  numrep = 1,
  numgraphs = 50,
  numgen = 5,
  numsel = 5,
  numadmix = 0,
  numstart = 1,
  keep = c("all", "best", "last"),
  initgraphs = NULL,
  mutfuns = namedList(spr_leaves, spr_all, swap_leaves, move_admixedge_once,
    flipadmix_random, mutate_n),
  mutprobs = NULL,
  opt_worst_residual = FALSE,
  store_intermediate = NULL,
  parallel = TRUE,
  stop_after = NULL,
  verbose = TRUE,
  ...
)
Input data in one of three forms:
A 3d array of blocked f2 statistics, output of f2_from_precomp or f2_from_geno
A directory which contains pre-computed f2-statistics
The prefix of genotype files
Populations for which to fit admixture graphs (default all)
An outgroup population which will split at the root from all other populations in all tested graphs. If one of the populations is known to be an outgroup, designating it as outpop will greatly reduce the search space compared to including it without designating it as outpop.
Number of independent repetitions (each repetition can be run in parallel)
Number of graphs in each generation
Number of generations
Number of graphs which are selected in each generation. Should be less than numgraphs.
Number of admixture events within each graph
Number of random initializations in each call to qpgraph. Defaults to 1, to speed up the graph optimization.
Which models should be returned. One of all, best, last
all (default): Return all evaluated graphs
best: Return only the best fitting graph from each repeat and each generation
last: Return all graphs from the last generation
Optional graph or list of igraphs to start with. If NULL, optimization will start with random graphs.
Functions used to modify graphs. Defaults to the following:
spr_leaves: Subtree prune and regraft leaves. Cuts a leaf node and attaches it to a random other edge in the graph.
spr_all: Subtree prune and regraft. Cuts any edge and attaches the new orphan node to a random other edge in the graph, keeping the number of admixture nodes constant.
swap_leaves: Swaps two leaf nodes.
move_admixedge_once: Moves an admixture edge to a nearby location.
flipadmix_random: Flips the direction of an admixture edge (if possible).
mutate_n: Applies n of the mutation functions in this list to a graph (defaults to 2).
See examples for how to make new mutation functions.
Relative frequencies of each mutation function.
NULL (default) means each mutation function is picked with equal probability
A numeric vector of length equal to mutfuns defines the relative frequency of each mutation function
A matrix of dimensions numgen x length(mutfuns) defines the relative frequency of each mutation function in each generation
Optimize for lowest worst residual instead of best score. FALSE by default, because the likelihood score is generally a better indicator of the quality of the model fit. Optimizing for the lowest worst residual is also slower (because f4-statistics need to be computed).
Path and prefix of the .rds files to which intermediate results are saved. Can be useful if find_graphs_old doesn't finish successfully.
Parallelize over repeats (if numrep > 1) or graphs (if numrep == 1) by replacing map with future_map. Will only be effective if plan has been set.
Stop optimization after stop_after seconds (and after finishing the current generation).
Print progress updates
Additional arguments passed to qpgraph
A nested data frame with one model per line
if (FALSE) {
  find_graphs_old(example_f2_blocks, numrep = 200, numgraphs = 100,
                  numgen = 20, numsel = 5, numadmix = 3)
}
if (FALSE) {
  # Making new mutation functions by modifying or combining existing ones:
  newfun1 = function(graph, ...) mutate_n(graph, 3, ...)
  newfun2 = function(graph, ...) flipadmix_random(spr_leaves(graph, ...), ...)
  find_graphs_old(f2_blocks, mutfuns = namedList(spr_leaves, newfun1, newfun2),
                  mutprobs = c(0.2, 0.3, 0.5))
}
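The matrix form of mutprobs can be sketched in the same spirit. The example below builds a numgen x length(mutfuns) matrix for the six default mutation functions and the default numgen = 5. The weighting scheme (favoring the prune-and-regraft moves early and the other moves later) is purely illustrative, and the row and column names are added for readability only; whether find_graphs_old makes use of dimnames is not stated in this documentation.

```r
# Names of the six default mutation functions, in the order of the default mutfuns
mutnames = c("spr_leaves", "spr_all", "swap_leaves",
             "move_admixedge_once", "flipadmix_random", "mutate_n")
numgen = 5

# Illustrative relative weights: one set decreases over generations, one increases
early = seq(3, 1, length.out = numgen)
late = seq(1, 3, length.out = numgen)

# One row per generation, one column per mutation function;
# the constant 1 is recycled so that mutate_n keeps the same weight throughout
mutprobs = cbind(early, early, late, late, late, 1)
dimnames(mutprobs) = list(paste0("gen", seq_len(numgen)), mutnames)

# Intended use (not run here; f2_blocks stands for the user's own f2 data):
# find_graphs_old(f2_blocks, numgen = numgen, mutprobs = mutprobs)
```

Because the entries are described as relative frequencies, the rows are left unnormalized in this sketch.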