Takes a 3d array of f2 block jackknife estimates and computes f3-statistics between the first population \(p1\) and all population pairs \(i, j\): \(f3(p1; p_i, p_j)\)
qpgraph_precompute_f3(
data,
pops,
f3basepop = NULL,
lambdascale = 1,
boot = FALSE,
seed = NULL,
diag_f3 = 1e-05,
lsqmode = FALSE
)
Input data in one of three forms:
A 3d array of blocked f2 statistics, output of f2_from_precomp
or extract_f2
(fastest option)
A directory which contains pre-computed f2-statistics
The prefix of genotype files (slowest option)
Populations for which to compute f3-statistics
f3-statistics base population. If NULL
(the default),
the first population in pops
will be used as the basis.
Scales f2-statistics. This has no effect on the fit, but is used in the original qpGraph program to display branch weights on a scale that corresponds to FST distances.
If FALSE
(the default), block-jackknife resampling will be used to compute standard errors.
Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot
is an integer, that number
will specify the number of bootstrap resamplings. If boot = TRUE
, the number of bootstrap resamplings will be
equal to the number of SNP blocks.
Random seed used if boot
is TRUE
.
Regularization term added to the diagonal elements of the covariance matrix of estimated f3 statistics (after scaling by the matrix trace). In the original qpGraph program, this is fixed at 0.00001.
Least-squares mode. If TRUE
, the likelihood score will be computed using a diagonal matrix with 1/(sum(diag(f3_var)) * diag_f3)
, in place of the inverse f3-statistic covariance matrix. lsqmode = 2
will use the identity matrix instead, which is equivalent to computing the score as the sum of squared residuals (sum((f3_est-f3_fit)^2)
). Both of these options do not take the covariance of f3-statistics into account. This can lead to bias, but is more stable in cases where the inverse f3-statistics covariance matrix can not be estimated precisely (for example because the number of populations is large). An alternative to using lsqmode = TRUE
which doesn't completely ignore the covariance of f3-statistics is to increase diag_f3
.
A list with four items
f3_est
a matrix with f3-statistics for all population pairs with the output
ppinv
a matrix with the inverse of the f3-statistic covariance matrix
f2out
a data frame with f2 estimates
f3out
a data frame with f3 estimates
pops = get_leafnames(example_igraph)
qpgraph_precompute_f3(example_f2_blocks, pops)$f3_est
#> [1] 0.2249564 0.2202068 0.1235151 0.1431551 0.1251066 0.1250159 0.2284884
#> [8] 0.1236390 0.1428426 0.1255514 0.1254377 0.2051343 0.1105677 0.1933206
#> [15] 0.1946408 0.2022054 0.1099106 0.1100366 0.2568586 0.2526432 0.4378706
#> attr(,"pops")
#> [1] "Chimp.REF" "Altai_Neanderthal.DG" "Vindija.DG"
#> [4] "Mbuti.DG" "Denisova.DG" "Russia_Ust_Ishim.DG"
#> [7] "Switzerland_Bichon.SG"
if (FALSE) {
qpgraph_precompute_f3(f2_dir, pops)
}