Takes a 3d array of f2 block jackknife estimates and computes f3-statistics between the first population p1 and all population pairs i,j: f3(p1;pi,pj)

qpgraph_precompute_f3(
  data,
  pops,
  f3basepop = NULL,
  lambdascale = 1,
  boot = FALSE,
  seed = NULL,
  diag_f3 = 1e-05,
  lsqmode = FALSE
)

Arguments

data

Input data in one of three forms:

  1. A 3d array of blocked f2 statistics, output of f2_from_precomp or extract_f2 (fastest option)

  2. A directory which contains pre-computed f2-statistics

  3. The prefix of genotype files (slowest option)

pops

Populations for which to compute f3-statistics

f3basepop

f3-statistics base population. If NULL (the default), the first population in pops will be used as the basis.

lambdascale

Scales f2-statistics. This has no effect on the fit, but is used in the original qpGraph program to display branch weights on a scale that corresponds to FST distances.

boot

If FALSE (the default), block-jackknife resampling will be used to compute standard errors. Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot is an integer, that number will specify the number of bootstrap resamplings. If boot = TRUE, the number of bootstrap resamplings will be equal to the number of SNP blocks.

seed

Random seed used if boot is TRUE.

diag_f3

Regularization term added to the diagonal elements of the covariance matrix of estimated f3 statistics (after scaling by the matrix trace). In the original qpGraph program, this is fixed at 0.00001.

lsqmode

Least-squares mode. If TRUE, the likelihood score will be computed using a diagonal matrix with 1/(sum(diag(f3_var)) * diag_f3), in place of the inverse f3-statistic covariance matrix. lsqmode = 2 will use the identity matrix instead, which is equivalent to computing the score as the sum of squared residuals (sum((f3_est-f3_fit)^2)). Both of these options do not take the covariance of f3-statistics into account. This can lead to bias, but is more stable in cases where the inverse f3-statistics covariance matrix can not be estimated precisely (for example because the number of populations is large). An alternative to using lsqmode = TRUE which doesn't completely ignore the covariance of f3-statistics is to increase diag_f3.

Value

A list with four items

  1. f3_est a matrix with f3-statistics for all population pairs with the output

  2. ppinv a matrix with the inverse of the f3-statistic covariance matrix

  3. f2out a data frame with f2 estimates

  4. f3out a data frame with f3 estimates

Examples

pops = get_leafnames(example_igraph)
qpgraph_precompute_f3(example_f2_blocks, pops)$f3_est
#>  [1] 0.2249564 0.2202068 0.1235151 0.1431551 0.1251066 0.1250159 0.2284884
#>  [8] 0.1236390 0.1428426 0.1255514 0.1254377 0.2051343 0.1105677 0.1933206
#> [15] 0.1946408 0.2022054 0.1099106 0.1100366 0.2568586 0.2526432 0.4378706
#> attr(,"pops")
#> [1] "Chimp.REF"             "Altai_Neanderthal.DG"  "Vindija.DG"           
#> [4] "Mbuti.DG"              "Denisova.DG"           "Russia_Ust_Ishim.DG"  
#> [7] "Switzerland_Bichon.SG"
if (FALSE) {
qpgraph_precompute_f3(f2_dir, pops)
}