qpwave compares two sets of populations (left and right) to each other. It estimates a lower bound on the number of admixture waves that went from left into right by comparing a matrix of f4-statistics to low-rank approximations. For a rank of 0, this is equivalent to testing whether left and right form clades relative to each other.
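To illustrate what a low-rank approximation means here, the following sketch builds a toy matrix (hypothetical numbers, not real f4-statistics) and inspects it with base R's svd(). It only illustrates the idea of matrix rank. qpwave itself judges each rank with a chi-squared statistic that accounts for the covariance of the f4-statistics, which a plain SVD does not do.

```r
# Toy 2 x 3 matrix standing in for an f4-matrix (hypothetical values).
# The second row is a multiple of the first, so the matrix has rank 1.
f4_toy <- rbind(c(1e-4, 4e-4, 4e-4),
                c(2e-4, 8e-4, 8e-4))

s <- svd(f4_toy)

# Numerical rank: number of singular values above a relative tolerance.
num_rank <- sum(s$d > max(s$d) * 1e-8)

# The best rank-0 approximation is the all-zero matrix,
# so its residual is the Frobenius norm of the matrix itself.
resid_rank0 <- sqrt(sum(f4_toy^2))

# The best rank-1 approximation uses the leading singular value and vectors.
approx_rank1 <- s$d[1] * outer(s$u[, 1], s$v[, 1])
resid_rank1 <- sqrt(sum((f4_toy - approx_rank1)^2))
```

In this toy case a rank of 0 leaves a clear residual, while a rank of 1 reproduces the matrix up to floating-point error.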
qpwave(
data,
left,
right,
fudge = 1e-04,
auto_only = TRUE,
blgsize = 0.05,
poly_only = FALSE,
boot = FALSE,
constrained = FALSE,
cpp = TRUE,
verbose = TRUE
)
data: The input data in the form of:
A 3d array of blocked f2 statistics, output of f2_from_precomp or extract_f2
A directory with f2 statistics
The prefix of a genotype file
left: Left populations (sources)
right: Right populations (outgroups)
fudge: Value added to diagonal matrix elements before inverting
auto_only: Use only chromosomes 1 to 22.
blgsize: SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as base pair distance rather than centimorgan distance.
poly_only: Exclude sites with identical allele frequencies in all populations.
boot: If FALSE (the default), block-jackknife resampling will be used to compute standard errors. Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot is an integer, that number will specify the number of bootstrap resamplings. If boot = TRUE, the number of bootstrap resamplings will be equal to the number of SNP blocks.
constrained: Constrain admixture weights to be non-negative
cpp: Use C++ functions. Setting this to FALSE will be slower but can help with debugging.
verbose: Print progress updates
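The difference between block-jackknife and block-bootstrap resampling can be sketched in base R. The per-block values below are simulated, the blocks are assumed to be equally sized, and the statistic is a plain mean. qpwave works on blocked f2 statistics, and its internal implementation may differ (for example in how blocks are weighted).

```r
set.seed(1)

# Hypothetical per-block estimates of one statistic (simulated, equal-sized blocks).
block_est <- rnorm(50, mean = 4e-4, sd = 1e-3)
n <- length(block_est)

# Block jackknife (analogous to boot = FALSE): leave out one block at a time.
loo <- sapply(seq_len(n), function(i) mean(block_est[-i]))
se_jack <- sqrt((n - 1) / n * sum((loo - mean(loo))^2))

# Block bootstrap (analogous to boot = TRUE): resample blocks with replacement,
# using as many resamplings as there are blocks.
n_boot <- n
boot_means <- replicate(n_boot, mean(sample(block_est, n, replace = TRUE)))
se_boot <- sd(boot_means)

# An integer (analogous to boot = 1000) only changes the number of resamplings.
boot_means_1000 <- replicate(1000, mean(sample(block_est, n, replace = TRUE)))
se_boot_1000 <- sd(boot_means_1000)
```

For a simple mean over equal-sized blocks, the jackknife standard error equals sd(block_est) / sqrt(n), and the bootstrap estimates should be of similar size.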
qpwave returns a list with up to two data frames describing the model fit:
f4: A data frame with estimated f4-statistics
rankdrop: A data frame describing model fits with different ranks, including p-values for the overall fit and for nested models (comparing two models with a rank difference of one). A model with L left populations and R right populations has an f4-matrix of dimensions (L-1) × (R-1). If no two left populations form a clade with respect to all right populations, this matrix will have full rank, min(L-1, R-1). Its columns are:
f4rank: Tested rank
dof: Degrees of freedom of the chi-squared null distribution: (L-1-f4rank)*(R-1-f4rank)
chisq: Chi-squared statistic, obtained as E'QE, where E is the difference between estimated and fitted f4-statistics, and Q is the inverse of the f4-statistic covariance matrix.
p: p-value obtained from chisq as pchisq(chisq, df = dof, lower.tail = FALSE)
dofdiff: Difference in degrees of freedom between this model and the model with one less rank
chisqdiff: Difference in chi-squared statistics
p_nested: p-value testing whether the difference between two models of rank difference 1 is significant
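As a worked example of the dof and p columns, the sketch below plugs in the numbers from the example on this page (2 left populations, 4 right populations, rank 0, chisq of about 11.9). The variable names are illustrative only. The nested comparison uses made-up chi-squared values and assumes that p_nested is obtained from chisqdiff and dofdiff in the same way that p is obtained from chisq and dof.

```r
# Number of left and right populations and the tested rank
# (taken from the example on this page).
L <- 2
R <- 4
f4rank <- 0

# Degrees of freedom of the chi-squared null distribution.
dof <- (L - 1 - f4rank) * (R - 1 - f4rank)

# p-value for the overall fit. chisq is the rounded value from the example
# output, so the result only approximately matches the p column there.
chisq <- 11.9
p <- pchisq(chisq, df = dof, lower.tail = FALSE)

# Nested comparison with hypothetical numbers: L = 3, R = 5, rank 1 versus rank 0.
# ASSUMPTION: p_nested is a chi-squared test on chisqdiff with dofdiff degrees of freedom.
dof_rank0 <- (3 - 1 - 0) * (5 - 1 - 0)   # 8
dof_rank1 <- (3 - 1 - 1) * (5 - 1 - 1)   # 3
dofdiff <- dof_rank0 - dof_rank1         # 5
chisqdiff <- 20.0 - 2.5                  # made-up chi-squared values
p_nested <- pchisq(chisqdiff, df = dofdiff, lower.tail = FALSE)
```

With 2 left and 4 right populations, dof is 3 for rank 0, which agrees with the dof column in the example output.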
Patterson, N. et al. (2012) Ancient admixture in human history. Genetics
Haak, W. et al. (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature (SI 10)
left = c('Altai_Neanderthal.DG', 'Vindija.DG')
right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG')
qpwave(example_f2_blocks, left, right)
#> ℹ Computing f4 stats...
#> ℹ Computing number of admixture waves...
#>
#> $f4
#> # A tibble: 3 × 8
#> pop1 pop2 pop3 pop4 est se z p
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Altai_Neanderthal.DG Vindija.DG Chimp.REF Mbuti… 1.24e-4 1.35e-4 0.920 0.358
#> 2 Altai_Neanderthal.DG Vindija.DG Chimp.REF Russi… 4.45e-4 1.64e-4 2.72 0.00653
#> 3 Altai_Neanderthal.DG Vindija.DG Chimp.REF Switz… 4.22e-4 1.72e-4 2.45 0.0144
#>
#> $rankdrop
#> # A tibble: 1 × 7
#> f4rank dof chisq p dofdiff chisqdiff p_nested
#> <int> <int> <dbl> <dbl> <int> <dbl> <dbl>
#> 1 0 3 11.9 0.00768 NA NA NA
#>