Estimate admixture waves

qpwave compares two sets of populations (left and right) to each other. It estimates a lower bound on the number of admixture waves that went from left into right by comparing a matrix of f4-statistics to low-rank approximations. For a rank of 0, this is equivalent to testing whether left and right form clades relative to each other.
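To make the notion of matrix rank concrete, here is a minimal base R sketch. It is not the fitting procedure used by qpwave, which also accounts for the uncertainty of the f4 estimates; it only shows how a matrix built from a single component has rank 1. The matrix values and the tolerance are made up for illustration.

```r
# Hypothetical 2 x 3 matrix standing in for an f4 matrix
# (3 left and 4 right populations); all values are made up.
a <- c(1, 2)          # one set of row loadings
b <- c(0.5, -1, 2)    # one set of column loadings
X <- a %o% b          # outer product, so X has exactly rank 1

# Singular values, largest first
sv <- svd(X)$d

# Count the singular values that are not numerically zero
tol <- 1e-8 * max(sv)
est_rank <- sum(sv > tol)
est_rank

# A rank 0 matrix is the all-zero matrix, i.e. every f4-statistic is zero.
```

In real data the f4-statistics are noisy, so the rank is assessed with a statistical test rather than a fixed tolerance.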

qpwave(
  data,
  left,
  right,
  fudge = 1e-04,
  auto_only = TRUE,
  blgsize = 0.05,
  poly_only = FALSE,
  boot = FALSE,
  constrained = FALSE,
  cpp = TRUE,
  verbose = TRUE
)

Arguments

data

The input data in the form of:

A 3d array of blocked f2 statistics, output of f2_from_precomp or extract_f2
A directory with f2 statistics
The prefix of a genotype file

left

Left populations (sources)

right

Right populations (outgroups)

fudge

Value added to diagonal matrix elements before inverting
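The following base R sketch illustrates why a small value on the diagonal helps before inversion. It assumes the simplest form of regularization (adding the constant directly to each diagonal element); how qpwave applies fudge internally is not specified here, and the toy matrix is made up.

```r
# Toy 2 x 2 matrix that is exactly singular (values are made up)
Q <- matrix(1, nrow = 2, ncol = 2)
fudge <- 1e-4

# Inverting Q directly fails because its determinant is zero
direct <- try(solve(Q), silent = TRUE)
inherits(direct, "try-error")

# Assumption: add the constant to every diagonal element, then invert
Q_reg <- Q + diag(fudge, nrow(Q))
Q_inv <- solve(Q_reg)
Q_inv
```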

auto_only

Use only chromosomes 1 to 22.

blgsize

SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as base pair distance rather than centimorgan distance.
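The unit rule above can be sketched as follows. blgsize_unit is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper (not part of the package) mirroring the documented rule:
# values of 100 or more are read as base pairs, smaller values as Morgan.
blgsize_unit <- function(blgsize) {
  if (blgsize >= 100) "base pairs" else "Morgan"
}

blgsize_unit(0.05)  # the default: 0.05 Morgan, i.e. 5 cM
blgsize_unit(5e6)   # 5 million base pairs
```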

poly_only

Exclude sites with identical allele frequencies in all populations.

boot

If FALSE (the default), standard errors are computed by block-jackknife resampling. Otherwise, block-bootstrap resampling is used. If boot is an integer, it specifies the number of bootstrap resamplings. If boot = TRUE, the number of bootstrap resamplings equals the number of SNP blocks.

constrained

Constrain admixture weights to be non-negative

cpp

Use C++ functions. Setting this to FALSE will be slower but can help with debugging.

verbose

Print progress updates

Value

qpwave returns a list with up to two data frames describing the model fit:

f4: A data frame with estimated f4-statistics
rankdrop: A data frame describing model fits with different ranks, including p-values for the overall fit and for nested models (comparing two models with a rank difference of one). A model with L left populations and R right populations has an f4-matrix with L-1 rows and R-1 columns, so its rank can be at most min(L-1, R-1). If no two left populations form a clade with respect to all right populations, the matrix can reach this full rank.
- f4rank: Tested rank
- dof: Degrees of freedom of the chi-squared null distribution: (L-1-f4rank)*(R-1-f4rank)
- chisq: Chi-squared statistic, obtained as E'QE, where E is the difference between estimated and fitted f4-statistics, and Q is the inverse of the f4-statistic covariance matrix.
- p: p-value obtained from chisq as pchisq(chisq, df = dof, lower.tail = FALSE)
- dofdiff: Difference in degrees of freedom between this model and the model with one less rank
- chisqdiff: Difference in chi-squared statistics
- p_nested: p-value testing whether the difference between two models of rank difference 1 is significant
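As a worked illustration of the dof and p columns, the sketch below plugs in the numbers from the example on this page (2 left populations, 4 right populations, rank 0). The chisq value is the rounded one shown in the printed output, so the resulting p-value will only approximately match the p column.

```r
# Numbers taken from the example on this page
L <- 2        # number of left populations
R <- 4        # number of right populations
f4rank <- 0   # tested rank

# Degrees of freedom of the chi-squared null distribution
dof <- (L - 1 - f4rank) * (R - 1 - f4rank)
dof

# chisq as displayed (rounded) in the example output
chisq <- 11.9

# Upper-tail probability, as in the definition of p above
p <- pchisq(chisq, df = dof, lower.tail = FALSE)
p
```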

References

Patterson, N. et al. (2012) Ancient admixture in human history. Genetics

Haak, W. et al. (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature (SI 10)

Examples

left = c('Altai_Neanderthal.DG', 'Vindija.DG')
right = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG', 'Switzerland_Bichon.SG')
qpwave(example_f2_blocks, left, right)
#> ℹ Computing f4 stats...
#> ℹ Computing number of admixture waves...
#> 
#> $f4
#> # A tibble: 3 × 8
#>   pop1                 pop2       pop3      pop4       est      se     z       p
#>   <chr>                <chr>      <chr>     <chr>    <dbl>   <dbl> <dbl>   <dbl>
#> 1 Altai_Neanderthal.DG Vindija.DG Chimp.REF Mbuti… 1.24e-4 1.35e-4 0.920 0.358  
#> 2 Altai_Neanderthal.DG Vindija.DG Chimp.REF Russi… 4.45e-4 1.64e-4 2.72  0.00653
#> 3 Altai_Neanderthal.DG Vindija.DG Chimp.REF Switz… 4.22e-4 1.72e-4 2.45  0.0144 
#> 
#> $rankdrop
#> # A tibble: 1 × 7
#>   f4rank   dof chisq       p dofdiff chisqdiff p_nested
#>    <int> <int> <dbl>   <dbl>   <int>     <dbl>    <dbl>
#> 1      0     3  11.9 0.00768      NA        NA       NA
#>
