This function reads Fst from a directory with precomputed f2-statistics, and turns per-block data into estimates and standard errors for each population pair. See details for how Fst is computed.

fst(data, pop1 = NULL, pop2 = NULL, boot = FALSE, verbose = FALSE, ...)

Arguments

data

Input data in one of three forms:

  1. A 3d array of blocked Fst, output of f2_from_precomp with fst = TRUE

  2. A directory which contains pre-computed Fst

  3. The prefix of genotype files

pop1

One of the following four:

  1. NULL: all possible population combinations will be returned

  2. A vector of population labels. All combinations with the other pop arguments will be returned

  3. A matrix with population combinations to be tested, with one population per column and one combination per row. Other pop arguments will be ignored.

  4. the location of a file (poplistname or popfilename) which specifies the populations or population combinations to be tested. Other pop arguments will be ignored.

pop2

A vector of population labels

boot

If FALSE (the default), block-jackknife resampling will be used to compute standard errors. Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot is an integer, that number will specify the number of bootstrap resamplings. If boot = TRUE, the number of bootstrap resamplings will be equal to the number of SNP blocks.

verbose

Print progress updates

...

Additional arguments passed to f2_from_geno when data is a genotype prefix

Details

The Hudson Fst estimator used here is described in the two publications below. For two populations with estimated allele frequency vectors p1 and p2, and allele count vectors n1 and n2, it is calculated as follows:

num = (p1 - p2)^2 - p1*(1-p1)/(n1-1) - p2*(1-p2)/(n2-1)
denom = p1 + p2 - 2*p1*p2
fst = mean(num)/mean(denom)

This is done independently for each SNP block, and is stored on disk for each population pair. Jackknifing or bootstrapping across these per-block estimates yields the overall estimates and standard errors.

References

Reich, D. (2009) Reconstructing Indian population history Nature

Bhatia, G. (2013) Estimating and interpreting Fst: the impact of rare variants Genome Research

Examples

if (FALSE) {
pop1 = 'Denisova.DG'
pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG')
fst(f2_dir, pop1, pop2)
}