This function computes pairwise Fst from genotype files, or from precomputed per-block Fst-statistics (not to be confused with per-block f2-statistics, which are used by most other functions). See details for how Fst is computed.

fst(data, pop1 = NULL, pop2 = NULL, boot = FALSE, verbose = FALSE, ...)

Arguments

data

Input data in one of three forms:

  1. The prefix of genotype files

  2. A directory with pre-computed Fst .rds files generated by extract_f2

  3. A 3d array of blocked Fst, output of f2_from_precomp with fst = TRUE. This is not recommended, since pre-computed Fst blocks are not used by any other functions, and they can be mixed up with the more commonly used pre-computed f2 blocks.

pop1

One of the following four:

  1. NULL: all possible population combinations will be returned

  2. A vector of population labels. All combinations with the other pop arguments will be returned

  3. A matrix with population combinations to be tested, with one population per column and one combination per row. Other pop arguments will be ignored.

  4. the location of a file (poplistname or popfilename) which specifies the populations or population combinations to be tested. Other pop arguments will be ignored.

pop2

A vector of population labels

boot

If FALSE (the default), block-jackknife resampling will be used to compute standard errors. Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot is an integer, that number will specify the number of bootstrap resamplings. If boot = TRUE, the number of bootstrap resamplings will be equal to the number of SNP blocks.

verbose

Print progress updates

...

Additional arguments passed to f2_from_geno when data is a genotype prefix

Details

The Hudson Fst estimator used here is described in the two publications below. For two populations with estimated allele frequency vectors p1 and p2, and allele count vectors n1 and n2, it is calculated as follows:

num = (p1 - p2)^2 - p1*(1-p1)/(n1-1) - p2*(1-p2)/(n2-1)
denom = p1 + p2 - 2*p1*p2
fst = mean(num)/mean(denom)

This is done independently for each SNP block, and is stored on disk for each population pair. Jackknifing or bootstrapping across these per-block estimates yields the overall estimates and standard errors.

References

Reich, D. (2009) Reconstructing Indian population history Nature

Bhatia, G. (2013) Estimating and interpreting Fst: the impact of rare variants Genome Research

Examples

if (FALSE) {
pop1 = 'Denisova.DG'
pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG')
fst(f2_dir, pop1, pop2)
}