Computes f3 statistics of the form \(f3(A; B, C)\). This is generally equivalent to \((f2(A, B) + f2(A, C) - f2(B, C)) / 2\) and to \(f4(A, B; A, C)\). However, the exact estimates depend on the type of input data, selection of SNPs, and on the values of some of the arguments, which are described below.
qp3pop(
data,
pop1 = NULL,
pop2 = NULL,
pop3 = NULL,
boot = FALSE,
sure = FALSE,
unique_only = TRUE,
blgsize = NULL,
block_lengths = NULL,
verbose = FALSE,
...
)
Input data in one of three forms:
A 3d array of blocked f2 statistics, output of f2_from_precomp
or f2_from_geno
(fastest option)
A directory which contains pre-computed f2-statistics
The prefix of genotype files (slowest option)
One of the following four:
NULL
: all possible population combinations will be returned
A vector of population labels. All combinations with the other pop
arguments will be returned
A matrix with population combinations to be tested, with one population per column and one
combination per row. Other pop
arguments will be ignored.
the location of a file (poplistname
or popfilename
) which specifies the populations or
population combinations to be tested. Other pop
arguments will be ignored.
A vector of population labels
A vector of population labels
If FALSE
(the default), block-jackknife resampling will be used to compute standard errors.
Otherwise, block-bootstrap resampling will be used to compute standard errors. If boot
is an integer, that number
will specify the number of bootstrap resamplings. If boot = TRUE
, the number of bootstrap resamplings will be
equal to the number of SNP blocks.
The number of population combinations can get very large. This is a safety option that stops you from accidently computing all combinations if that number is large.
If TRUE
(the default), redundant combinations will be excluded
Print progress updates
Additional arguments passed to f3blockdat_from_geno
if data
is a genotype prefix, or to get_f2
otherwise
With apply_corr = FALSE
, no bias correction is performed. With apply_corr = TRUE
(the default), a bias correction term based on the heterozygosity in the first population is subtracted from the f3 estimate. The option is ineffective if the first argument is an array of pre-computed f2-statistics. In that case, the bias correction can be toggled by the apply_corr
parameter in extract_f2
or f2_from_geno
. The heterozygosity calculation used in apply_corr = TRUE
requires at least two haploid samples or one diploid sample in the first population. Otherwise, apply_corr = TRUE
will result in missing values.
With outgroupmode = FALSE
, estimates of f3 will be normalized by estimates of the heterozygosity of the target population. This is the default option if the first argument is the prefix of genotype data. If the first argument is an array of pre-computed f2-statistics, then no normalization can be performed, which corresponds to outgroupmode = TRUE
. As with apply_corr = TRUE
, the heterozygosity calculation used in outgroupmode = FALSE
requires at least two pseudodiploid samples or one diploid sample in the first population. Otherwise, outgroupmode = FALSE
will result in missing values.
Only keep SNPs with mean allele frequency not equal to 0 or 1. Defaults to FALSE
. Only effective if the first argument is the prefix of genotype data. If the first argument is an array of pre-computed f2-statistics, use the poly_only
argument in extract_f2
or f2_from_geno
qp3pop
returns a data frame with f3 statistics, with columns for populations 1 to 3,
f3-estimate (est
), standard error of the estimate (se
), z-score (z
, est
/se
), p-value (2*(1-pnorm(z))
),
and the number of SNPs used (n
; only if first argument is genotype prefix)
The default values of the parameters apply_corr
, outgroupmode
, poly_only
depend on the first argument (data
), and they can affect the estimated f3-statistics. When the data
is the prefix of genotype data, the default parameters are generally the same as in the original qp3pop program. When data
is an array of pre-computed f2-statistics, the parameters may be set while computing f2-statistics. If the first population is a single pseudodiploid sample, it is not possible to get unbiased estimates of f3. To get biased estimates for a single pseudodiploid sample from genotype data, set outgroupmode = TRUE
and apply_corr = FALSE
. Under this combination of parameters, f3(A; B, C)
should also be identical to f4(A, B; A, C)
, since f4-statistics generally do not require any bias correction.
See examples
for more information.
f3
Patterson, N. et al. (2012) Ancient admixture in human history Genetics
Peter, B. (2016) Admixture, Population Structure, and F-Statistics Genetics
if (FALSE) {
pop1 = 'Denisova.DG'
pop2 = c('Altai_Neanderthal.DG', 'Vindija.DG')
pop3 = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG')
qp3pop(example_f2_blocks, pop1, pop2, pop3)
pops = c('Chimp.REF', 'Mbuti.DG', 'Russia_Ust_Ishim.DG')
qp3pop(example_f2_blocks, pops)
qp3pop(example_f2_blocks, pops, unique_only = FALSE)
qp3pop(f2_dir, pop1, pop2, pop3)
# Below are three scenarios, and in each one `qp3pop()` and `qp3pop_wrapper()`
# should give the same or very similar estimates. Note that to compute `f3(A; B, C)`,
# `qp3pop_wrapper()`, and the original qp3pop program, expect populations to be in the order `B`, `C`, `A`.
prefix = '/path/to/geno/prefix'
qp3popbin = '/path/to/AdmixTools/bin/qp3Pop'
pops = dimnames(example_f2_blocks)[[1]]
# 1. target diploid, outgroupmode NO (this is the default when passing a geno file prefix)
qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = FALSE)
qp3pop(prefix, pops[1], pops[2], pops[3])
# est, se, z match well; n is higher
qp3pop(prefix, pops[1], pops[2], pops[3], poly_only = TRUE)
# est, se, z match less well; n is identical
# 2. target diploid, outgroupmode YES (this is the only option with precomputed f2-stats)
qp3pop_wrapper(prefix, pops[2], pops[3], pops[1], bin = qp3popbin, outgroupmode = TRUE)
qp3pop(prefix, pops[1], pops[2], pops[3], outgroupmode = TRUE)
# est, se, z match (except for factor 1000)
f2b = f2_from_geno(prefix, pops = pops[1:3], poly_only = FALSE)
qp3pop(f2b, pops[1], pops[2], pops[3])
# 3. target pseudodiploid (no heterozygotes means heterozygosity rate correction is not possible)
qp3pop_wrapper(prefix, pops[1], pops[3], pops[2], bin = qp3popbin, outgroupmode = TRUE)
qp3pop(prefix, pops[2], pops[1], pops[3], outgroupmode = TRUE, apply_corr = FALSE)
# est, se, z match (except for factor 1000)
}