Compute per-block f3-statistics directly from genotype data
f3blockdat_from_geno(
  pref,
  popcombs,
  auto_only = TRUE,
  blgsize = 0.05,
  block_lengths = NULL,
  allsnps = FALSE,
  adjust_pseudohaploid = TRUE,
  poly_only = FALSE,
  apply_corr = TRUE,
  outgroupmode = FALSE,
  verbose = TRUE
)
pref: Prefix of the genotype files.
popcombs: A data frame with one population combination per row, and columns pop1, pop2, pop3. If there is an additional integer column named model and allsnps = FALSE, only SNPs present in every population of a given model will be used to compute f3-statistics for that model.
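The effect of the model column on SNP selection can be sketched in base R. This is only an illustration of the rule described above, not the package's internal code; the population names, the toy `present` matrix, and the object names are all made up, and it assumes three population columns because f3 is defined on triples.

```r
# Toy population combinations; names are hypothetical.
popcombs <- data.frame(
  pop1 = c("Target", "Target", "Target"),
  pop2 = c("A", "A", "B"),
  pop3 = c("B", "C", "C"),
  model = c(1L, 1L, 2L),
  stringsAsFactors = FALSE
)

# TRUE where a SNP (row) has an allele frequency estimate in a population (column).
present <- matrix(
  TRUE, nrow = 4, ncol = 4,
  dimnames = list(paste0("snp", 1:4), c("Target", "A", "B", "C"))
)
present["snp2", "A"] <- FALSE
present["snp3", "C"] <- FALSE

# For each model, keep the SNPs observed in every population used by that model.
snps_by_model <- lapply(split(popcombs, popcombs$model), function(rows) {
  pops <- unique(unlist(rows[c("pop1", "pop2", "pop3")]))
  rownames(present)[rowSums(!present[, pops, drop = FALSE]) == 0]
})

print(snps_by_model)
```

In this toy case, model 1 involves all four populations and so loses both SNPs with missing data, while model 2 does not involve population A and therefore keeps snp2.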
auto_only: Use only chromosomes 1 to 22.
blgsize: SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as a base pair distance rather than a genetic distance.
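The threshold rule for blgsize can be written down as a one-line check. The helper name `blgsize_unit` is hypothetical and exists only to make the rule explicit; it is not part of the package.

```r
# Hypothetical helper: reports how a blgsize value would be interpreted
# under the rule described above (values of 100 or more are base pairs).
blgsize_unit <- function(blgsize) {
  if (blgsize >= 100) "base pairs" else "Morgan"
}

blgsize_unit(0.05)  # the default, 5 cM
blgsize_unit(2e6)   # a 2 Mb block
```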
block_lengths: An optional vector with block lengths. If NULL, block lengths will be computed.
allsnps: Use all SNPs with allele frequency estimates in every population of any given population triple. If FALSE (the default), only SNPs which are present in all populations in popcombs (or in any given model in it) will be used. Setting allsnps = TRUE in the presence of large amounts of missing data might lead to false positive results.
adjust_pseudohaploid: Genotypes of pseudohaploid samples are usually coded as 0 or 2, even though only one allele is observed. adjust_pseudohaploid ensures that the observed allele count increases only by 1 for each pseudohaploid sample. If TRUE (default), samples that don't have any genotypes coded as 1 among the first 1000 SNPs are automatically identified as pseudohaploid. This adjustment leads to slightly more accurate estimates of f-statistics. Setting this parameter to FALSE is equivalent to the ADMIXTOOLS inbreed: NO option. Setting adjust_pseudohaploid to an integer n will check the first n SNPs instead of the first 1000 SNPs.
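The detection rule and its effect on allele counts can be illustrated with a small genotype matrix. This is a sketch under stated assumptions rather than the package's implementation: the helper `detect_pseudohaploid`, the toy data, and the choice to ignore missing genotypes are all hypothetical.

```r
# Hypothetical helper illustrating the rule described above.
# geno: SNPs in rows, samples in columns, genotypes coded 0/1/2, NA = missing.
detect_pseudohaploid <- function(geno, n = 1000) {
  first <- geno[seq_len(min(n, nrow(geno))), , drop = FALSE]
  # A sample counts as pseudohaploid if it has no heterozygous call among these SNPs.
  colSums(first == 1, na.rm = TRUE) == 0
}

geno <- cbind(
  diploid = c(0, 1, 2, 1, NA, 2),
  pseudo  = c(0, 2, 2, 0, NA, 2)
)
is_ph <- detect_pseudohaploid(geno)

# Alternative allele counts: a pseudohaploid genotype of 2 represents one observed allele.
alt_count <- sweep(geno, 2, ifelse(is_ph, 2, 1), "/")
# Number of observed alleles: 1 per pseudohaploid sample, 2 per diploid sample, 0 if missing.
n_alleles <- sweep(!is.na(geno), 2, ifelse(is_ph, 1, 2), "*")

print(is_ph)
print(alt_count)
print(n_alleles)
```

With only a handful of SNPs a truly diploid sample could be misclassified by chance, which is one reason the check uses a larger number of SNPs by default.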
apply_corr: With apply_corr = FALSE, no bias correction is performed. With apply_corr = TRUE (the default), a bias correction term based on the heterozygosity in the first population is subtracted from the f3 estimate. With apply_corr = 2, the bias correction term is calculated based on all 3 populations. This option is not generally recommended, and only exists to match how the f3-statistics are estimated in certain scenarios in the original qpGraph program.
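To make the difference between apply_corr = FALSE and apply_corr = TRUE concrete, here is a single-SNP sketch. It assumes one common form of the correction, in which p1(1 - p1) / (n1 - 1) is subtracted, with p1 the sample allele frequency and n1 the number of observed alleles in the first population. The function `f3_snp` is hypothetical, the exact term used by the package may differ, and the apply_corr = 2 variant is not covered.

```r
# Hypothetical single-SNP illustration of the bias correction.
# p1, p2, p3: sample allele frequencies; n1: observed allele count in population 1.
f3_snp <- function(p1, p2, p3, n1, apply_corr = TRUE) {
  raw <- (p1 - p2) * (p1 - p3)
  if (!apply_corr) return(raw)
  # Assumed correction: unbiased estimate of the sampling variance of p1.
  raw - p1 * (1 - p1) / (n1 - 1)
}

uncorrected <- f3_snp(0.5, 0.2, 0.4, n1 = 11, apply_corr = FALSE)
corrected   <- f3_snp(0.5, 0.2, 0.4, n1 = 11, apply_corr = TRUE)
print(c(uncorrected = uncorrected, corrected = corrected))
```

The correction matters most when the first population has few observed alleles, since the subtracted term shrinks as n1 grows.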
outgroupmode: With outgroupmode = FALSE, estimates of f3 will be normalized by estimates of the heterozygosity of the target population. This is the default option if the first argument is the prefix of genotype data. If the first argument is an array of precomputed f2-statistics, then no normalization can be performed, which corresponds to outgroupmode = TRUE.
verbose: Print progress updates.
A data frame with per-block f3-statistics for each population triple.
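A minimal call might look like the following sketch. The prefix and population labels are placeholders, and the file extensions checked here (.geno and .bed) are an assumption about how the genotype files are named. The call runs only if the package is installed and matching files exist, so the snippet is safe to paste as is.

```r
# Placeholder prefix and population labels; replace with your own data.
pref <- "path/to/genotype_prefix"
popcombs <- data.frame(
  pop1 = "Target", pop2 = "SourceA", pop3 = "SourceB",
  stringsAsFactors = FALSE
)

# Assumed file naming: pref.geno or pref.bed
have_files <- any(file.exists(paste0(pref, c(".geno", ".bed"))))

if (requireNamespace("admixtools", quietly = TRUE) && have_files) {
  f3blocks <- admixtools::f3blockdat_from_geno(
    pref, popcombs,
    blgsize = 0.05,   # 5 cM blocks
    verbose = FALSE
  )
  print(head(f3blocks))
}
```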