Compute per-block f3-statistics directly from genotype data

f3blockdat_from_geno(
  pref,
  popcombs,
  auto_only = TRUE,
  blgsize = 0.05,
  block_lengths = NULL,
  allsnps = FALSE,
  adjust_pseudohaploid = TRUE,
  poly_only = FALSE,
  apply_corr = TRUE,
  outgroupmode = FALSE,
  verbose = TRUE
)
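
A minimal call might look like the sketch below. The file prefix and the population labels are placeholders, and the example assumes the function reads the columns pop1, pop2 and pop3 of popcombs. The call is guarded so that nothing runs unless the package is installed and files matching the prefix exist.

```r
# Hypothetical inputs: replace the prefix and the population labels with your own.
pref <- "path/to/genotypes"   # assumed prefix, not a real file
popcombs <- data.frame(
  pop1 = c("Target", "Target"),
  pop2 = c("SourceA", "SourceA"),
  pop3 = c("SourceB", "SourceC"),
  stringsAsFactors = FALSE
)

# Only run if the package is installed and genotype files with this prefix exist.
geno_found <- length(Sys.glob(paste0(pref, ".*"))) > 0
if (requireNamespace("admixtools", quietly = TRUE) && geno_found) {
  blockdat <- admixtools::f3blockdat_from_geno(pref, popcombs,
                                               blgsize = 0.05, verbose = FALSE)
  print(head(blockdat))
}
```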

Arguments

pref

Prefix of genotype files

popcombs

A data frame with one population combination per row, and columns pop1, pop2, pop3. If there is an additional integer column named model and allsnps = FALSE, only SNPs present in every population in any given model will be used to compute f3-statistics for that model.
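
As an illustration, a popcombs table with a model column could be built as follows. All population labels are hypothetical. The last step only shows which populations would jointly define the SNP set of each model under the rule described above; it is not the package's internal code.

```r
# All population labels below are hypothetical.
popcombs <- rbind(
  data.frame(pop1 = "Target1", pop2 = "SourceA", pop3 = c("SourceB", "SourceC"),
             model = 1L, stringsAsFactors = FALSE),
  data.frame(pop1 = "Target2", pop2 = "SourceA", pop3 = "SourceD",
             model = 2L, stringsAsFactors = FALSE)
)

# Populations that would jointly define the SNP set of each model
pops_by_model <- lapply(
  split(popcombs[c("pop1", "pop2", "pop3")], popcombs$model),
  function(d) sort(unique(unlist(d, use.names = FALSE)))
)
print(pops_by_model)
```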

auto_only

Use only chromosomes 1 to 22.

blgsize

SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as a distance in base pairs rather than a genetic distance in Morgan.
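
The unit rule and the idea of distance-based blocks can be sketched as below. Both helpers are hypothetical and deliberately simplified: a new block starts on each new chromosome, or once a SNP lies at least blgsize away from the first SNP of the current block. The package's actual block assignment may differ in its details.

```r
# Hypothetical helper mirroring the documented unit rule; not part of the package.
blgsize_unit <- function(blgsize) {
  if (blgsize >= 100) "base pairs" else "Morgan"
}

# Simplified block assignment: positions must be sorted within each chromosome.
assign_blocks <- function(chrom, pos, blgsize) {
  block <- integer(length(pos))
  current <- 0L
  start <- NA_real_
  for (i in seq_along(pos)) {
    new_chrom <- i == 1L || chrom[i] != chrom[i - 1L]
    if (new_chrom || pos[i] - start >= blgsize) {
      current <- current + 1L   # open a new block
      start <- pos[i]           # remember where this block begins
    }
    block[i] <- current
  }
  block
}

chrom  <- c(1, 1, 1, 1, 2, 2)
morgan <- c(0.00, 0.02, 0.06, 0.10, 0.01, 0.03)  # toy genetic positions in Morgan
blocks <- assign_blocks(chrom, morgan, blgsize = 0.05)
print(blocks)
```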

block_lengths

An optional vector with block lengths. If NULL, block lengths will be computed.

allsnps

Use all SNPs with allele frequency estimates in every population of any given population triple. If FALSE (the default), only SNPs which are present in all populations in popcombs (or any given model in it) will be used. Setting allsnps = TRUE in the presence of large amounts of missing data might lead to false positive results, because each statistic is then computed on a different set of SNPs.
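
The difference between the two settings can be illustrated on a toy allele frequency matrix. This is only a sketch of the selection rule described above, with made-up populations A to D and NA marking missing data; it does not reproduce how the package reads genotype files.

```r
# Toy allele frequencies (rows = SNPs, columns = populations); NA = no data.
afs <- rbind(
  snp1 = c(A = 0.1, B = 0.3, C = 0.5, D = 0.2),
  snp2 = c(A = 0.4, B = NA,  C = 0.6, D = 0.9),
  snp3 = c(A = 0.7, B = 0.2, C = 0.8, D = NA),
  snp4 = c(A = 0.5, B = 0.5, C = 0.1, D = 0.3)
)
popcombs <- data.frame(pop1 = c("A", "A"), pop2 = c("B", "C"), pop3 = c("C", "D"),
                       stringsAsFactors = FALSE)

# allsnps = FALSE: one shared SNP set, complete in every population in popcombs
all_pops <- unique(unlist(popcombs, use.names = FALSE))
shared <- rownames(afs)[stats::complete.cases(afs[, all_pops])]

# allsnps = TRUE: a separate SNP set for each population combination
per_triple <- lapply(seq_len(nrow(popcombs)), function(i) {
  pops <- unlist(popcombs[i, ], use.names = FALSE)
  rownames(afs)[stats::complete.cases(afs[, pops])]
})
```

In this toy example the shared set is smaller than either per-combination set, which is the trade-off between the two settings: more data per statistic versus statistics that are computed on the same SNPs.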

adjust_pseudohaploid

Genotypes of pseudohaploid samples are usually coded as 0 or 2, even though only one allele is observed. adjust_pseudohaploid ensures that the observed allele count increases only by 1 for each pseudohaploid sample. If TRUE (default), samples that don't have any genotypes coded as 1 among the first 1000 SNPs are automatically identified as pseudohaploid. This leads to slightly more accurate estimates of f-statistics. Setting this parameter to FALSE is equivalent to the ADMIXTOOLS inbreed: NO option. Setting adjust_pseudohaploid to an integer n will check the first n SNPs instead of the first 1000 SNPs.
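
The detection heuristic can be sketched in a few lines of base R. The function name detect_pseudohaploid and the toy genotype matrix are hypothetical; the sketch only mirrors the rule stated above (no genotype coded as 1 among the first n SNPs) and the resulting allele counts.

```r
# Toy genotype matrix: rows = SNPs, columns = samples; values are 0, 1, 2 or NA.
geno <- cbind(
  diploid   = c(0, 1, 2, 1, 0, 2),
  pseudohap = c(0, 2, 2, 0, NA, 2),
  late_het  = c(0, 2, 0, 2, 2, 1)
)

# Hypothetical helper: a sample counts as pseudohaploid if it has no
# heterozygous call among the first n SNPs.
detect_pseudohaploid <- function(geno, n = 1000) {
  first <- geno[seq_len(min(n, nrow(geno))), , drop = FALSE]
  colSums(first == 1, na.rm = TRUE) == 0
}

is_ph  <- detect_pseudohaploid(geno)         # inspects all 6 SNPs here
is_ph5 <- detect_pseudohaploid(geno, n = 5)  # misses the het call at SNP 6

# Pseudohaploid samples contribute one allele per SNP, diploid samples two.
ploidy <- ifelse(is_ph, 1, 2)
alt_count <- sweep(geno, 2, ploidy / 2, `*`)
```

The second call shows why the number of inspected SNPs matters: a diploid sample whose first heterozygous call appears late would be misclassified if n is too small.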

apply_corr

With apply_corr = FALSE, no bias correction is performed. With apply_corr = TRUE (the default), a bias correction term based on the heterozygosity in the first population is subtracted from the f3 estimate. With apply_corr = 2, the bias correction term is calculated based on all 3 populations. This option is not generally recommended, and only exists to match how the f3-statistics are estimated in certain scenarios in the original qpGraph program.
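
To make the role of the correction concrete, here is a sketch of an f3 estimate with and without a heterozygosity-based correction for the first population. It assumes the commonly used per-SNP correction term p1 * (1 - p1) / (n1 - 1), where n1 is the number of sampled alleles in the first population; whether the package computes exactly this term is an assumption, and the function f3_sketch is hypothetical. The apply_corr = 2 variant is not covered.

```r
# Hypothetical sketch; p1, p2, p3 are allele frequencies per SNP and
# n1 is the number of sampled alleles in the first population per SNP.
f3_sketch <- function(p1, p2, p3, n1, apply_corr = TRUE) {
  raw  <- (p1 - p2) * (p1 - p3)
  corr <- if (apply_corr) p1 * (1 - p1) / (n1 - 1) else 0  # assumed correction term
  mean(raw - corr)
}

p1 <- c(0.50, 0.25)
p2 <- c(0.25, 0.50)
p3 <- c(0.75, 0.75)
n1 <- c(10, 10)

f3_raw       <- f3_sketch(p1, p2, p3, n1, apply_corr = FALSE)
f3_corrected <- f3_sketch(p1, p2, p3, n1, apply_corr = TRUE)
```

Because the correction term is non-negative, the corrected estimate is never larger than the uncorrected one, and the gap grows as the sample size of the first population shrinks.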

outgroupmode

With outgroupmode = FALSE, estimates of f3 will be normalized by estimates of the heterozygosity of the target population. This is the default option if the first argument is the prefix of genotype data. If the first argument is an array of precomputed f2-statistics, then no normalization can be performed, which corresponds to outgroupmode = TRUE.
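
The effect of the normalization can be sketched as a ratio of sums over SNPs. The exact form of the denominator is an assumption made for illustration (twice an unbiased estimate of p1 * (1 - p1) in the target population), and f3_normalized is a hypothetical function, not the package's implementation.

```r
# Hypothetical sketch; n1 is the number of sampled alleles in the target population.
f3_normalized <- function(p1, p2, p3, n1, outgroupmode = FALSE) {
  num <- (p1 - p2) * (p1 - p3) - p1 * (1 - p1) / (n1 - 1)
  if (outgroupmode) return(mean(num))       # no normalization
  het <- 2 * p1 * (1 - p1) * n1 / (n1 - 1)  # assumed heterozygosity estimate
  sum(num) / sum(het)
}

p1 <- c(0.50, 0.25)
p2 <- c(0.25, 0.50)
p3 <- c(0.75, 0.75)
n1 <- c(10, 10)

f3_norm   <- f3_normalized(p1, p2, p3, n1, outgroupmode = FALSE)
f3_unnorm <- f3_normalized(p1, p2, p3, n1, outgroupmode = TRUE)
```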

verbose

Print progress updates

Value

A data frame with per-block f3-statistics for each population triple.
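
Per-block estimates are typically combined with a weighted block jackknife to obtain a genome-wide estimate and its standard error. The sketch below shows that calculation on a made-up table. The column names block, est and n are hypothetical and may not match the columns this function returns, and weighted_jackknife is not a package function.

```r
# Hypothetical per-block table; the real column names may differ.
blockdat <- data.frame(
  block = 1:5,
  est   = c(0.010, 0.014, 0.008, 0.012, 0.011),  # per-block f3 estimates
  n     = c(1000, 1200, 800, 1100, 900)          # SNPs per block
)

# Weighted block jackknife for a SNP-count-weighted mean of block estimates.
weighted_jackknife <- function(est, n) {
  g <- length(est)
  N <- sum(n)
  total <- sum(n * est) / N                   # estimate from all blocks
  loo <- (sum(n * est) - n * est) / (N - n)   # leave-one-block-out estimates
  h <- N / n
  jack_est <- g * total - sum((1 - n / N) * loo)
  pseudo <- h * total - (h - 1) * loo         # pseudo-values
  se <- sqrt(mean((pseudo - jack_est)^2 / (h - 1)))
  c(est = jack_est, se = se, z = jack_est / se)
}

res <- weighted_jackknife(blockdat$est, blockdat$n)
print(res)
```

With equal block sizes this reduces to the ordinary delete-one jackknife, so the standard error equals the standard deviation of the block estimates divided by the square root of the number of blocks.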