Compute per-block f4-statistics directly from genotype data
f4blockdat_from_geno(
  pref,
  popcombs = NULL,
  left = NULL,
  right = NULL,
  auto_only = TRUE,
  blgsize = 0.05,
  block_lengths = NULL,
  f4mode = TRUE,
  allsnps = FALSE,
  poly_only = FALSE,
  snpwt = NULL,
  keepsnps = NULL,
  verbose = TRUE
)
Prefix of genotype files
A data frame with one population combination per row, and columns pop1, pop2, pop3, pop4. If there is an additional integer column named model and allsnps = FALSE, only SNPs present in every population in any given model will be used to compute f4-statistics for that model.
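As a minimal sketch of the expected popcombs layout, the following builds such a data frame in base R and lists which populations jointly determine SNP selection for each model. The population labels and the file prefix in the commented call are hypothetical placeholders, not values from the package.

```r
# Hypothetical population labels; replace them with labels from your genotype files.
popcombs <- data.frame(
  pop1  = c("PopA", "PopA", "PopB"),
  pop2  = c("PopB", "PopC", "PopC"),
  pop3  = c("PopD", "PopD", "PopD"),
  pop4  = c("PopE", "PopE", "PopF"),
  model = c(1L, 1L, 2L),          # optional integer column grouping rows into models
  stringsAsFactors = FALSE
)

# With allsnps = FALSE, SNPs must be present in every population of a model.
# This lists the populations involved in each model.
pop_cols <- c("pop1", "pop2", "pop3", "pop4")
pops_by_model <- lapply(
  split(popcombs[, pop_cols], popcombs$model),
  function(d) sort(unique(unlist(d, use.names = FALSE)))
)
print(pops_by_model)

# The call itself needs genotype files on disk (the prefix below is a placeholder):
# f4blockdat_from_geno("path/to/prefix", popcombs = popcombs)
```

Here rows 1 and 2 share model 1, so a SNP would need data in PopA through PopE to be used for either of them, while model 2 only involves PopB, PopC, PopD, and PopF.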
Populations on the left side of f4 (pop1 and pop2). Can be provided together with right in place of popcombs.
Populations on the right side of f4 (pop3 and pop4). Can be provided together with left in place of popcombs.
Use only chromosomes 1 to 22.
SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as a base-pair distance rather than a genetic distance in Morgan.
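The threshold rule for blgsize can be illustrated with a small helper. The function name describe_blgsize is hypothetical and not part of the package; it only mirrors the rule stated above (values of 100 or more are read as base pairs, smaller values as Morgan).

```r
# Illustrative helper (an assumption, not a package function):
# report how a given blgsize value would be read under the documented rule.
describe_blgsize <- function(blgsize) {
  stopifnot(is.numeric(blgsize), length(blgsize) == 1, blgsize > 0)
  if (blgsize >= 100) {
    # Physical distance in base pairs
    sprintf("%s bp (physical distance)", format(blgsize, scientific = FALSE))
  } else {
    # Genetic distance: 1 Morgan = 100 cM
    sprintf("%s Morgan = %s cM (genetic distance)", blgsize, blgsize * 100)
  }
}

describe_blgsize(0.05)  # default: read as a genetic distance
describe_blgsize(2e6)   # read as a distance of 2 million base pairs
```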
An optional vector with block lengths. If NULL, block lengths will be computed.
If TRUE: f4 is computed from allele frequencies a, b, c, and d as (a-b)*(c-d). If FALSE, D-statistics are computed instead, defined as (a-b)*(c-d) / ((a + b - 2*a*b) * (c + d - 2*c*d)), which is the same as (P(BABA) - P(ABBA)) / (P(ABBA) + P(BABA)).
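The two formulas and their equivalence to the ABBA/BABA form can be checked numerically for a single SNP. This is a sketch in base R with made-up allele frequencies; the helper names are hypothetical, and how the package aggregates these terms across SNPs within a block is not shown here.

```r
# Per-SNP f4 term from allele frequencies a, b, c, d (pop1 to pop4)
f4_term <- function(a, b, c, d) (a - b) * (c - d)

# Per-SNP D-statistic as defined in the text
d_term <- function(a, b, c, d) {
  f4_term(a, b, c, d) / ((a + b - 2 * a * b) * (c + d - 2 * c * d))
}

# Site-pattern probabilities when one allele is drawn from each population:
# ABBA: pop1 and pop4 match, pop2 and pop3 match
p_abba <- function(a, b, c, d) a * (1 - b) * (1 - c) * d + (1 - a) * b * c * (1 - d)
# BABA: pop1 and pop3 match, pop2 and pop4 match
p_baba <- function(a, b, c, d) a * (1 - b) * c * (1 - d) + (1 - a) * b * (1 - c) * d

# Made-up frequencies for illustration
a <- 0.2; b <- 0.5; c <- 0.7; d <- 0.1

f4_val  <- f4_term(a, b, c, d)
d_val   <- d_term(a, b, c, d)
d_sites <- (p_baba(a, b, c, d) - p_abba(a, b, c, d)) /
           (p_abba(a, b, c, d) + p_baba(a, b, c, d))

all.equal(d_val, d_sites)  # the two D definitions agree
```

The numerator P(BABA) - P(ABBA) simplifies to (a-b)*(c-d), and the denominator P(ABBA) + P(BABA) to (a + b - 2*a*b) * (c + d - 2*c*d), which is why the two expressions coincide.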
Use all SNPs with allele frequency estimates in every population of any given population quadruple. If FALSE (the default), only SNPs which are present in all populations in popcombs (or any given model in it) will be used. Setting allsnps = TRUE in the presence of large amounts of missing data might lead to false positive results.
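The difference between the two SNP selection modes can be sketched on a toy allele frequency matrix with missing values. The matrix, the population labels, and the helper complete_in are all hypothetical; this illustrates the rule described above rather than the package's internal code.

```r
# Toy allele frequencies: 4 SNPs x 5 populations, NA = no estimate available
afs <- matrix(
  c(0.1, 0.2, 0.3, 0.4, 0.5,
    0.1, 0.2, 0.3, 0.4, NA,
    0.1, 0.2, 0.3, NA,  0.5,
    NA,  0.2, 0.3, 0.4, 0.5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), c("PopA", "PopB", "PopC", "PopD", "PopE"))
)

# Two population quadruples sharing PopA, PopB, PopC
quads <- list(
  c("PopA", "PopB", "PopC", "PopD"),
  c("PopA", "PopB", "PopC", "PopE")
)

# TRUE for SNPs with an estimate in every one of the given populations
complete_in <- function(pops) rowSums(is.na(afs[, pops, drop = FALSE])) == 0

# allsnps = FALSE: one shared SNP set, complete across all populations involved
shared_snps <- complete_in(unique(unlist(quads)))

# allsnps = TRUE: a separate SNP set per quadruple
per_quad_snps <- sapply(quads, complete_in)

sum(shared_snps)         # SNPs usable under allsnps = FALSE
colSums(per_quad_snps)   # SNPs usable per quadruple under allsnps = TRUE
```

In this toy case each quadruple retains more SNPs under allsnps = TRUE, but the two quadruples are then computed on different SNP sets, which is the situation the warning about missing data refers to.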
Only keep SNPs with mean allele frequency not equal to 0 or 1 (default FALSE).
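A filter of this kind might look as follows in base R. This is a sketch under two assumptions that the text does not confirm: that the mean is taken across populations, and that missing estimates are ignored when averaging. The data are made up.

```r
# Toy allele frequencies: 4 SNPs x 3 populations (hypothetical values)
afs <- matrix(
  c(0,   0,   0,
    0.5, 0.2, 1,
    1,   1,   1,
    0,   NA,  0.3),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("snp", 1:4), c("PopA", "PopB", "PopC"))
)

# Assumption: mean across populations, skipping missing estimates
mean_af <- rowMeans(afs, na.rm = TRUE)

# Keep SNPs that are not fixed for either allele across the populations
polymorphic <- mean_af > 0 & mean_af < 1
afs_poly <- afs[polymorphic, , drop = FALSE]

rownames(afs_poly)
```

Here snp1 (mean 0) and snp3 (mean 1) are dropped, while snp2 and snp4 are retained.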
A vector of SNP weights
A vector of SNP IDs to keep
Print progress updates
A data frame with per-block f4-statistics for each population quadruple.