This function is intended for computing f2-statistics for a large number of populations,
too many to process in working memory at once.
It assumes that the allele frequencies have already been computed and are stored in .rds files,
split into consecutive blocks for a set of populations. This function calls write_f2,
which takes a (sub-)chunk of pairwise f2-statistics and writes one pair at a time to disk.
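To make the chunk-pair idea concrete, here is a minimal sketch in base R. It is not the package's implementation: `naive_f2`, the toy frequency matrix, and the output file names are all made up for illustration, and the estimator is the plain mean squared allele frequency difference, without any correction the package may apply (see `apply_corr` in the usage).

```r
# Toy allele frequencies: rows are SNPs, columns are populations (made-up values).
afs <- matrix(c(0.1, 0.5, 0.9,
                0.2, 0.4, 0.8,
                0.3, 0.5, 0.7,
                0.6, 0.6, 0.2),
              nrow = 3,
              dimnames = list(paste0("snp", 1:3), c("A", "B", "C", "D")))

# Two hypothetical chunks of populations.
chunk1 <- c("A", "B")
chunk2 <- c("C", "D")

# Assumption: uncorrected f2 as the mean squared frequency difference.
naive_f2 <- function(p1, p2) mean((p1 - p2)^2)

# f2 for every pair with one population from each chunk
# (rows follow chunk1, columns follow chunk2).
f2_block <- sapply(chunk2, function(b) {
  sapply(chunk1, function(a) naive_f2(afs[, a], afs[, b]))
})

# Write one pair at a time, so no large object has to stay in memory.
# The directory and file naming here are hypothetical.
out <- file.path(tempdir(), "f2_sketch")
dir.create(out, showWarnings = FALSE)
for (a in chunk1) {
  for (b in chunk2) {
    saveRDS(f2_block[a, b], file.path(out, paste0(a, "_", b, ".rds")))
  }
}
```

Only the two chunks involved need to be loaded at any time, which is what makes the approach workable for many populations.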
afs_to_f2(
  afdir,
  outdir,
  chunk1,
  chunk2,
  blgsize = 0.05,
  snpwt = NULL,
  overwrite = FALSE,
  type = "f2",
  poly_only = FALSE,
  snpdat = NULL,
  apply_corr = TRUE,
  verbose = TRUE
)
afdir: Directory with allele frequency and counts .rds files created by split_mat
outdir: Directory where data will be stored
chunk1: Index of the first chunk of populations
chunk2: Index of the second chunk of populations
blgsize: SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as base pair distance rather than genetic distance.
snpwt: A vector of scaling factors applied to the f2-statistics for each SNP. The length has to match the number of SNPs.
overwrite: Overwrite existing files (default FALSE)
verbose: Print progress updates
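Two of the argument rules above can be written down as a short sketch. Both helpers (`blgsize_unit`, `check_snpwt`) are hypothetical and only restate the documented rules; they are not functions of the package, and the package may validate its inputs differently.

```r
# Hypothetical helper: report how a blgsize value would be read,
# following the documented threshold of 100.
blgsize_unit <- function(blgsize) {
  stopifnot(is.numeric(blgsize), length(blgsize) == 1, blgsize > 0)
  if (blgsize >= 100) "base pairs" else "Morgan"
}

blgsize_unit(0.05)  # "Morgan" (0.05 Morgan = 5 cM, the default)
blgsize_unit(5e6)   # "base pairs"

# Hypothetical helper: snpwt must supply one scaling factor per SNP.
check_snpwt <- function(snpwt, nsnps) {
  if (!is.null(snpwt) && length(snpwt) != nsnps) {
    stop("snpwt has length ", length(snpwt), " but there are ", nsnps, " SNPs")
  }
  invisible(TRUE)
}

check_snpwt(NULL, 1000)          # NULL (the default) passes
check_snpwt(rep(1, 1000), 1000)  # one weight per SNP passes
```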
extract_f2 does the same thing in one step for smaller data.
if (FALSE) {
  afdir = 'tmp_af_dir/'
  f2dir = 'f2_dir'
  extract_afs('path/to/packedancestrymap_prefix', afdir)
  # numchunks should be the number of split allele frequency files
  numchunks = length(list.files(afdir, 'afs.+rds'))
  # Process every unordered pair of chunks, including each chunk with itself
  for(i in seq_len(numchunks)) {
    for(j in i:numchunks) {
      afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = j)
    }
  }
}
# Alternatively, the following code will do the same, while submitting each chunk as a separate job
# (if future::plan has been set up appropriately).
if (FALSE) {
  furrr::future_map(seq_len(numchunks), function(i) {
    purrr::map(i:numchunks, function(j) {
      afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = j)
    })
  })
}
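Both loops visit each unordered pair of chunks once, including a chunk paired with itself, so the number of afs_to_f2 calls is numchunks * (numchunks + 1) / 2. The sketch below lists those pairs in base R for an example value of numchunks; the `pairs` data frame is only an illustration and is not something the package produces.

```r
numchunks <- 4  # example value, chosen for illustration

# All (chunk1, chunk2) combinations, keeping only chunk1 <= chunk2.
pairs <- subset(
  expand.grid(chunk1 = seq_len(numchunks), chunk2 = seq_len(numchunks)),
  chunk1 <= chunk2
)

# Number of pairs: numchunks * (numchunks + 1) / 2
nrow(pairs)  # 10
```

Listing the pairs up front can be handy when each pair is submitted as its own job, since the job count is then known before anything runs.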