This function is intended for computing f2-statistics for a large number of populations, when there are too many to hold everything in working memory. It assumes that the allele frequencies have already been computed and are stored in .rds files, split into consecutive blocks for a set of populations. It calls write_f2, which takes a (sub-)chunk of pairwise f2-statistics and writes one pair at a time to disk.

afs_to_f2(
  afdir,
  outdir,
  chunk1,
  chunk2,
  blgsize = 0.05,
  snpwt = NULL,
  overwrite = FALSE,
  type = "f2",
  poly_only = FALSE,
  snpdat = NULL,
  apply_corr = TRUE,
  verbose = TRUE
)

Arguments

afdir

Directory with allele frequency and counts .rds files created by split_mat

outdir

Directory where data will be stored

chunk1

Index of the first chunk of populations

chunk2

Index of the second chunk of populations

blgsize

SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as a base pair distance rather than a distance in Morgan.

snpwt

A vector of scaling factors applied to the f2-statistics for each SNP. The length has to match the number of SNPs.

overwrite

Overwrite existing files (default FALSE)

verbose

Print progress updates
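The rule for blgsize described above can be illustrated with a small sketch. The helper blgsize_unit is hypothetical and not part of the package; it only mirrors the documented threshold (values of 100 or greater are read as base pairs, smaller values as Morgan) and says nothing about how the package implements the check internally.

```r
# Hypothetical helper (not part of the package): reports how a blgsize value
# would be read under the rule stated in the blgsize argument description.
blgsize_unit <- function(blgsize) {
  stopifnot(is.numeric(blgsize), length(blgsize) == 1, blgsize > 0)
  if (blgsize >= 100) "base pairs" else "Morgan"
}

blgsize_unit(0.05)  # the default, i.e. 0.05 Morgan (5 cM)
blgsize_unit(2e6)   # a block size of 2 million base pairs
```

A check like this can help catch a block size that was accidentally given in centimorgan (for example 5 instead of 0.05) before starting a long-running job.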

See also

extract_f2 Does the same thing in one step for smaller data.

Examples

if (FALSE) {
afdir = 'tmp_af_dir/'
f2dir = 'f2_dir'
extract_afs('path/to/packedancestrymap_prefix', afdir)
numchunks = length(list.files(afdir, 'afs.+rds'))
# numchunks should be the number of split allele frequency files
for(i in 1:numchunks) {
  for(j in i:numchunks) {
    afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = j)
  }
}
}
# Alternatively, the following code will do the same, while submitting each chunk as a separate job
# (if future::plan has been set up appropriately).
if (FALSE) {
furrr::future_map(1:numchunks, function(i) {
  purrr::map(i:numchunks, function(j) {
    afs_to_f2(afdir, f2dir, chunk1 = i, chunk2 = j)
  })
})
}
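Both examples above visit every chunk pair with chunk1 <= chunk2. As a minimal sketch, the same set of pairs can be listed up front in base R, which may be convenient when building a job list for a scheduler. The helper chunk_pairs is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): lists every chunk pair
# with chunk1 <= chunk2, in the same order as the nested loop in the examples.
chunk_pairs <- function(numchunks) {
  grid <- expand.grid(chunk1 = seq_len(numchunks), chunk2 = seq_len(numchunks))
  grid <- grid[grid$chunk1 <= grid$chunk2, ]
  grid <- grid[order(grid$chunk1, grid$chunk2), ]
  rownames(grid) <- NULL
  grid
}

pairs <- chunk_pairs(4)
nrow(pairs)  # 4 * 5 / 2 = 10 pairs
```

Row k of the result then corresponds to one call of the form afs_to_f2(afdir, f2dir, chunk1 = pairs$chunk1[k], chunk2 = pairs$chunk2[k]).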