Prepare data for various ADMIXTOOLS 2 functions. This function reads data from genotype files,
and extracts data required to compute blocked f-statistics for any sets of samples. The data consists of .rds files with total and alternative allele counts for each individual, and products of total and alternative allele counts for each pair.
The function calls packedancestrymap_to_afs or plink_to_afs and afs_to_f2_blocks.
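To make the stored quantities concrete, here is a minimal base R sketch of per-individual allele counts and their pairwise products. The toy matrix, the diploid assumption, and the helper `pair_products` are illustrative assumptions, not part of the package; the products are summed over all SNPs here for brevity, whereas the function works on SNP blocks.

```r
# Toy genotype matrix: rows are SNPs, columns are diploid individuals.
# Entries are alternative allele counts (0, 1, 2); NA marks a missing genotype.
geno <- matrix(c(0, 1, 2,
                 2, NA, 1,
                 1, 1, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("snp", 1:3), c("ind1", "ind2", "ind3")))

# Per-individual counts at each SNP (missing genotypes contribute zero).
alt_counts <- ifelse(is.na(geno), 0, geno)
total_counts <- ifelse(is.na(geno), 0, 2)

# Hypothetical helper: products of counts for one pair, summed over SNPs.
pair_products <- function(i, j) {
  c(alt_alt = sum(alt_counts[, i] * alt_counts[, j]),
    total_total = sum(total_counts[, i] * total_counts[, j]))
}

pair_products("ind1", "ind2")
```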
extract_counts(
pref,
outdir,
inds = NULL,
blgsize = 0.05,
maxmiss = 0,
minmaf = 0,
maxmaf = 0.5,
transitions = TRUE,
transversions = TRUE,
auto_only = TRUE,
keepsnps = NULL,
maxmem = 8000,
overwrite = FALSE,
format = NULL,
cols_per_chunk = NULL,
verbose = TRUE
)
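A call might look like the following sketch. The paths are placeholders, the non-default values are arbitrary, and the package name `admixtools` is an assumption about how ADMIXTOOLS 2 is installed. The call is guarded so that the snippet does nothing when the package or the input files are absent.

```r
# Hypothetical input prefix and output directory; replace with real paths.
args <- list(
  pref = "/path/to/genotypes",
  outdir = "/path/to/counts",
  blgsize = 0.05,
  maxmiss = 0.1,
  auto_only = TRUE,
  overwrite = FALSE
)

# Run only when the package is installed and the input files are present.
have_pkg <- requireNamespace("admixtools", quietly = TRUE)
have_data <- file.exists(paste0(args$pref, ".geno"))
if (have_pkg && have_data) {
  do.call(admixtools::extract_counts, args)
}
```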
Prefix of PLINK/EIGENSTRAT/PACKEDANCESTRYMAP files. EIGENSTRAT/PACKEDANCESTRYMAP files have to end in .geno, .snp, .ind; PLINK files have to end in .bed, .bim, .fam.
Directory where data will be stored.
Individuals for which data should be read. Defaults to all individuals
SNP block size in Morgan. Default is 0.05 (5 cM). If blgsize is 100 or greater, it will be interpreted as base pair distance rather than centimorgan distance.
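The unit rule can be sketched in base R as follows. Both helpers are hypothetical, and the fixed-width windows are only an illustration; the block boundaries the package actually chooses may differ.

```r
# Decide which unit a block size refers to, following the rule stated above.
blgsize_unit <- function(blgsize) {
  if (blgsize >= 100) "base pairs" else "Morgan"
}

# Simplified blocking: cut positions into consecutive windows of width blgsize.
assign_blocks <- function(pos, blgsize) {
  floor((pos - min(pos)) / blgsize) + 1
}

gen_pos <- c(0.00, 0.01, 0.04, 0.06, 0.09, 0.12)  # genetic positions in Morgan
blgsize_unit(0.05)            # treated as a genetic distance
assign_blocks(gen_pos, 0.05)  # block index for each SNP
blgsize_unit(5e6)             # treated as a physical distance
```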
Discard SNPs which are missing in a fraction of individuals greater than maxmiss
Discard SNPs with minor allele frequency less than minmaf
Discard SNPs with minor allele frequency greater than maxmaf
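The three thresholds combine as a simple per-SNP filter. The sketch below uses a toy matrix and a hypothetical helper `keep_snps`; it assumes diploid genotypes and computes frequencies across all individuals shown, which may not match how the package computes them internally.

```r
# Toy data: rows are SNPs, columns are individuals; NA is a missing genotype.
geno <- matrix(c(0, 1, 2, 1,
                 0, 0, NA, NA,
                 0, 0, 0, 0,
                 2, 2, 1, 2),
               nrow = 4, byrow = TRUE)

keep_snps <- function(geno, maxmiss = 0, minmaf = 0, maxmaf = 0.5) {
  miss <- rowMeans(is.na(geno))            # fraction of missing individuals
  af <- rowMeans(geno, na.rm = TRUE) / 2   # alternative allele frequency
  maf <- pmin(af, 1 - af)                  # minor allele frequency
  # SNPs without any data have an undefined frequency and are dropped.
  !is.na(maf) & miss <= maxmiss & maf >= minmaf & maf <= maxmaf
}

keep_snps(geno)                  # defaults: drops only the SNP with missing data
keep_snps(geno, maxmiss = 0.5)   # tolerates up to 50% missing individuals
keep_snps(geno, minmaf = 0.05)   # additionally drops monomorphic SNPs
```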
Set this to FALSE to exclude transition SNPs
Set this to FALSE to exclude transversion SNPs
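For reference, a SNP is a transition when its two alleles are A/G or C/T, and a transversion otherwise. The base R sketch below shows how the two flags could select SNPs; `is_transition` and `keep_by_type` are hypothetical helpers, not package functions.

```r
# A transition swaps two purines (A/G) or two pyrimidines (C/T);
# every other substitution between A, C, G and T is a transversion.
is_transition <- function(a1, a2) {
  # Build an order-independent allele pair, e.g. G/A becomes "AG".
  pair <- ifelse(a1 < a2, paste0(a1, a2), paste0(a2, a1))
  pair %in% c("AG", "CT")
}

# Keep a SNP if its type is enabled by the corresponding flag.
keep_by_type <- function(a1, a2, transitions = TRUE, transversions = TRUE) {
  ts <- is_transition(a1, a2)
  (ts & transitions) | (!ts & transversions)
}

a1 <- c("A", "C", "G", "T", "A")
a2 <- c("G", "T", "A", "G", "C")
is_transition(a1, a2)
keep_by_type(a1, a2, transitions = FALSE)  # transversions only
```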
Keep only SNPs on chromosomes 1 to 22
SNP IDs of SNPs to keep. Overrides other SNP filtering options
Maximum amount of memory to be used. If the required amount of memory exceeds maxmem, allele frequency data will be split into blocks, and the computation will be performed separately on each block pair. This doesn't put a precise cap on the amount of memory used (it used to at some point). Set this parameter to lower values if you run out of memory while running this function. Set it to higher values if this function is too slow and you have lots of memory.
Overwrite existing files in outdir
Supply this if the prefix can refer to genotype data in different formats and you want to choose which one to read. Should be plink to read .bed, .bim, .fam files, or eigenstrat, or packedancestrymap to read .geno, .snp, .ind files.
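The ambiguity arises when files for more than one format share a prefix. The sketch below checks which file sets exist for a prefix; `available_formats` is a hypothetical helper, and the placeholder files are empty and only serve the demonstration.

```r
# Hypothetical helper: list the formats whose files exist for a given prefix.
# EIGENSTRAT and PACKEDANCESTRYMAP share the same extensions, so they cannot be
# told apart by file name alone.
available_formats <- function(pref) {
  exts <- list(
    plink = c(".bed", ".bim", ".fam"),
    `eigenstrat or packedancestrymap` = c(".geno", ".snp", ".ind")
  )
  found <- vapply(exts, function(e) all(file.exists(paste0(pref, e))), logical(1))
  names(found)[found]
}

# Demonstrate with empty placeholder files in a temporary directory.
pref <- file.path(tempdir(), "toy")
file.create(paste0(pref, c(".bed", ".bim", ".fam", ".geno", ".snp", ".ind")))
available_formats(pref)
```

When more than one entry is returned, passing format explicitly removes the ambiguity.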
Number of genotype chunks to store on disk. Setting this to a positive integer makes the function slower, but requires less memory. The default value for cols_per_chunk in extract_afs is 10. Lower numbers will lower the memory requirement but increase the time it takes.
Print progress updates