Quantifying uncertainty

The estimates produced by ADMIXTOOLS 2 (and by any other software) have high, but still limited, precision, and it’s important to know what those limits are. There are several reasons why an estimate is uncertain, but the two major ones in our case are the random sampling of individuals and the random sampling (or rather the random drift) of SNPs. Quantifying this uncertainty amounts to asking two questions: How different would the results be if we had used DNA from different individuals (from the same populations)? And how different would they be if we could replay the same history and let the SNPs drift randomly another time?

We can answer these questions with resampling procedures such as jackknife and bootstrap. The original ADMIXTOOLS software uses jackknife to estimate standard errors of f-statistic estimates and other parameters, while ADMIXTOOLS 2 can also compute bootstrap standard errors. Unless otherwise mentioned, standard errors estimate the uncertainty due to the random sampling of SNPs, although ADMIXTOOLS 2 also has functions that estimate the uncertainty due to the random sampling of individuals.

Resampling SNPs, SNP blocks, or whole chromosomes is a very common way of quantifying uncertainty across many methods used in genetics. In the context of ancient DNA, jackknifing SNP blocks is how f-statistic standard errors are usually computed. What is less common is to apply the same principle not just to determine the precision with which we can estimate a single statistic, but to determine the precision of the whole analysis. This is arguably at least as interesting as the confidence interval of one statistic, but it requires much more time, because the whole analysis has to be repeated hundreds of times on different subsets of the data. With the performance gains and dedicated functions for resampling SNPs in ADMIXTOOLS 2, this is now much easier than it used to be.

The concept of estimating the uncertainty of an analysis is not only relevant for data with low coverage, where it is intuitive that fewer SNPs and less information lead to more uncertainty. It is just as relevant when we have perfect data for every SNP in the genome, because we still rely on the randomness of genetic drift in a finite number of SNPs across 23 chromosomes. If our goal were only to describe what we observe in this finite set of SNPs, resampling might not be necessary. But since our goal is to make statements about demographic history by using the random drift in these SNPs, we should not treat them as fixed, and should instead ask how much our results are influenced by that randomness.

Before we go over how resampling SNPs can be used to determine the robustness of a whole analysis, let’s look at how the concept is applied to derive standard errors and p-values for single \(f_4\)-statistics.

Standard errors and hypothesis testing

Let’s say we want to know whether two population pairs, \(A\), \(B\) and \(C\), \(D\), form two clades with respect to each other. We can reject this null hypothesis if \(f_4(A, B; C, D)\) deviates significantly from zero, that is, if there is significant covariance between the allele frequency differences \(A-B\) and \(C-D\).

To test whether this covariance is significant, we have to estimate its standard error (a measure of its sampling variability). From the estimate and its standard error, we can calculate a z-score and a p-value: the probability of observing an \(f_4\)-statistic this extreme or more extreme under the null hypothesis (which is that the two population pairs form clades and the true covariance is zero).
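
As a concrete illustration, here is how an estimate and its standard error translate into a z-score and a two-sided p-value in plain R. The numbers are made up for illustration and do not come from any real analysis.

f4_est = 0.0021                     # hypothetical f4 estimate
f4_se = 0.0007                      # hypothetical standard error of that estimate
zscore = f4_est / f4_se             # number of standard errors away from zero
pvalue = 2 * pnorm(-abs(zscore))    # two-sided p-value under the null hypothesis f4 = 0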

To estimate the standard error, we can apply jackknife or bootstrap resampling: computing \(f_4(A, B; C, D)\) many times on different subsets of SNPs to see how different those estimates are from each other.

Jackknife and bootstrap

The idea behind the jackknife and the bootstrap is that while it’s impossible to know with certainty how different a result would be if we had different data, we can make an educated guess by analyzing many different subsets of the data. For the jackknife, the subsets are generated by leaving out each sample in turn; for the bootstrap, they are generated by resampling with replacement until the total number of samples is reached. In our case, “samples” are usually blocks of SNPs.
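
To make this concrete, here is a minimal base R sketch of how the two kinds of subsets are formed. The vector x stands in for hypothetical per-block statistics and is not part of ADMIXTOOLS 2.

x = rnorm(20)                                                # hypothetical per-block statistics
jack_subsets = lapply(seq_along(x), function(i) x[-i])       # leave out each block in turn
boot_subsets = replicate(1000, sample(x, length(x), replace = TRUE), simplify = FALSE)  # resample with replacement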

We then calculate whichever statistic we’re interested in on each of the subsets, resulting in \(n\) subset-statistics \(\hat\theta_{-i}\). From the variation in \(\hat\theta_{-i}\) we can estimate the standard error \(\sigma\).

For jackknife, the squared standard error estimate, \(\hat\sigma^2\), is given by

\[\hat\sigma^2 = \frac{n-1}{n} \sum_{i=1}^n (\hat\theta_J - \hat\theta_{-i})^2\]

\(\hat\theta\) is the estimate on the whole data set, and \(\hat\theta_J\) is the jackknife estimate, which is usually very close to the estimate on the whole data set: \(\hat\theta_J = n \hat\theta - \frac{n-1}{n} \sum_{i=1}^n \hat\theta_{-i}\)
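
The following base R sketch implements the two equations above. It takes the full-data estimate and the leave-one-out estimates as input, and is only meant to illustrate the formulas, not to replicate the ADMIXTOOLS 2 implementation.

jackknife_se = function(theta_hat, theta_loo) {
  n = length(theta_loo)
  theta_j = n * theta_hat - (n - 1) / n * sum(theta_loo)   # jackknife estimate
  sqrt((n - 1) / n * sum((theta_j - theta_loo)^2))         # jackknife standard error
}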


For the bootstrap, where the \(\hat\theta_{-i}\) are more different from each other because multiple samples are excluded from each subset \(i\), the squared standard error estimate is just the variance of the \(\hat\theta_{-i}\) around \(\hat\theta\):

\[\hat\sigma^2 = \frac{1}{n-1} \sum_{i=1}^n (\hat\theta - \hat\theta_{-i})^2\]
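
The corresponding sketch for the bootstrap equation, again purely illustrative, takes the full-data estimate and the bootstrap estimates:

bootstrap_se = function(theta_hat, theta_boot) {
  n = length(theta_boot)
  sqrt(sum((theta_hat - theta_boot)^2) / (n - 1))   # square root of the variance of the replicates around the full estimate
}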


Blocked jackknife and bootstrap

We can’t actually use the approach described so far because SNPs do not drift independently of each other. While there are tens of millions of common SNPs in the human genome, the number of independent SNPs is closer to 50,000. If we don’t account for that, the standard errors will be greatly underestimated, leading to many false positive results.

The non-independence of SNPs (linkage disequilibrium) is mostly limited to SNPs in close proximity which are not separated by recombination hotspots. If we take two SNPs separated by 5 centimorgans (cM) or more, they are usually independent of each other. So while we can’t use the standard jackknife to get standard error estimates by leaving out each SNP at a time, we can divide the genome into 5 cM blocks and leave out one block at a time. This complicates things a bit because some blocks have more SNPs than others, so the blocks should be weighted accordingly.
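
For illustration, this is roughly how SNPs on one chromosome could be assigned to 5 cM blocks in plain R, given their genetic map positions. It is a simplified stand-in, with made-up positions, not the block definition used internally by ADMIXTOOLS 2.

gpos = sort(runif(5000, 0, 150))   # hypothetical genetic positions in cM on one chromosome
block = floor(gpos / 5) + 1        # index of the 5 cM block each SNP falls into
m = as.integer(table(block))       # number of SNPs per block (used for weighting below)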

The exact equations used in ADMIXTOOLS and ADMIXTOOLS 2 for computing standard errors with the weighted block jackknife are given here. Lecture slides by Nick Patterson on the jackknife method are available here.


For statistics whose multi-SNP estimates are means of single-SNP estimates (such as f-statistics), the equations become a lot simpler.

In that case, the jackknife estimate will be the same as the estimate on the whole data set: \(\hat\theta_J = \hat\theta\). And the squared standard error estimate simplifies to

\[\hat\sigma^2 = \frac{1}{g} \sum_{i=1}^g (\frac{n-m_i}{m_i}) (\hat\theta - \hat\theta_{-i})^2\]

Here, \(g\) is the number of blocks, \(n\) is the total number of SNPs, and \(m_i\) is the number of SNPs in block \(i\). If each SNP is its own block, \(m_i = 1\), \(g = n\), and the equation above will be equal to the non-blocked version.

The same quantity can also be expressed using only per-block statistics \(\hat\theta_{i}\), rather than leave-one-block-out statistics \(\hat\theta_{-i}\), which simplifies the computation:

\[\hat\sigma^2 = \frac{1}{g} \sum_{i=1}^g (\frac{m_i}{n-m_i}) (\hat\theta - \hat\theta_{i})^2\]
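
This per-block form is straightforward to write down in plain R. The sketch below takes the full-data estimate, the per-block estimates, and the per-block SNP counts, and only illustrates the equation rather than reproducing the ADMIXTOOLS 2 implementation.

block_jackknife_se = function(theta_hat, theta_block, m) {
  n = sum(m)      # total number of SNPs
  g = length(m)   # number of blocks
  sqrt(sum(m / (n - m) * (theta_hat - theta_block)^2) / g)   # weighted block jackknife standard error
}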

Resampling SNPs

The subsampling of SNPs in blocks (specifically the block jackknife) is already used throughout all ADMIXTOOLS programs to compute standard errors. The resample_snps functions generalize this approach in two ways. First, they evaluate and return all model parameters for each set of resampled SNP blocks. Second, they provide the option to use bootstrap resampling (resampling blocks with replacement while keeping the total number of blocks fixed) rather than jackknife resampling (leaving out one block at a time). The advantage of the bootstrap over the jackknife is that it is less likely to underestimate the true variability in the presence of a few large outliers. The disadvantage is that it is inherently stochastic and doesn’t give the same results each time. Bootstrapping also makes it necessary to evaluate the whole model a large number of times, whereas jackknifing can be much faster, at least when computing only simple statistics.

Note that the bootstrap and jackknife resampling distributions are on different scales: bootstrap standard errors are the square root of the variance of the bootstrap resampling distribution, whereas jackknife standard errors are approximately the square root of \(n\) times the variance of the jackknife resampling distribution.
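
The difference in scale can be seen in a small simulated example with a simple mean statistic (plain R with made-up data, not ADMIXTOOLS 2 code):

x = rnorm(1000)    # hypothetical per-block values of some statistic
n = length(x)
theta_loo = sapply(seq_len(n), function(i) mean(x[-i]))           # jackknife (leave-one-out) means
theta_boot = replicate(500, mean(sample(x, n, replace = TRUE)))   # bootstrap means
sqrt(n) * sd(theta_loo)   # approximately the jackknife standard error
sd(theta_boot)            # approximately the bootstrap standard error
# both approximate the standard error of mean(x), which is sd(x) / sqrt(n)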

In the following example, 1000 models with resampled SNP blocks are evaluated, and a histogram of the likelihood scores is plotted.

snpres = qpgraph_resample_snps(example_f2_blocks, graph = example_graph, boot = 1000)
hist(snpres$score, 100)
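
Because these are bootstrap replicates, the spread of the resampled scores can serve as a simple summary of uncertainty, assuming (as in the histogram above) that each element of snpres$score corresponds to one resampling replicate:

sd(snpres$score)                            # bootstrap standard error of the likelihood score
quantile(snpres$score, c(0.025, 0.975))     # a simple 95% bootstrap interval for the score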