The estimates produced by *ADMIXTOOLS 2* (and by any other
software) have high but still limited precision, and it’s important to
know what those limits are. There are several reasons why there is
uncertainty in an estimate, but the two major ones in our case are the
random sampling of individuals, and the random sampling (or rather the
random drift) of SNPs. Quantifying this uncertainty is the same as
asking these questions: How different would the results be if we had
used DNA from different individuals (from the same populations)? And how
different would the results be if we could replay the same history,
while letting the SNPs drift randomly another time?

We can answer these questions with resampling procedures such as
jackknife and bootstrap. The original *ADMIXTOOLS* software uses
jackknife to estimate standard errors of *f*-statistic estimates
and other parameters, while *ADMIXTOOLS 2* can also compute
bootstrap standard errors. Unless otherwise mentioned, standard errors
estimate the uncertainty due to the random sampling of SNPs, although
*ADMIXTOOLS 2* also has functions that estimate the uncertainty
due to the random sampling of individuals.

Resampling SNPs, SNP blocks, or whole chromosomes is a very common
way of quantifying uncertainty across many methods used in genetics.
In the context of ancient DNA, jackknifing SNP blocks is how *f*-statistic
standard errors are usually computed. What is less common is to apply
the same principle not just to determine the precision with which we can
estimate a single statistic, but to determine the precision of the whole
analysis. This is arguably at least as interesting as the confidence
interval of one statistic, but it requires much more time, because the
whole analysis has to be repeated hundreds of times on different subsets
of the data. With the performance gains and dedicated functions for
resampling SNPs in *ADMIXTOOLS 2*, it is much easier than it was
before.

The concept of estimating the uncertainty of an analysis is not only relevant for data with low coverage, where it is intuitive that fewer SNPs and less information lead to more uncertainty. It is just as relevant when we have perfect data for every SNP in the genome, because we still rely on the randomness of genetic drift in a finite number of SNPs across 23 chromosomes. If our goal were only to describe what we observe in this finite set of SNPs, resampling might not be necessary. But since our goal is to make statements about demographic history by using the random drift in these SNPs, we should not treat them as fixed, and instead ask how much our results are influenced by that randomness.

Before we go over how resampling SNPs can be used to determine the
robustness of a whole analysis, let’s look at how the concept is applied
to derive standard errors and *p*-values for single \(f_4\)-statistics.

Let’s say we want to know whether two population pairs – \(A\), \(B\) and \(C\), \(D\) – form two clades with respect to each other. We can reject the null hypothesis that they do form a clade if \(f_4(A, B; C, D)\) deviates significantly from zero – that is, if there is a significant covariance between the allele frequency differences \(A-B\) and \(C-D\).
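
To make the definition concrete, here is a minimal Python sketch (not the actual *ADMIXTOOLS 2* implementation, which works on precomputed block statistics) that computes an uncorrected \(f_4\) estimate as the average over SNPs of the product of allele frequency differences. The frequency vectors are made up for illustration:

```python
# Basic, uncorrected f4 estimate: the average over SNPs of the product
# of the allele frequency differences (A - B) and (C - D).
def f4(pA, pB, pC, pD):
    """Mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(pA)
    return sum((a - b) * (c - d) for a, b, c, d in zip(pA, pB, pC, pD)) / n

# Made-up allele frequencies, one entry per SNP
pA = [0.1, 0.5, 0.9, 0.3]
pB = [0.2, 0.4, 0.8, 0.3]
pC = [0.1, 0.6, 0.9, 0.2]
pD = [0.2, 0.5, 0.7, 0.4]
```

If \(C\) and \(D\) are equally related to \(A\) and \(B\), the per-SNP products fluctuate around zero and the statistic stays close to zero.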

To test whether this covariance is significant, we have to estimate
its standard error (a measure of the sampling variance). From the
estimate and its standard error, we can calculate a z-score and
*p*-value – the probability of observing this or a more extreme
\(f_4\)-statistic under the null
hypothesis (which is that the two population pairs form a clade and the
true covariance is zero).
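
As a sketch of this calculation, with made-up numbers for the estimate and its standard error:

```python
import math

# Hypothetical example values, for illustration only
f4_estimate = 0.0031
std_error = 0.0010

# z-score: how many standard errors the estimate is away from zero
z = f4_estimate / std_error

# Two-sided p-value under a normal approximation: the probability of an
# estimate at least this extreme if the true value is zero
p = math.erfc(abs(z) / math.sqrt(2))
```

Here \(z = 3.1\), and the resulting *p*-value is about 0.002, so the clade hypothesis would be rejected at conventional significance levels.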

To estimate the standard error, we can apply jackknife or bootstrap resampling: computing \(f_4(A, B; C, D)\) many times on different subsets of SNPs to see how different those estimates are from each other.

The idea behind jackknife and bootstrap is that while it’s impossible to know with certainty how different a result would be if we had different data, we can make an educated guess by analyzing many different subsets of the data. In the case of jackknife, the subsets are generated by leaving out one sample at a time, and for bootstrap they are generated by repeatedly resampling with replacement until the total number of samples is reached. In our case, “samples” are usually blocks of SNPs.

We then calculate whichever statistic we’re interested in on each of the subsets, resulting in \(n\) subset-statistics \(\hat\theta_{-i}\). From the variation in \(\hat\theta_{-i}\) we can estimate the standard error \(\sigma\).

For jackknife, the squared standard error estimate, \(\hat\sigma^2\), is given by

\[\hat\sigma^2 = \frac{n-1}{n} \sum_{i=1}^n (\hat\theta_J - \hat\theta_{-i})^2\]

\(\hat\theta\) is the estimate on the whole data set, and \(\hat\theta_J\) is the jackknife estimate, which is usually very close to the estimate on the whole data set: \(\hat\theta_J = n \hat\theta - \frac{n-1}{n} \sum_{i=1}^n \hat\theta_{-i}\)
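
The jackknife formulas above can be sketched in a few lines of Python. Here they are applied to the mean of a few made-up values (real *f*-statistic jackknifing operates on SNP blocks, not single values), where the result coincides with the classical standard error of the mean:

```python
# Jackknife standard error, following the formulas above
# (a minimal sketch, not the ADMIXTOOLS 2 implementation).
def jackknife_se(x, estimator):
    n = len(x)
    theta_hat = estimator(x)
    # Leave-one-out estimates theta_{-i}
    loo = [estimator(x[:i] + x[i + 1:]) for i in range(n)]
    # Bias-corrected jackknife estimate theta_J
    theta_j = n * theta_hat - (n - 1) / n * sum(loo)
    # Squared standard error: (n-1)/n * sum((theta_J - theta_{-i})^2)
    var = (n - 1) / n * sum((theta_j - t) ** 2 for t in loo)
    return var ** 0.5

def mean(v):
    return sum(v) / len(v)

x = [1.0, 2.0, 4.0, 7.0]
se = jackknife_se(x, mean)
```

For the mean, this reproduces the textbook standard error \(\sqrt{s^2 / n}\), where \(s^2\) is the sample variance.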

For bootstrap, where the \(\hat\theta_{-i}\) differ more from each other because multiple values are excluded in each subset \(i\), the squared standard error estimate is just the variance across the \(\hat\theta_{-i}\) (with \(\bar\theta\) denoting their mean):

\[\hat\sigma^2 = \frac{1}{n-1} \sum_{i=1}^n (\bar\theta - \hat\theta_{-i})^2\]
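
The bootstrap version can be sketched similarly (a generic illustration, not the blocked procedure *ADMIXTOOLS 2* actually uses):

```python
import random

# Bootstrap standard error (minimal sketch): resample with replacement,
# recompute the statistic, and take the standard deviation across replicates.
def bootstrap_se(x, estimator, n_boot=1000, seed=1):
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        # Same size as the original data, sampled with replacement
        resampled = [rng.choice(x) for _ in x]
        reps.append(estimator(resampled))
    m = sum(reps) / n_boot
    # Variance across the bootstrap replicates
    var = sum((r - m) ** 2 for r in reps) / (n_boot - 1)
    return var ** 0.5

se = bootstrap_se([1.0, 2.0, 4.0, 7.0], lambda v: sum(v) / len(v))
```

Unlike the jackknife, the result depends on the random seed, which is the stochasticity mentioned later in this section.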

We can’t actually use the approach described so far because SNPs do not drift independently of each other. While there are tens of millions of common SNPs in the human genome, the number of independent SNPs is closer to 50,000. If we don’t account for that, the standard errors will be greatly underestimated, leading to many false positive results.

The non-independence of SNPs (linkage disequilibrium) is mostly limited to SNPs in close proximity which are not separated by recombination hotspots. If we take two SNPs separated by 5 centiMorgan or more, they are usually independent of each other. So while we can’t use the standard jackknife to get standard error estimates by leaving out each SNP at a time, we can divide the genome into blocks of 5 centiMorgan and leave out each block at a time. This complicates things a bit because some blocks have more SNPs than others, and should be weighted appropriately.
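
A sketch of this blocking idea (the actual block boundaries used by *ADMIXTOOLS* may be chosen differently): assign each SNP to a block based on its chromosome and its genetic map position, cut into 5 centiMorgan windows.

```python
# Assign a SNP to a block: blocks never span chromosomes, and each
# chromosome is cut into windows of 5 cM (a simplified illustration).
def block_index(chrom, cm_pos, block_size_cm=5.0):
    return (chrom, int(cm_pos // block_size_cm))

snps = [(1, 0.3), (1, 4.9), (1, 5.1), (2, 0.7)]  # (chromosome, position in cM)
blocks = [block_index(c, p) for c, p in snps]
```

The first two SNPs share a block, the third SNP starts a new block on the same chromosome, and the SNP on chromosome 2 always falls into its own set of blocks.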

The exact equations used in *ADMIXTOOLS* and *ADMIXTOOLS
2* for computing standard errors with the weighted block jackknife
are given here.
Lecture slides by Nick Patterson on the jackknife method are available
here.

For statistics for which estimates from multiple SNPs are the means
across estimates from single SNPs (such as *f*-statistics), the
equations get a lot simpler.

In that case, the jackknife estimate will be the same as the estimate on the whole data set: \(\hat\theta_J = \hat\theta\). And the squared standard error estimate simplifies to

\[\hat\sigma^2 = \frac{1}{g} \sum_{i=1}^g (\frac{n-m_i}{m_i}) (\hat\theta - \hat\theta_{-i})^2\]

Here, \(g\) is the number of blocks, \(n\) is the total number of SNPs, and \(m_i\) is the number of SNPs in block \(i\). If each SNP is its own block, \(m_i = 1\), \(g = n\), and the equation above reduces to the non-blocked version.

The same quantity can also be expressed using only per-block statistics \(\hat\theta_{i}\), rather than leave-one-block-out statistics \(\hat\theta_{-i}\), which simplifies the computation:

\[\hat\sigma^2 = \frac{1}{g} \sum_{i=1}^g (\frac{m_i}{n-m_i}) (\hat\theta - \hat\theta_{i})^2\]
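
A quick numeric check, using a plain mean over made-up per-SNP values as \(\hat\theta\), confirms that the two expressions agree:

```python
# Verify that the leave-one-block-out form and the per-block form of the
# weighted block jackknife variance give the same result (sketch).
blocks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]  # made-up per-SNP values
all_snps = [v for b in blocks for v in b]
n, g = len(all_snps), len(blocks)
theta = sum(all_snps) / n  # estimate on the whole data set

var1 = var2 = 0.0
for b in blocks:
    m = len(b)
    theta_i = sum(b) / m                                 # per-block estimate
    theta_minus_i = (n * theta - m * theta_i) / (n - m)  # leave-block-out estimate
    var1 += (n - m) / m * (theta - theta_minus_i) ** 2
    var2 += m / (n - m) * (theta - theta_i) ** 2
var1 /= g
var2 /= g
```

The equality follows from \(\hat\theta - \hat\theta_{-i} = \frac{m_i}{n - m_i} (\hat\theta_i - \hat\theta)\), which holds whenever \(\hat\theta\) is a SNP-weighted mean of the \(\hat\theta_i\).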

The subsampling of SNPs in blocks (specifically block jackknife) is
already used throughout all *ADMIXTOOLS* programs to compute
standard errors. Here, the `resample_snps` functions
generalize this approach in two ways. First, they evaluate and return
*all* model parameters for each set of resampled SNP blocks.
Second, they provide the option to do bootstrap resampling (resampling
blocks with replacement while keeping the total number of blocks fixed)
rather than jackknife resampling (leaving one block out at a time). The
advantage of bootstrap over jackknife resampling is that bootstrap
resampling is less likely to underestimate the true variability in the
presence of a few, large outliers. The disadvantage is that it is
inherently stochastic and doesn’t give the same results each time.
Bootstrapping also makes it necessary to evaluate the whole model a
large number of times, while jackknifing can be much faster, at least
when computing only simple statistics.

Note that the bootstrap and jackknife resampling distributions are on
different scales: Bootstrap standard errors are the square root of the
variance of the resampling distribution, whereas jackknife standard errors
are the square root of roughly *n times* that variance, because each
leave-one-out estimate differs only slightly from the full-data estimate.

In the following example, 1000 models with resampled SNP blocks are evaluated, and a histogram of the likelihood scores is plotted.

```
# Evaluate the graph on 1000 bootstrap resamplings of the SNP blocks
snpres = qpgraph_resample_snps(example_f2_blocks, graph = example_graph, boot = 1000)
# Distribution of likelihood scores across the resampled data sets
hist(snpres$score, 100)
```