Statistics in World War II - How Many Tanks Did Germany Have?

An interesting application of point estimation.

Introduction

By the end of 1941, most of continental Europe had fallen to Nazi Germany and the other Axis powers, and by 1942 their forces had advanced deep into the Soviet Union on the Eastern Front. A key component of their rapid conquests was their revolutionary use of tanks in modern warfare. While other militaries, most notably that of France, used tanks as a modern armored form of cavalry, the Germans were the first to make full use of tanks’ speed and strength.

With the Nazis using tanks to such devastating effect, it was essential for the Allies to stop them. A key part of the solution was figuring out how many tanks the Germans had deployed, so that resources could be allocated effectively: there was tremendous danger both in underestimating and in overestimating the enemy’s strength. The consequence of underestimating is clear, since one could suddenly be outnumbered in battle. Overestimating is also bad, since it can lead to undue caution and failure to exploit advantages, or to committing too many resources in one theater and thus not having enough elsewhere. The Allies realized that the tanks destroyed and captured during a battle had serial numbers on their gearboxes that could help with this problem. With these data, the Allies found a statistical method to estimate the number of tanks.

Methods

We now look for a good estimator of the number of tanks. Before calculating anything, we should clarify the criterion: what makes an estimator “good”? An intuitive answer is an unbiased estimator with as small a variance as possible, and that is exactly what I pursue here. We need a few assumptions for the analysis. Suppose that Germany had $N$ tanks in total, with serial numbers $1,2,\dots,N$, and that the Allies observed $k$ of them. Let the serial numbers of the observed tanks be $x_{(1)},x_{(2)},\dots,x_{(k)}$ with $x_{(1)}<x_{(2)}<\dots<x_{(k)}$. We also assume that each tank has an equal chance of being observed. Our goal is to find $\widehat{N}$, an estimator of $N$.
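To make the setup concrete, here is a minimal Python sketch of the observation model. It assumes the serial numbers run consecutively from $1$ to $N$ and that the $k$ observed tanks form a simple random sample without replacement. The function name `observe_serials` and the values $N=300$, $k=5$ are illustrative choices, not historical figures.

```python
import random


def observe_serials(total_tanks, sample_size, rng):
    """Draw `sample_size` distinct serial numbers from 1..total_tanks.

    Assumption: every tank is equally likely to be observed, so the
    observed serials are a simple random sample without replacement.
    The result is sorted, matching x_(1) < x_(2) < ... < x_(k).
    """
    return sorted(rng.sample(range(1, total_tanks + 1), sample_size))


# Hypothetical values, for illustration only.
rng = random.Random(0)
serials = observe_serials(total_tanks=300, sample_size=5, rng=rng)
print(serials)       # the ordered observations x_(1), ..., x_(k)
print(serials[-1])   # the sample maximum x_(k)
```

In practice only `serials` would be known to the Allies; `total_tanks` is the unknown quantity we want to estimate.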

Method I

If we consider the serial numbers of the observed tanks as a set of random variables $X_{(1)},X_{(2)},\dots,X_{(k)}$ ($X_{(1)}<X_{(2)}<\dots<X_{(k)}$), and $x_{(1)},x_{(2)},\dots,x_{(k)}$ as their observed values, then we have $$ \Pr(X_{(k)}=x_{(k)})=\frac{x_{(k)}-1\choose k-1}{N \choose k}, $$ because the remaining $k-1$ observations must all lie below the maximum. So $$ \mathbb{E}(X_{(k)})=\sum_{m=k}^{N}m\Pr(X_{(k)}=m)=\sum_{m=k}^{N}m\frac{m-1\choose k-1}{N \choose k} =k\sum_{m=k}^{N}\frac{m\choose k}{N \choose k}=\frac{k(N+1)}{k+1}, $$ $$ \mathbb{E}(X_{(k)}^2)=\sum_{m=k}^{N}m^2\frac{m-1\choose k-1}{N \choose k}= \sum_{m=k}^{N}(m+1)m\frac{m-1\choose k-1}{N \choose k}-\sum_{m=k}^{N}m\frac{m-1\choose k-1}{N \choose k} =\frac{k(N+1)(N+2)}{k+2}-\frac{k(N+1)}{k+1}. $$ Here we use a trick, namely the identity $\sum_{m=k}^N {m\choose k}={N+1\choose k+1}$, together with $m{m-1\choose k-1}=k{m\choose k}$ and $(m+1)m{m-1\choose k-1}=k(k+1){m+1\choose k+1}$. The formula can be proven by induction. Rearranging the expression for $\mathbb{E}(X_{(k)})$, we have $$ N=\left(1+\frac{1}{k}\right)\mathbb{E}(X_{(k)})-1. $$ So a point estimator of $N$ is $$ \widehat{N}=\left(1+\frac{1}{k}\right)X_{(k)}-1. $$ Let’s check some properties of the estimator: $$ \mathbb{E}(\widehat{N})=\left(1+\frac{1}{k}\right)\mathbb{E}(X_{(k)})-1=N, $$

\[ \begin{aligned} \mathrm{var}(\widehat{N})&=\left(1+\frac{1}{k}\right)^2\mathrm{var}(X_{(k)})\\ &=\left(1+\frac{1}{k}\right)^2\left(\mathbb{E}(X_{(k)}^2)-\left(\mathbb{E}(X_{(k)})\right)^2\right)\\ &=\frac{(N-k)(N+1)}{k(k+2)}. \end{aligned} \]
When $k\ll N$, the standard error of $\widehat{N}$ is approximately $N/\sqrt{k(k+2)}$, which is roughly $N/k$. So as the Allies observe more tanks, the standard error falls roughly in inverse proportion to $k$, which agrees with our intuition. Given that the estimator relies on a single observation of the random variable $X_{(k)}$, the result is satisfactory.
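As a sanity check on the derivation, the following minimal Python sketch sums over the distribution of $X_{(k)}$ exactly, using rational arithmetic, and compares the mean and variance of $\widehat{N}$ with the closed forms $N$ and $\frac{(N-k)(N+1)}{k(k+2)}$. The function names and the values $N=300$, $k=5$ are illustrative assumptions, not historical figures.

```python
from fractions import Fraction
from math import comb


def max_pmf(m, N, k):
    """Pr(X_(k) = m): the largest of k serials drawn from 1..N equals m."""
    return Fraction(comb(m - 1, k - 1), comb(N, k))


def estimator_moments(N, k):
    """Exact mean and variance of N_hat = (1 + 1/k) * X_(k) - 1."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for m in range(k, N + 1):
        n_hat = Fraction(k + 1, k) * m - 1
        p = max_pmf(m, N, k)
        mean += n_hat * p
        second_moment += n_hat ** 2 * p
    return mean, second_moment - mean ** 2


# Hypothetical values, for illustration only.
N, k = 300, 5
mean, var = estimator_moments(N, k)
closed_form_var = Fraction((N - k) * (N + 1), k * (k + 2))

print(mean == N)               # unbiasedness: E(N_hat) = N
print(var == closed_form_var)  # variance matches (N-k)(N+1) / (k(k+2))
```

Both comparisons hold exactly because `Fraction` avoids floating-point rounding; a Monte Carlo simulation would only confirm them approximately.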

Method II