Central Limit Theorem
Your organization tracks individual donation amounts, and by now you know they're heavily skewed. Most supporters give €10 or €15. A few give €200 or more. There's no bell curve in sight. But your data analyst does something interesting. Every week, she pulls a random sample of 40 recent donations and computes the average. After six months, she has 26 weekly averages. She plots them. And there it is, a near-perfect bell curve, centered around €34, with most weekly averages falling between €28 and €40. The individual donations look nothing like a normal distribution. The averages do. That's not a coincidence.
This is the central limit theorem, and it is arguably the single most important idea in all of statistics. It says that when you take a random sample from almost any distribution (anything with a finite mean and variance, which covers essentially every metric you'd track) and compute the average, then do it again, and again, those averages will approximately follow a normal distribution. It doesn't matter whether the underlying data is skewed, lumpy, or shaped like nothing you've seen before. As long as the sample is large enough, the averages will form a bell curve.
That's worth pausing on. The data itself can be wildly non-normal. Donation amounts, petition signature counts, email click rates, event attendance. These things are often skewed, clumpy, or bounded. But the averages of random samples drawn from that messy data will be approximately normal. The larger your sample, the tighter and more symmetrical that bell becomes. Even with a sample of 30 or 40 observations, the approximation is usually excellent. With smaller samples, it depends on how skewed the original data is. More skew requires more data before the bell shape emerges.
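You can watch this happen in a few lines of code. The sketch below builds a synthetic donation population (the amounts and proportions are made up for illustration, not real data), then repeatedly draws samples of 40 and averages them, mimicking the analyst's weekly routine:

```python
import random
import statistics

random.seed(42)

# Hypothetical skewed population: 95% small gifts of 10 or 15,
# 5% large gifts of 200-400. Purely illustrative numbers.
population = [random.choice([10, 15]) for _ in range(9500)] + \
             [random.uniform(200, 400) for _ in range(500)]

# Draw many random samples of 40 donations and record each sample's average.
sample_means = [
    statistics.mean(random.sample(population, 40))
    for _ in range(2000)
]

pop_mean = statistics.mean(population)
print(f"population mean:        {pop_mean:.2f}")
print(f"mean of sample means:   {statistics.mean(sample_means):.2f}")
print(f"spread of raw data:     {statistics.stdev(population):.2f}")
print(f"spread of sample means: {statistics.stdev(sample_means):.2f}")
```

The raw data is badly skewed, but if you histogram `sample_means` you get the bell: the averages cluster tightly around the population mean, with a spread far smaller than the spread of the individual donations.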
There are two other things the central limit theorem tells you. First, the center of the bell of averages sits at the true mean of the whole population. Your weekly samples won't all match the true average exactly, but they'll cluster around it. Second, the spread of that bell depends on the standard deviation of the original data divided by the square root of the sample size. Statisticians call this quantity the standard error. Larger samples produce a narrower bell, meaning each individual sample average lands closer to the truth. This is why collecting more data makes your estimates more precise. It's not just common sense. It's a mathematical guarantee.
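That square-root relationship can be checked directly. This sketch draws from a skewed synthetic population (an exponential distribution with a mean of 34, chosen to echo the donation example, not taken from real data), compares the predicted spread of the averages, sigma divided by the square root of n, against the spread actually observed in simulation:

```python
import random
import statistics

random.seed(0)

# Synthetic skewed data: exponential with mean 34 (illustrative assumption).
population = [random.expovariate(1 / 34) for _ in range(100_000)]
sigma = statistics.pstdev(population)

results = {}
for n in (10, 40, 160):
    # Theory: the bell of averages has spread sigma / sqrt(n).
    predicted = sigma / n ** 0.5
    # Simulation: draw 1000 samples of size n and measure the spread directly.
    means = [statistics.mean(random.sample(population, n)) for _ in range(1000)]
    observed = statistics.stdev(means)
    results[n] = (predicted, observed)
    print(f"n={n:4d}  predicted {predicted:6.2f}  observed {observed:6.2f}")
```

Notice that quadrupling the sample size (10 to 40, 40 to 160) only halves the spread. Precision gets more expensive as you go, which is why polls rarely sample far beyond a few hundred or a few thousand people.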
This theorem is the reason that most statistical tools work at all. Confidence intervals, hypothesis tests, p-values, and A/B testing all rely on knowing what distribution the sample average follows. The central limit theorem provides the answer: a normal distribution, regardless of what the raw data looks like. Without it, you'd need a different statistical procedure for every different data shape you encountered. With it, you can use the same tools across a wide range of situations.
In A/B testing, this is why your testing platform can give you a reliable answer after enough observations, even when individual user behavior is wildly inconsistent. The platform isn't looking at individual actions. It's comparing averages, and averages are well-behaved. In survey research, this is why a sample of a few hundred supporters can tell you something meaningful about your entire base of tens of thousands. The average of a random sample is a reliable estimator of the population average, and the central limit theorem tells you how reliable. In grant reporting, when you report the average outcome for 50 program participants, the theorem is what guarantees that your sample average is in the neighborhood of the true average for everyone who could have participated.
The central limit theorem is the bridge between messy, unpredictable individual data and the clean, predictable behavior of averages. It's the reason you can trust a well-drawn sample to tell you something true about the whole.
See It
Pick a sample size, then click "Draw 1" or "Draw 50" to sample donations and plot their averages. Try different sample sizes to see how larger samples produce a tighter bell.
Reflect
Think about a metric your organization tracks where individual values vary widely, like donation amounts, email click rates, or event attendance. If you sampled 40 values at random and averaged them, then did it again next week, how close would those averages be? Have you ever noticed that aggregate numbers are more stable than individual ones?
When your team runs an A/B test, what sample size do you typically use before drawing a conclusion? Could you be stopping too early, before the central limit theorem has had a chance to make your averages reliable?