Sample Size Determination
Your organization is about to launch a supporter survey before a big lobbying push. The policy director wants to know what percentage of your base supports a proposed climate regulation, because the number will show up in meetings with legislators. "Just survey a few hundred people," she says. The data analyst asks the question that should come before every survey, every A/B test, every campaign evaluation. How many people do we actually need?
The answer is not a guess, and it's not "as many as possible." It's a specific number that depends on how precise you need the result to be. If you survey 100 supporters and 60% say they support the regulation, your confidence interval stretches from about 50% to 70%. That's a twenty-point range. Walking into a legislative meeting and saying "somewhere between half and nearly three-quarters of our supporters are behind this" is not particularly convincing. Survey 1,000 people and that same 60% result gives you a confidence interval of roughly 57% to 63%. Now you have something you can stand behind. Survey 10,000 and the interval tightens to 59% to 61%, but you've spent ten times the effort for a margin that was already narrow enough at 1,000.
This is the core tradeoff in sample size planning. More data means more precision, but the returns diminish sharply. Going from 100 to 400 people cuts your margin of error in half. But to halve it again, you'd need to go from 400 to 1,600. Each additional unit of precision costs quadratically more. At some point, the added precision isn't worth the cost in time, money, or supporter goodwill.
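If you want to check these figures yourself, the arithmetic is short. Here is a minimal Python sketch using the usual normal-approximation formula for a proportion's 95% confidence interval (the function name and the 1.96 multiplier are just the standard convention, not anything tied to a particular survey tool):

import math

def margin_of_error(p, n, z=1.96):
    # Half-width of the 95% confidence interval for an observed proportion p
    # from a simple random sample of n people.
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1_000, 1_600, 10_000):
    moe = margin_of_error(0.60, n)
    print(f"n = {n:>6}: 60% +/- {moe:.1%}  (about {0.60 - moe:.0%} to {0.60 + moe:.0%})")

Running it reproduces the intervals above, and the 400-to-1,600 jump shows the halving directly: the margin drops from about 4.8 points to about 2.4 points.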
For surveys, the calculation works backward from your desired margin of error. You decide how wide a confidence interval you can tolerate, then solve for the number of people that gets you there. A margin of error of plus or minus 3 percentage points at 95% confidence requires roughly 1,067 respondents when the true proportion is near 50%. If the proportion is closer to 10% or 90%, you need fewer people because there's less variability in the responses. The worst case, meaning the largest required sample, is always at 50%, where opinions are maximally split.
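Turned into code, that backward calculation is one line of algebra: solve the margin-of-error formula for n. A minimal sketch, assuming 95% confidence and the same normal approximation as before:

import math

def survey_sample_size(margin, p=0.5, z=1.96):
    # Respondents needed for a confidence interval of +/- margin.
    # p = 0.5 is the conservative worst case; proportions nearer 0 or 1 need fewer people.
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(survey_sample_size(0.03))          # -> 1068, the "roughly 1,067" figure rounded up
print(survey_sample_size(0.03, p=0.10))  # -> 385, far fewer when responses are lopsided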
For A/B tests, the logic is different. You're not trying to estimate a single number with precision. You're trying to detect a difference between two versions. This is where statistical power takes over. You pick the smallest improvement that would matter to your organization, sometimes called the minimum detectable effect. Then you calculate how many people per group you need for an 80% chance of catching that difference if it's real. A petition page with a 5% baseline conversion rate needs a bit over 8,000 visitors per variant to reliably detect a 1-percentage-point improvement at that standard. If you're hoping to detect a 0.5-point improvement, that number jumps past 31,000 per variant.
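Those per-variant numbers come from the standard two-proportion power calculation. A sketch of that calculation, assuming a two-sided 5% significance level and 80% power (1.96 and 0.8416 are the corresponding z-scores; a dedicated power calculator will give essentially the same answers):

import math

def ab_sample_size_per_group(p_baseline, p_variant, z_alpha=1.96, z_power=0.8416):
    # Visitors needed in EACH variant to detect a shift from p_baseline to p_variant
    # with roughly 80% power at a two-sided 5% significance level.
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_variant - p_baseline) ** 2)

print(ab_sample_size_per_group(0.05, 0.06))    # -> about 8,150 per variant (1-point lift)
print(ab_sample_size_per_group(0.05, 0.055))   # -> about 31,200 per variant (0.5-point lift)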
These calculations force a useful conversation before the test begins. When your email team wants to A/B test two subject lines on a 2,000-person list, the sample size calculation tells you whether you can detect anything meaningful with 1,000 per group. Often, the answer is that you can only detect large effects of 3 or more percentage points. That's fine if you're testing radically different approaches. It's a waste if you're testing minor wording tweaks that might produce a half-point difference. Knowing this up front saves you from running a test that was doomed to say "no significant difference" from the start.
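You can also run the calculation in reverse: fix the group size and ask for the smallest lift the test could plausibly detect. A sketch of that check for the 2,000-person list, assuming purely for illustration a 10% baseline action rate; your list's real baseline will shift the answer:

import math

def minimum_detectable_effect(p_baseline, n_per_group, z_alpha=1.96, z_power=0.8416):
    # Approximate smallest lift (in proportion points) detectable with 80% power
    # at a two-sided 5% significance level, given n_per_group people per variant.
    return (z_alpha + z_power) * math.sqrt(2 * p_baseline * (1 - p_baseline) / n_per_group)

mde = minimum_detectable_effect(0.10, 1_000)
print(f"Smallest reliably detectable lift: about {mde:.1%}")   # roughly 3.8 points

If the honest answer is "we can only see a 4-point swing," a minor wording tweak is not worth testing on that list.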
In grant-funded program evaluations, sample size planning can make or break a project. Funders increasingly want evidence that a program worked, and an underpowered evaluation will fail to demonstrate impact even when the program genuinely made a difference. Calculating the required sample size during the grant application stage lets you budget for adequate data collection. In supporter surveys, running the calculation before fielding the survey tells you whether your email list is large enough to produce a meaningful result, or whether you need to partner with other organizations to pool respondents. In digital ad testing, knowing the required traffic per variant helps you decide whether to test three ad variations or just two, since splitting the same traffic across more variants means each one gets fewer impressions and the test takes longer to reach significance.
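That last tradeoff is easy to put numbers on. A sketch with hypothetical figures (say the page draws 1,500 visitors a day and each variant needs the roughly 8,150 visitors from the petition example):

import math

def days_to_finish(daily_traffic, n_per_variant, num_variants):
    # Days until every variant has reached its required sample size,
    # assuming traffic is split evenly across variants.
    return math.ceil(n_per_variant / (daily_traffic / num_variants))

for k in (2, 3, 4):
    print(f"{k} variants: about {days_to_finish(1_500, 8_150, k)} days")

With two variants the test wraps up in about eleven days; with four it runs for three weeks, because every variant is waiting on a smaller share of the same traffic.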
The best time to calculate your sample size is before you collect a single data point. It's the only way to know whether your question is answerable with the resources you have.
See It
Drag the margin of error slider to see how many survey respondents you need. Adjust the expected proportion to see how it affects the required sample size. Watch how precision gets expensive fast.
Reflect
Think about the last survey or A/B test your organization ran. Did anyone calculate the required sample size before launching, or did the team just use whoever was available? If you had run the calculation first, would it have changed the scope of the project or the question you asked?
When your team is debating whether to run a "quick test" on a small segment, what would change if you framed the conversation around whether the test is large enough to detect the smallest difference worth acting on?