Bayes' Theorem
Your organization just paid for a wealth screening tool. It scanned your 10,000-donor database and flagged 1,800 as "high-capacity" prospects who could give €1,000 or more. Your major gifts officer is thrilled. That's a lot of potential. She starts scheduling calls. After working through the first 100 names, she's frustrated. Only about 25 turned into actual major gift conversations. The rest were false alarms. The screening tool was supposed to be 90% accurate. So what went wrong?
Nothing went wrong with the tool. The problem is a fundamental feature of probability that catches nearly everyone off guard, and the piece of math that describes it has a name: Bayes' theorem.
Here's the logic. The screening tool is genuinely good at its job. When it looks at a donor who truly is high-capacity, it correctly identifies them about 90% of the time. That's called sensitivity. And when it looks at a donor who isn't high-capacity, it correctly leaves them alone about 85% of the time. That's called specificity. Those are solid numbers. But the crucial piece of information is how many donors in your database are actually high-capacity in the first place. If only 5% of your 10,000 donors could realistically give €1,000 or more, that's 500 people who can and 9,500 who can't. The tool correctly flags 90% of the 500 real prospects, giving you 450 true hits. But it also incorrectly flags 15% of the 9,500 non-prospects, adding 1,425 false alarms. Your flagged list of roughly 1,875 names contains 450 real prospects buried among 1,425 dead ends. That's a 24% hit rate, not the 90% you were expecting.
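If you want to check that arithmetic yourself, here's the same counting exercise as a short Python sketch. The figures are the illustrative ones from this example (a 10,000-donor database, a 5% base rate, 90% sensitivity, 85% specificity), not numbers from any particular screening product.

```python
# Re-create the worked example above by counting donors instead of using a formula.
database_size = 10_000
base_rate = 0.05       # share of donors who are truly high-capacity
sensitivity = 0.90     # chance the tool flags a true high-capacity donor
specificity = 0.85     # chance the tool correctly ignores a non-prospect

true_prospects = database_size * base_rate         # 500 donors who really could give
non_prospects = database_size - true_prospects     # 9,500 who couldn't

true_hits = true_prospects * sensitivity           # 450 correctly flagged
false_alarms = non_prospects * (1 - specificity)   # 1,425 incorrectly flagged

flagged = true_hits + false_alarms                 # ~1,875 names on the list
hit_rate = true_hits / flagged

print(f"Flagged: {flagged:.0f}  true prospects: {true_hits:.0f}  "
      f"false alarms: {false_alarms:.0f}  hit rate: {hit_rate:.0%}")
# Flagged: 1875  true prospects: 450  false alarms: 1425  hit rate: 24%
```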
This is what Bayes' theorem describes. Your initial belief about how common something is (the prior probability, sometimes called the base rate) gets updated by new evidence (the screening result) to produce a revised belief (the posterior probability). The theorem reveals that when the thing you're looking for is rare, even a very accurate test will produce a flood of false positives. The rarer the condition, the worse the ratio gets.
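Written as a formula, the posterior is simply the true hits divided by everything that got flagged. The small helper below expresses that update rule; the function name posterior_given_flag is ours, chosen for illustration.

```python
def posterior_given_flag(prior, sensitivity, specificity):
    """Bayes' theorem: how likely the condition is real, given a positive flag.

    prior        -- base rate of the condition in the population
    sensitivity  -- chance of a flag when the condition is present
    specificity  -- chance of no flag when the condition is absent
    """
    true_flags = prior * sensitivity
    false_flags = (1 - prior) * (1 - specificity)
    return true_flags / (true_flags + false_flags)

# The wealth-screening example: 5% base rate, 90% sensitivity, 85% specificity.
print(f"{posterior_given_flag(0.05, 0.90, 0.85):.0%}")  # -> 24%
```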
The confusion happens because people mistake "the test is 90% accurate" for "a positive result is 90% likely to be correct." These are completely different statements. The first is about the test's ability to detect true cases. The second is about what a positive result actually means in your specific population, and that depends entirely on the base rate. Think about it this way. If only 1 in 100 donors is truly high-capacity, the tool has to screen 99 non-prospects for every real one. Even with 85% specificity, roughly 15 of those 99 will be false alarms, swamping the single genuine prospect.
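Plugging that 1-in-100 base rate into the posterior_given_flag helper sketched above makes the point concrete:

```python
# At a 1-in-100 base rate, the same tool gives a much less trustworthy flag.
print(f"{posterior_given_flag(0.01, 0.90, 0.85):.0%}")  # -> 6%
```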
This shows up everywhere in nonprofit work. In program evaluation, if you screen applicants for eligibility using an assessment tool, the reliability of the screening depends on what fraction of applicants are truly eligible. If most are, your results are trustworthy. If very few are, most "eligible" flags will be wrong. In petition campaigns, if you run a signature verification process that's 95% accurate but only 2% of signatures are fraudulent, the vast majority of signatures flagged as fake will actually be real. In fraud detection for online donations, most transactions are legitimate, so even a good fraud detector will flag many legitimate gifts.
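The petition example works the same way. The paragraph gives a single "95% accurate" figure, so the sketch below (again reusing the helper from above) assumes 95% for both sensitivity and specificity, which is a simplification:

```python
# Petition signatures: 2% are actually fraudulent; assume the verification
# process catches 95% of fakes and clears 95% of genuine signatures.
p = posterior_given_flag(prior=0.02, sensitivity=0.95, specificity=0.95)
print(f"A signature flagged as fake is really fraudulent {p:.0%} of the time.")
# -> about 28%; the other ~72% of flagged signatures are genuine.
```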
The practical lesson is to always ask about the base rate before trusting a screening result. When someone tells you a tool is "90% accurate," the first question should be: accurate at what, and how common is that thing in my data? If the condition is rare, plan for a high false positive rate, build in a second-stage verification process, or treat the flagged list as a starting point for investigation rather than a final answer.
A positive result doesn't mean what you think it means. The rarer the thing you're testing for, the more likely a positive flag is wrong. Always ask how common the condition is before trusting the test.
See It
Drag the slider to change what percentage of donors are truly high-capacity. Watch how the false alarm rate explodes as the base rate drops.
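If you'd rather see the numbers than drag a slider, this short self-contained sweep prints the hit rate across a range of base rates, using the same illustrative 90% sensitivity and 85% specificity:

```python
# Same illustrative tool: 90% sensitivity, 85% specificity.
sensitivity, specificity = 0.90, 0.85

for base_rate in (0.50, 0.20, 0.10, 0.05, 0.02, 0.01):
    true_flags = base_rate * sensitivity
    false_flags = (1 - base_rate) * (1 - specificity)
    hit_rate = true_flags / (true_flags + false_flags)
    print(f"base rate {base_rate:.0%}: a flag is correct {hit_rate:.0%} of the time")

# base rate 50%: a flag is correct 86% of the time
# base rate 20%: a flag is correct 60% of the time
# base rate 10%: a flag is correct 40% of the time
# base rate 5%: a flag is correct 24% of the time
# base rate 2%: a flag is correct 11% of the time
# base rate 1%: a flag is correct 6% of the time
```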
Reflect
Think about the screening tools or assessments your organization uses. Whether it's a wealth screener for donors, an eligibility assessment for program participants, or a spam filter for your email list, do you know the base rate of what you're screening for? How does that base rate affect how much you should trust the results?
When someone tells you a tool or process is "X% accurate," do you ask "accurate at what?" What would change if you started asking that question?