Outlier Detection

You're reviewing last month's online donations and the average gift jumps from €35 to €290. That can't be right. You scroll through the transactions and spot it: a single gift of €12,000. Everything else is between €10 and €200. That one number just rewrote every summary statistic in your report.

This is what an outlier looks like. It's a data point that sits far away from everything else, so far that it changes the story the numbers tell. Whether that change is informative or misleading depends entirely on why the outlier is there.

Some outliers are errors. Someone typed 10000 instead of 100. A test transaction never got deleted. A currency conversion went wrong. These need to be found and fixed, because they distort every calculation they touch. Other outliers are real but unusual. A long-time supporter made an extraordinary end-of-year gift. A corporate match doubled someone's contribution. These are genuine data points, but including them without comment makes your "typical donor" statistics meaningless. And some outliers are the most important data you have. A sudden spike in petition signatures from a new region might be the earliest signal that your campaign is catching fire somewhere unexpected.

The challenge is telling these apart. There are two common approaches, and both build on concepts you've already seen.

The first is the IQR method. Back in percentiles and quartiles, we saw that Q1 and Q3 mark the boundaries of the middle 50% of your data. The IQR is the distance between them. The IQR method flags anything below Q1 minus 1.5 times the IQR, or above Q3 plus 1.5 times the IQR. The beauty of this approach is that it's built on the median and quartiles, which means it isn't thrown off by the very outliers you're trying to detect. Even if that €12,000 gift is in your data, the quartiles barely move.
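To make the fences concrete, here is a minimal sketch in Python. The donation amounts are hypothetical, and the quartiles use a simple split-the-halves estimate; other quartile conventions shift the fences slightly, but the conclusion is the same.

```python
# Hypothetical donation amounts in euros, including one extreme gift.
donations = [25, 40, 15, 60, 100, 35, 50, 12000, 80, 20, 45, 30]

def median(xs):
    """Median of an already-sorted list."""
    m = len(xs) // 2
    return xs[m] if len(xs) % 2 else (xs[m - 1] + xs[m]) / 2

def iqr_fences(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR.

    Quartiles here are medians of the lower and upper halves of the
    sorted data, one of several common conventions.
    """
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])        # median of the lower half
    q3 = median(s[(n + 1) // 2 :])  # median of the upper half
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

low, high = iqr_fences(donations)
outliers = [v for v in donations if v < low or v > high]
print(outliers)  # only the 12000 gift falls outside the fences
```

Notice that the quartiles come entirely from the middle of the sorted data, so the €12,000 gift barely shifts the fences no matter how large it gets.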

The second is the Z-score method. This one uses the mean and standard deviation to measure how far each value is from the center. A Z-score tells you how many standard deviations a point sits from the mean. A Z-score of 0 means you're right at the average. A Z-score of 2 means you're two standard deviations above it. Typically, anything beyond 2 or 3 standard deviations gets flagged. The problem is that the mean and standard deviation are themselves sensitive to outliers. That €12,000 gift inflates the mean and stretches the standard deviation, which can actually make the outlier look less extreme than it really is. For skewed data, which describes most nonprofit datasets, the IQR method is usually more reliable.
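The masking effect is easy to see with the same hypothetical donations. The sketch below computes the Z-score of the extreme gift using the population standard deviation; despite being more than a hundred times the typical amount, it only just clears a threshold of 3, because it has inflated the very mean and standard deviation used to judge it.

```python
import statistics

# Same hypothetical donations as above.
donations = [25, 40, 15, 60, 100, 35, 50, 12000, 80, 20, 45, 30]

mean = statistics.fmean(donations)    # inflated to ~1042 by the big gift
stdev = statistics.pstdev(donations)  # stretched to ~3304 by the same gift

# How many standard deviations from the mean is the extreme gift?
z_big = (12000 - mean) / stdev
print(round(z_big, 2))  # ~3.32: barely past a common threshold of 3
```

With a cutoff of 3.5 instead of 3, this Z-score test would miss the gift entirely, while the IQR fences above flag it comfortably.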

Neither method tells you what to do with an outlier once you've found one. That's a judgment call. If it's a data entry error, fix it. If it's a real but extraordinary value, consider reporting your statistics both with and without it, so your audience can see the effect. If you're building a model or running an A/B test, think about whether the outlier represents a pattern you want the model to capture or noise you want to filter out.
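Reporting with and without an extraordinary value can be as simple as computing the summary twice. This sketch, still on the hypothetical data above, also shows why the median is often the safer headline number.

```python
import statistics

# Same hypothetical donations; 12000 is the extraordinary gift.
donations = [25, 40, 15, 60, 100, 35, 50, 12000, 80, 20, 45, 30]
typical = [d for d in donations if d != 12000]

print(f"Mean with the gift:      {statistics.fmean(donations):.2f}")  # ~1041.67
print(f"Mean without the gift:   {statistics.fmean(typical):.2f}")    # ~45.45
print(f"Median with the gift:    {statistics.median(donations):.2f}") # 42.50
print(f"Median without the gift: {statistics.median(typical):.2f}")   # 40.00
```

The mean collapses from roughly €1,042 to €45 when one value is set aside, while the median barely moves, which is exactly the contrast worth showing your audience.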

Outlier detection matters in grant reporting when a single program participant's exceptional outcome inflates the average improvement. It matters in campaign analytics when a viral post generates a flood of low-quality signups that skew your conversion metrics. It matters in budgeting when one large gift masks a decline in your regular giving base. And it matters in survey analysis when a respondent who answered "1" to every question drags the satisfaction score down, possibly because they were disengaged rather than genuinely dissatisfied.

An outlier is a question, not an answer. Finding one tells you something unusual happened. Your job is to figure out whether that something is a mistake to fix, a fluke to note, or a signal to follow.


See It

Click anywhere on the chart to add a donation. Drag existing dots to move them. Watch how the IQR fences and Z-score thresholds respond differently to extreme values.


Reflect

Think about the last time a single data point dramatically changed a report or metric at your organization. Was it investigated, or was it quietly included in the average? What would have changed if someone had flagged it?

When you see an unusually high or low number in your data, what's your instinct? Do you assume it's an error, accept it as real, or dig into the context? Does your team have a process for checking, or does it depend on who happens to notice?