Chi-Squared Test

Your digital team just ran a campaign experiment. They sent the same action alert using three different email formats: a personal story from an affected community member, a data-heavy policy briefing, and a deadline-driven urgent appeal. Each format went to 200 randomly selected supporters from the same list. The personal story got 48 people to sign the petition, a 24% action rate. The policy briefing got 30, just 15%. The urgent appeal landed at 42, or 21%. The campaigner sees the personal story pulling ahead by nine percentage points and wants to use it for every future send. But the data analyst pauses. With only 200 people per group, could this variation just be the luck of the draw?

This is where cross-tabulations and hypothesis testing meet. You have a grid of counts: three email formats crossed with two outcomes (signed the petition or didn't). The cross-tab reveals the pattern. But spotting a pattern and proving it's real are two different things. The chi-squared test is the tool that bridges that gap. It works with categorical data, answering whether the distribution of outcomes across groups differs more than you'd expect from chance alone.

The logic starts with a question: if the email format truly made no difference, what would you expect the data to look like? Across all three groups combined, 120 out of 600 people took action. That's an overall rate of 20%. If format doesn't matter, each group should land near that 20% mark, with about 40 actions out of 200. Those are your expected counts, the numbers you'd predict under the null hypothesis that format and outcome are unrelated.
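That arithmetic follows directly from the table's margins: under independence, each expected cell count is the row total times the column total, divided by the grand total. A quick sketch in Python, using the counts from the experiment above:

```python
# Observed petition signatures by email format (200 recipients each).
signed = {"personal story": 48, "policy briefing": 30, "urgent appeal": 42}
per_group = 200

total_signed = sum(signed.values())            # 120 actions overall
grand_total = per_group * len(signed)          # 600 recipients overall
overall_rate = total_signed / grand_total      # 0.20

# Under the null hypothesis, every format shares the overall rate.
expected_signed = per_group * overall_rate     # 40.0 expected actions per format
expected_not = per_group * (1 - overall_rate)  # 160.0 expected non-actions
print(overall_rate, expected_signed, expected_not)
```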

The chi-squared test measures how far the observed counts stray from those expected counts. For each cell in your cross-tabulation, you take the gap between what you saw and what you expected, square it so that positive and negative gaps both count equally, and divide by the expected count so that a gap of 8 matters more when you expected 40 than when you expected 400. Then you add up all those contributions across every cell. The result is a single number called the chi-squared statistic. A small value means your data looks roughly like what chance would produce. A large value means the observed pattern is more lopsided than random variation would typically create.
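That recipe is the formula chi-squared = sum over all cells of (observed − expected)² / expected. A minimal sketch for the six cells of the email table:

```python
# Observed counts: (signed, didn't sign) for each email format.
observed = {
    "personal story":  (48, 152),
    "policy briefing": (30, 170),
    "urgent appeal":   (42, 158),
}
expected = (40, 160)  # under the null: 20% of 200 sign, 80% don't

chi_squared = 0.0
for fmt, cells in observed.items():
    for obs, exp in zip(cells, expected):
        chi_squared += (obs - exp) ** 2 / exp  # squared gap, scaled by expectation

print(round(chi_squared, 2))  # → 5.25
```

The division by the expected count is what makes the statistic sensitive to relative rather than absolute gaps: the briefing's shortfall of 10 signers contributes 2.5, while the same 10-person gap among the 160 expected non-signers contributes only 0.625.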

For the email experiment, the personal story format had 48 actions versus an expected 40. The policy briefing had 30 versus 40. The urgent appeal, at 42, was close to expected. When you work through the arithmetic for all six cells in the table, the chi-squared statistic comes out to about 5.3. Is that large enough to be convincing? This is where the p-value enters. For a table with two rows and three columns, the test uses (rows − 1) × (columns − 1) = 1 × 2 = 2 degrees of freedom, a value that reflects how many cells can vary freely once you lock in the row and column totals. With a chi-squared of 5.3 and two degrees of freedom, the p-value is about 0.07. That means if format truly didn't matter, you'd see a pattern at least this uneven roughly 7% of the time. At the conventional 0.05 threshold, this result falls short of statistical significance. The data hints at a pattern, but it's not strong enough to rule out chance.
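A convenient special case makes this p-value easy to check by hand: with exactly two degrees of freedom, the chi-squared distribution is an exponential distribution with mean 2, so the p-value is simply e^(−χ²/2). A short sketch:

```python
import math

chi_squared = 5.25  # the sum over all six cells of the table
df = 2              # (rows - 1) * (cols - 1) = (2 - 1) * (3 - 1)

# With exactly two degrees of freedom, the chi-squared distribution
# reduces to an exponential with mean 2, so the p-value has a closed form.
p_value = math.exp(-chi_squared / 2)
print(round(p_value, 3))  # → 0.072
```

For tables with other shapes the degrees of freedom change and this shortcut no longer applies; there you would use a chi-squared table or a statistics library.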

That finding matters. Without the test, the campaigner might have committed to the personal story format based on a difference that could easily have appeared by coincidence. The test doesn't say the pattern is definitely noise. It says the evidence isn't strong enough yet to be confident it's real. With a larger sample, say 500 per group instead of 200, the same percentage differences would produce a chi-squared value roughly two and a half times bigger and push the p-value well below 0.01. More data means less room for chance to explain away the pattern.
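You can check that scaling claim directly: holding the three action rates fixed at 24%, 15%, and 21% while growing each group from 200 to 500 multiplies every observed and expected count by 2.5, and the statistic scales with it. A sketch (the helper function is written just for this illustration):

```python
import math

def chi_squared_equal_groups(n_per_group, rates):
    """Chi-squared statistic for equal-sized groups with a binary outcome."""
    pooled = sum(rates) / len(rates)        # overall rate (groups are equal size)
    exp_yes = n_per_group * pooled          # expected actions per group
    exp_no = n_per_group * (1 - pooled)     # expected non-actions per group
    stat = 0.0
    for rate in rates:
        obs_yes = n_per_group * rate
        obs_no = n_per_group - obs_yes
        stat += (obs_yes - exp_yes) ** 2 / exp_yes
        stat += (obs_no - exp_no) ** 2 / exp_no
    return stat

rates = (0.24, 0.15, 0.21)                  # the three observed action rates
results = {}
for n in (200, 500):
    stat = chi_squared_equal_groups(n, rates)
    results[n] = (stat, math.exp(-stat / 2))  # df = 2, so p = exp(-stat / 2)
    print(n, round(stat, 2), round(results[n][1], 4))
```

At 200 per group the statistic is 5.25 with p around 0.07; at 500 per group it is 13.125 with p around 0.001, matching the scaling described above.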

Chi-squared tests show up throughout digital advocacy work. When your team compares petition conversion rates across three different landing page designs, chi-squared tells you whether the top performer is genuinely better or just had a lucky streak of visitors. When you cross-tabulate email engagement levels (opened, clicked, ignored) by supporter segment (new, active, lapsed), chi-squared reveals whether engagement truly differs across those groups. In campaign reporting, if you split supporters by recruitment source and compare who completed a follow-up lobbying action, chi-squared turns a suggestive cross-tabulation into evidence. It works any time your data consists of counts sorted into categories rather than measurements on a continuous scale.
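The same arithmetic generalizes to any grid of counts, not just a two-by-three table. The sketch below runs it on a hypothetical three-by-three engagement-by-segment table (the counts are invented for illustration); with four degrees of freedom, the 0.05 critical value is about 9.49, so a statistic above that line would be significant.

```python
def chi_squared_table(table):
    """Chi-squared statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are supporter segments (new, active, lapsed),
# columns are engagement outcomes (opened, clicked, ignored).
table = [
    [150, 50, 300],   # new
    [180, 70, 250],   # active
    [140, 45, 315],   # lapsed
]
stat, df = chi_squared_table(table)
print(round(stat, 1), df)  # statistic around 19.9 with df = 4
```

In this invented table the statistic lands well above the 9.49 cutoff, so engagement would genuinely differ by segment; with your real data, the same function turns the cross-tabulation into a testable number.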

The chi-squared test answers one question about a cross-tabulation: is the pattern in these categories real, or could the luck of the draw alone have produced it?


See It

Drag the top of each bar to change that group's action rate. Watch the chi-squared statistic and p-value update as the groups become more or less different from each other.


Reflect

Think about the last time your team compared results across segments, channels, or campaign variants and picked a "winner." Was the comparison tested statistically, or did the group with the highest number simply get declared the best?

When you see a cross-tabulation showing different outcomes across groups, what sample size per group would you need before you'd trust the pattern enough to change your strategy?