Randomized Controlled Trials
Last month, your team replaced the old share button on your petition thank-you pages with a new "tell three friends" widget. Two weeks in, the numbers came back. Shares per signer were up 41% compared to the same window before the switch. The growth lead is ready to declare the new widget a winner. But there's a small detail buried in the calendar. The week you flipped the switch was also the week your organization launched a new campaign that took off on social media in a way nothing had in months. Suddenly your petitions were being signed by tens of thousands of fresh supporters drawn in by the campaign, the kind of supporters who would have shared anything you put in front of them. You have no idea whether the lift came from the widget itself or from the fact that the new audience was simply more inclined to share than the old one.
This is the kind of trap that randomized controlled trials are designed to escape.
A randomized controlled trial, often shortened to RCT, splits your audience into two or more groups using a random process and then gives each group a different version of whatever you're testing. The group that gets the change is called the treatment group. The group that gets the existing version is called the control group. Because the assignment is random, the two groups will be roughly equivalent on every characteristic you can measure (like prior engagement, location, or how recently they joined the list) and, more importantly, on every characteristic you can't measure (like how strongly they care about the issue, whether they tend to share things online, or whether they're having a good day). When you compare the outcomes between the two groups, any difference larger than what chance alone would produce can be credited to the change itself, not to the underlying differences between the groups.
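To make the mechanics concrete, here is a minimal sketch in Python, assuming all you have is a list of supporter IDs. The coin flip is the whole trick: no one chooses their group, and nothing about the supporter influences which version they see.

```python
import random

def assign_groups(supporter_ids, seed=2024):
    """Randomly split supporters into a treatment group and a control group."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced later
    treatment, control = [], []
    for supporter in supporter_ids:
        if rng.random() < 0.5:
            treatment.append(supporter)  # gets the new "tell three friends" widget
        else:
            control.append(supporter)    # keeps the existing share button
    return treatment, control
```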
The word "controlled" in RCT does a lot of work. The control group is the bedrock of the whole method. Without it, you can only say "this group did X." With it, you can say "this group did X, and a comparable group that didn't get the intervention did Y, so the intervention's effect is X minus Y." The control group tells you the counterfactual, which is what would have happened without the change. Every claim about a campaign tactic working depends on having a good answer to the counterfactual question. Without random assignment, your best guess at the counterfactual is just whatever group you chose to compare against, and that group is almost always different from the treatment group in ways that taint the comparison.
The randomness isn't decorative. It's the engine that makes the inference work. Suppose you let supporters self-select into either getting the new widget or the old version. The supporters who chose the new one would probably be the more engaged, more curious ones, the kind who like trying new features. Their higher share rate would be partly the widget and partly the kind of person who chose it. Suppose instead you give the widget only to supporters who signed up after a certain date. Those supporters might have come from a different recruitment channel, in a different political moment, with different baseline behaviors. Only random assignment guarantees that the two groups are genuinely comparable, which is what lets you read off the effect cleanly rather than tangled up with whatever made the groups different in the first place.
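You can watch this play out in a simulation. The sketch below invents a model in which more engaged signers are both more likely to opt into the new widget and more likely to share anything at all, while the widget's true effect is fixed at 5 points. The exact numbers are made up, but the pattern is the point: self-selection inflates the apparent lift, while a coin flip recovers something close to the truth.

```python
import random

def simulate(n=100_000, true_lift=0.05, seed=1):
    rng = random.Random(seed)
    # counts[method] = [treatment shares, treatment size, control shares, control size]
    counts = {"self-selected": [0, 0, 0, 0], "randomized": [0, 0, 0, 0]}
    for _ in range(n):
        engagement = rng.random()             # 0 = disengaged, 1 = highly engaged
        base_rate = 0.10 + 0.20 * engagement  # engaged people share more, widget or not

        # Self-selection: opting in rises steeply with engagement.
        # Randomization: a coin flip, unrelated to engagement.
        picks = {"self-selected": rng.random() < 0.2 + 0.6 * engagement,
                 "randomized": rng.random() < 0.5}
        for method, treated in picks.items():
            shared = rng.random() < base_rate + (true_lift if treated else 0.0)
            c = counts[method]
            if treated:
                c[0] += shared
                c[1] += 1
            else:
                c[2] += shared
                c[3] += 1

    for method, (ts, tn, cs, cn) in counts.items():
        print(f"{method:>13}: apparent lift = {ts/tn - cs/cn:+.3f} "
              f"(true lift = {true_lift:+.3f})")

simulate()
```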
Most digital A/B tests are RCTs in miniature. When you split your email list at random and send half a new subject line and half the existing one, you're running an RCT. When you assign incoming visitors at random to one of two donation page layouts, you're running an RCT. When you randomly choose which signers see the share widget on the thank-you page and which see the plain confirmation, you're running an RCT. The mechanics are the same as the medical trials that test whether a drug works. You define a population, randomly assign treatment, measure an outcome, and compare. The only difference is that your outcome is signatures, donations, or shares rather than survival rates.
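One practical wrinkle: for page tests you usually want the same visitor to see the same version every time they come back, not a fresh coin flip per page load. A common way to get that, sketched below with made-up identifiers, is to hash a stable ID so assignment is deterministic for each person but effectively random across people.

```python
import hashlib

def variant(visitor_id: str, experiment: str = "donation-page-layout") -> str:
    """Deterministically bucket a visitor into arm A or B of an experiment."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# The same supporter lands in the same arm on every visit:
print(variant("supporter-48291"))  # hypothetical ID; returns the same letter every call
```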
RCTs matter beyond the obvious A/B test. When your fundraising team wonders whether sending a personal thank-you note from the campaigner increases recurring donor retention, the only convincing answer comes from randomly assigning some monthly donors to receive the note and leaving the rest as a comparison group. When the digital team thinks a new welcome series might lift conversion to recurring giving, randomly assigning incoming subscribers to the old or new series gives you a real answer. When the membership team wants to know whether a peer-to-peer fundraising prompt drives more upgrades than the standard ask, randomly assigning donors to one prompt or the other is the only way to find out. Each of these tests still depends on the hypothesis-testing framework you'd use anywhere else, and each one needs enough people in each arm to actually detect the difference you care about.
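That last requirement, enough people per arm, can be roughed out before you launch. Here is a back-of-the-envelope sketch using the usual normal approximation for comparing two proportions; the 10% baseline and 2-point lift are invented for illustration.

```python
from statistics import NormalDist

def n_per_arm(baseline, lift, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect an absolute lift in a rate
    (two-sided test at significance alpha, with the given power)."""
    p1, p2 = baseline, baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_power) ** 2 * variance / lift ** 2) + 1

# Detecting a 2-point lift on a 10% recurring-giving conversion rate
# takes a few thousand subscribers in each arm.
print(n_per_arm(baseline=0.10, lift=0.02))
```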
Randomization is what turns "we tried it and got better numbers" into "we know the change caused the lift." Without it, every comparison is haunted by the possibility that the groups were different to begin with.
See It
Click an assignment method to choose who gets the new share widget. The data has a built-in true lift of 5 percentage points. Compare that true lift to the apparent lift each method produces, and click "Random" again to reshuffle.
Reflect
Think about the last time your team rolled out a change to part of your audience and called it a success based on the numbers that came back. Was the part of your audience that got the change comparable to the part that didn't, or were they different in ways that might explain the lift on their own? If randomizing wasn't possible, what other group would have been a fair comparison?
When your fundraising or campaigning team decides which supporters or chapters get a new tactic first, who makes that call and on what basis? If the answer is anything other than a coin flip or a random draw, what does that mean for any conclusion about whether the tactic worked?