3.3 Quasi-Experiments

Sometimes you can’t run an A/B test: the intervention has already happened, legal or ethical constraints prevent randomization, or the treatment is applied at a level (store, region, time period) where randomization is impractical. In these cases, quasi-experimental methods let you estimate causal effects by exploiting structure in how the treatment was assigned.

This chapter focuses on the most widely used workhorse: Difference-in-Differences. We’ll also briefly cover two other methods (Regression Discontinuity and Synthetic Control) so you can recognize when your data supports them.

Difference-in-Differences (DiD)

Here’s a concrete scenario. A retailer rolls out a new loyalty program in 50 stores (treatment) while keeping 50 stores without it (control). You measure average basket size before and after the rollout in both groups.

Group	Before	After	Change
Treatment stores	$42	$48	+$6
Control stores	$40	$43	+$3
DiD estimate			+$3

The treatment stores grew by $6, but control stores also grew by $3, due to seasonality, inflation, or other trends that affect everyone. The DiD estimate is $6 - $3 = $3: the incremental effect of the loyalty program. This is Difference-in-Differences: it compares the change in outcomes over time between a treated group and an untreated group. The “double differencing” removes both time-invariant differences between groups and common time trends.
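The arithmetic can be written out as a short sketch. The four means come from the table above; the function and variable names are illustrative. The second function shows the commonly used regression form of the same calculation, y = b0 + b1·treated + b2·post + b3·treated·post, where the interaction coefficient b3 is the DiD estimate. With only four cell means the equation can be solved exactly, so no fitting library is needed here.

```python
# Average basket size (USD) for each group and period, from the table above.
means = {
    ("treatment", "before"): 42.0,
    ("treatment", "after"): 48.0,
    ("control", "before"): 40.0,
    ("control", "after"): 43.0,
}

def did_estimate(means):
    """Difference-in-Differences from four group-by-period means."""
    treated_change = means[("treatment", "after")] - means[("treatment", "before")]
    control_change = means[("control", "after")] - means[("control", "before")]
    return treated_change - control_change

def regression_coefficients(means):
    """Solve y = b0 + b1*treated + b2*post + b3*treated*post on the four cells."""
    b0 = means[("control", "before")]            # control baseline
    b1 = means[("treatment", "before")] - b0     # time-invariant group gap
    b2 = means[("control", "after")] - b0        # common time trend
    b3 = means[("treatment", "after")] - (b0 + b1 + b2)  # DiD estimate
    return b0, b1, b2, b3

did = did_estimate(means)
b0, b1, b2, b3 = regression_coefficients(means)
print(f"DiD estimate: ${did:.2f}")               # DiD estimate: $3.00
print(f"Interaction coefficient: ${b3:.2f}")     # Interaction coefficient: $3.00
```

Here b1 is the $2 baseline gap between the groups and b2 is the $3 common trend. These are the two components that the double differencing removes.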

You might be thinking, “That seems almost too simple.” And in a way, it is. The elegance of DiD is that it cancels out factors you can’t measure, as long as one critical assumption holds.

The Parallel Trends Assumption

DiD is valid only if the treatment and control groups would have followed the same trend in the absence of treatment. This is untestable in the post-treatment period (we can’t observe the counterfactual), but you can check for parallel trends in the pre-treatment period. If the two groups tracked each other closely for several periods before the intervention, it’s more plausible (though not guaranteed) that they would have continued to do so.

Figure 1: Difference-in-Differences: treatment and control groups follow parallel trends pre-intervention; post-intervention, treatment diverges. The dashed line shows the counterfactual, and the gap is the causal effect.

How to test it:

Plot the outcome over time for both groups. Do the pre-treatment trends look parallel?
Run a formal pre-trend test: regress the outcome on group-by-time-period interactions for the pre-treatment periods and test whether the interaction coefficients are jointly zero.
If trends diverge before treatment, DiD is suspect. Consider matching, weighting, or a different method.
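The formal pre-trend test can be sketched as follows. This is a minimal illustration under simplifying assumptions, not a production implementation: it assumes a balanced panel (the same number of stores in every group-by-period cell), treats observations as independent, and uses made-up data. With a balanced panel, the model without interactions has a closed-form fit (group mean + period mean − grand mean), so the F statistic for “all interaction coefficients are zero” can be computed with the standard library alone. In practice you would usually fit this with a regression package and cluster standard errors by store.

```python
from statistics import mean

def pretrend_f_stat(data):
    """F statistic for group-by-period interactions in pre-treatment data.

    data maps (group, period) -> list of unit-level outcomes.
    Assumes a balanced panel with some variation inside each cell.
    """
    groups = sorted({g for g, _ in data})
    periods = sorted({t for _, t in data})
    all_values = [y for ys in data.values() for y in ys]
    grand = mean(all_values)
    group_mean = {g: mean([y for t in periods for y in data[(g, t)]]) for g in groups}
    period_mean = {t: mean([y for g in groups for y in data[(g, t)]]) for t in periods}

    ssr_unrestricted = 0.0  # every cell gets its own mean (interactions allowed)
    ssr_restricted = 0.0    # additive group + period effects (parallel trends)
    for (g, t), ys in data.items():
        cell = mean(ys)
        additive = group_mean[g] + period_mean[t] - grand
        ssr_unrestricted += sum((y - cell) ** 2 for y in ys)
        ssr_restricted += sum((y - additive) ** 2 for y in ys)

    q = (len(groups) - 1) * (len(periods) - 1)           # number of interaction terms
    dof = len(all_values) - len(groups) * len(periods)   # residual degrees of freedom
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / dof)

def make_cells(treat_means, control_means):
    """Hypothetical data: three stores per cell at mean - 1, mean, mean + 1."""
    cells = {}
    for t, (mt, mc) in enumerate(zip(treat_means, control_means)):
        cells[("treatment", t)] = [mt - 1, mt, mt + 1]
        cells[("control", t)] = [mc - 1, mc, mc + 1]
    return cells

# Four pre-treatment periods. In the first data set the gap is a constant $2;
# in the second the treatment group is already growing faster before treatment.
parallel = make_cells([40, 41, 42, 43], [38, 39, 40, 41])
diverging = make_cells([40, 43, 46, 49], [38, 39, 40, 41])

f_parallel = pretrend_f_stat(parallel)
f_diverging = pretrend_f_stat(diverging)
print(f"F (parallel pre-trends):  {f_parallel:.2f}")
print(f"F (diverging pre-trends): {f_diverging:.2f}")
```

Each statistic is compared against an F distribution with (q, dof) degrees of freedom, here (3, 16), whose 5% critical value is about 3.24. A large statistic is evidence against parallel pre-trends. A small one is reassuring but does not prove the assumption, especially with few pre-treatment periods.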

When to use DiD in marketing: Promotion rollouts across stores, policy changes that affect some markets but not others, new feature launches in one app version, or any setting where you have pre/post data for treated and untreated groups.

Other Quasi-Experimental Methods

Regression Discontinuity (RDD)

Regression Discontinuity exploits situations where treatment is assigned based on whether a continuous variable crosses a threshold. Units just above and just below the cutoff are nearly identical on all characteristics except their treatment status, creating a local experiment.

A loyalty program grants “Gold” status to customers who spend more than $500 per year. Gold members receive exclusive discounts. To estimate the effect of Gold status on subsequent spending, you compare customers who spent just above $500 with those who spent just below. A customer who spent $502 is essentially identical to one who spent $498; the only systematic difference is that one crossed the threshold.

Figure 2: Regression Discontinuity Design: regression lines show a jump at the threshold, with the bandwidth region highlighted.

The key requirements: no precise manipulation of the running variable (customers can’t deliberately position themselves just above the threshold), and all variables other than the treatment should change smoothly through the cutoff. RDD estimates a local treatment effect, valid near the cutoff, but not necessarily generalizable to customers far from the threshold.
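One common way to estimate the jump is a local linear regression: fit a separate line on each side of the cutoff, using only observations within a chosen bandwidth, and take the difference between the two fitted values at the cutoff. The sketch below applies that idea to the Gold-status example. The customer data are hypothetical and noise-free (a $40 jump is built in so you can see it recovered), and the bandwidth of $100 is an arbitrary choice for illustration. Real analyses choose the bandwidth carefully and report how sensitive the estimate is to it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rdd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the fitted outcome at the cutoff, using points within the bandwidth.

    Units with running > cutoff are treated ("more than $500" earns Gold).
    The running variable is centered at the cutoff, so each fitted intercept
    is that side's predicted outcome at the cutoff itself.
    """
    pairs = list(zip(running, outcome))
    below = [(x - cutoff, y) for x, y in pairs if cutoff - bandwidth <= x <= cutoff]
    above = [(x - cutoff, y) for x, y in pairs if cutoff < x <= cutoff + bandwidth]
    intercept_below, _ = fit_line(*zip(*below))
    intercept_above, _ = fit_line(*zip(*above))
    return intercept_above - intercept_below

# Hypothetical customers: prior-year spend from $400 to $600 in $10 steps.
# Next-year spend rises smoothly with prior spend, plus $40 for Gold members.
prior_spend = list(range(400, 601, 10))
next_spend = [100 + 0.5 * x + (40 if x > 500 else 0) for x in prior_spend]

effect = rdd_estimate(prior_spend, next_spend, cutoff=500, bandwidth=100)
print(f"Estimated effect of Gold status at the cutoff: ${effect:.2f}")
```

Fitting separate slopes on each side matters: simply comparing average spending above and below the threshold would mix the jump with the underlying upward trend in the running variable.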

Synthetic Control

Synthetic Control constructs a counterfactual for a treated unit by creating a weighted combination of untreated units that closely matches the treated unit’s pre-treatment trajectory.

A brand launches a major campaign in California but not in other states. To estimate the campaign’s effect on sales, you construct a “synthetic California” from a weighted combination of other states that collectively match California’s pre-campaign sales trend, demographics, and market characteristics. After the campaign, the gap between actual California and its synthetic counterpart is the estimated treatment effect.

Figure 3: Synthetic Control: actual trajectory (solid line) vs. synthetic counterfactual (dashed line). Pre-intervention the lines overlap; post-intervention the gap represents the treatment effect.

Synthetic Control works best when you have a single (or few) treated units with a long pre-treatment time series. The pre-treatment fit must be tight. If the synthetic control can’t reproduce the treated unit’s pre-treatment trajectory, the estimate is unreliable.
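To make the weighting idea concrete, here is a deliberately small sketch with only two donor units, where the best non-negative weights that sum to one have a closed form. Everything in it is an assumption for illustration: the sales figures are invented, the unit names are placeholders, and only the outcome series is matched. Real applications use many donors, often match on covariates as well, and rely on a constrained optimizer rather than this shortcut.

```python
def two_donor_weight(treated_pre, donor_a_pre, donor_b_pre):
    """Weight w in [0, 1] on donor A (and 1 - w on donor B) that minimizes
    the squared pre-treatment gap between the treated unit and the blend."""
    num = sum((y - b) * (a - b) for y, a, b in zip(treated_pre, donor_a_pre, donor_b_pre))
    den = sum((a - b) ** 2 for a, b in zip(donor_a_pre, donor_b_pre))
    # Clipping the unconstrained minimizer is exact for a one-dimensional quadratic.
    return min(1.0, max(0.0, num / den))

def blend(w, donor_a, donor_b):
    """Synthetic series: weighted combination of the two donors."""
    return [w * a + (1 - w) * b for a, b in zip(donor_a, donor_b)]

# Hypothetical sales: four pre-campaign periods followed by two post-campaign periods.
n_pre = 4
donor_a = [100, 104, 108, 112, 116, 120]
donor_b = [80, 82, 84, 86, 88, 90]
treated = [85, 87.5, 90, 92.5, 101, 103.5]

w = two_donor_weight(treated[:n_pre], donor_a[:n_pre], donor_b[:n_pre])
synthetic = blend(w, donor_a, donor_b)

# Check the pre-treatment fit first: a poor fit makes the estimate unreliable.
pre_rmse = (sum((y - s) ** 2 for y, s in zip(treated[:n_pre], synthetic[:n_pre])) / n_pre) ** 0.5
# Post-treatment gap between the actual and synthetic series.
gaps = [y - s for y, s in zip(treated[n_pre:], synthetic[n_pre:])]

print(f"Weight on donor A: {w:.2f}")
print(f"Pre-treatment RMSE: {pre_rmse:.2f}")
print("Post-treatment gaps:", [round(g, 2) for g in gaps])
```

The weights are chosen using pre-treatment data only. The post-treatment periods play no role in fitting, which is what lets the post-treatment gap be read as an estimate of the effect.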

Choosing the Right Method

No method is assumption-free. The goal is to choose the method whose assumptions are most defensible for your specific setting:

Sharp threshold that determines treatment? Consider RDD.
Clear before/after period with treated and untreated groups? Consider DiD.
Single treated unit with a long time series? Consider Synthetic Control.
None of the above? You may need matching, propensity score weighting, or instrumental variables, and you should consider redesigning the next intervention to enable a cleaner evaluation (e.g., a staggered rollout for future DiD analysis).

These methods tell us the average effect of a treatment. But what if different customers respond differently? A coupon might boost purchases by 5% on average, but that average blends customers for whom the lift was 20% with customers who saw zero effect. The next chapter tackles that question.