Know Your Question

Over the past five weeks, we’ve been talking about what it takes to make a credible causal claim. But here’s something worth stepping back on: causation isn’t always the question.

“How many graduates will enter the workforce next year?” is not a causal question. It’s a request for a number, and the number needs to be accurate. “Did the tuition subsidy increase graduation rates?” is a causal question. It requires a counterfactual, an identification strategy, everything we’ve discussed. “What would happen if the subsidy were redesigned?” is different again: forward-looking, but in a changing policy environment where past patterns can’t be extrapolated.

These are different tasks. They have different goals, different evaluation criteria, and different methods. Being clear about which one is on the table makes it easier to choose the right approach, and easier to explain results to the people who need them.

A lot of analytical work is descriptive: summarising patterns, tracking trends, breaking down outcomes by group. That work is valuable on its own terms and doesn’t need the machinery discussed here. This post is about the three tasks that do require choosing a method carefully, because they look similar on the surface but demand very different approaches.

Three types of analytical questions

Forecasting. How many students will enrol in university next year? The goal is the most accurate prediction of a future value. It doesn’t matter why enrolment is rising, only that it is rising and by how much. Time series models, machine learning, and other tools that minimise forecast error all live here. The evaluation criterion is accuracy, full stop.

The key assumption: the process generating the data stays roughly stable. The patterns learned from the past will continue into the future. When the environment shifts (a new policy, a structural reform), those patterns break, and the model has no way to recover, because it never learned why the patterns existed in the first place.
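To make the stability assumption concrete, here is a minimal sketch using simulated data. The enrolment numbers, the linear trend, and the size of the reform shock are all made up for illustration; the point is only that a model fitted to past patterns extrapolates well while the process is stable and misses badly once the rules change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical enrolment history: ten years of a stable linear trend plus noise.
years = np.arange(10)
enrolment = 1000 + 50 * years + rng.normal(0, 10, size=10)

# A deliberately simple forecasting model: fit a straight line to the history.
slope, intercept = np.polyfit(years, enrolment, deg=1)


def forecast(year):
    """Extrapolate the fitted trend to a future year."""
    return intercept + slope * year


# Case 1: the data-generating process is unchanged in year 10.
actual_stable = 1000 + 50 * 10
# Case 2: a reform in year 10 lowers enrolment by 200 (assumed shock size).
actual_after_reform = actual_stable - 200

error_stable = abs(forecast(10) - actual_stable)
error_after_reform = abs(forecast(10) - actual_after_reform)

print(f"forecast for year 10:        {forecast(10):.0f}")
print(f"error if process is stable:  {error_stable:.0f}")
print(f"error after the reform:      {error_after_reform:.0f}")
```

The model is the same in both cases. What changes is whether the world it was fitted to still exists, and nothing inside the fitted trend can signal that.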

Ex post causal inference. Did the tuition subsidy increase enrolment? This requires a counterfactual: what would enrolment have been without the subsidy? This is the territory of the entire causality series: potential outcomes, DAGs, identification strategies. The evaluation criterion is the credibility of the causal claim, not predictive accuracy.

The key assumption: the identification strategy is sound. The instrument is valid, the parallel trends hold, the comparison group is credible, etc. Unlike forecasting, where the test is whether the prediction was accurate, causal inference lives or dies on whether the research design is convincing. Two analysts with the same data can reach different conclusions if they rely on different identification strategies, and the disagreement isn’t about the numbers. It’s about the assumptions.

The limitation is equally important to recognise. A credible causal estimate tells you what happened, in a specific context, for a specific population. It doesn’t tell you why it happened, and it doesn’t guarantee the same result would hold in a different setting or under a different version of the policy. Ex post causal inference is local and backward-looking. That’s not a weakness; it’s a boundary worth knowing about.
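The point about identification assumptions can be illustrated with a small simulation. This is a sketch under assumed numbers, not an analysis of any real subsidy: the true effect, the baseline gap between groups, and the common time trend are all invented. Three estimators are applied to the same data, and only the difference-in-differences estimate recovers the effect, and only when parallel trends actually holds.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000            # hypothetical units per group and period
TRUE_EFFECT = 5.0   # assumed effect of the subsidy, in percentage points


def simulate(extra_trend_treated=0.0):
    """Mean outcomes for treated and comparison groups, before and after.

    extra_trend_treated > 0 means the treated group would have grown
    faster even without the subsidy, which violates parallel trends.
    """
    c_pre = (40 + rng.normal(0, 5, N)).mean()
    c_post = (40 + 3 + rng.normal(0, 5, N)).mean()
    t_pre = (50 + rng.normal(0, 5, N)).mean()
    t_post = (50 + 3 + extra_trend_treated + TRUE_EFFECT
              + rng.normal(0, 5, N)).mean()
    return t_pre, t_post, c_pre, c_post


def estimates(t_pre, t_post, c_pre, c_post):
    """Three identification strategies applied to the same four means."""
    return {
        # Assumes the groups would have had equal outcomes without the subsidy.
        "post_comparison": t_post - c_post,
        # Assumes nothing else changed over time.
        "before_after": t_post - t_pre,
        # Assumes parallel trends between the two groups.
        "did": (t_post - t_pre) - (c_post - c_pre),
    }


parallel = estimates(*simulate(extra_trend_treated=0.0))
violated = estimates(*simulate(extra_trend_treated=4.0))

for label, result in [("parallel trends hold", parallel),
                      ("parallel trends violated", violated)]:
    print(label)
    for name, value in result.items():
        print(f"  {name:16s} {value:6.2f}   (true effect {TRUE_EFFECT})")
```

The three estimators disagree because each one leans on a different assumption, not because the data differ. In the second scenario the difference-in-differences estimate is also off, and nothing in the output flags it.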

Structural policy analysis. What would happen to enrolment if the subsidy were doubled? Or means-tested? Or replaced with a loan scheme? This is forward-looking, like forecasting, but the policy environment is changing. Past enrolment trends can’t be extrapolated because the rules that produced those trends are being rewritten. What’s needed is a model of why students choose to enrol, not just that they enrol at certain rates.

Structural policy analysis works precisely because it captures mechanisms (the decision-making process that generates the data) rather than the patterns the data happens to show. Change the rules, and the mechanism tells you how behaviour adjusts.

It’s reasonable to think of structural policy analysis as ex ante causal analysis: asking “what would this cause” rather than “what did this cause.” The intuition is right, but the methods are different enough that treating them as the same task is where things can go wrong.

To be clear, structural policy analysis doesn’t bypass causal analysis. It builds on it. Causal relationships still need to be identified. The difference is that a structural model uses those relationships to understand the mechanism well enough to ask what would happen under conditions that haven’t been observed.

The key assumption: the structural model correctly captures the decision-making process. The mechanism specified is the one that actually drives behaviour.

The limitation: structural models are demanding. They require theoretical commitments about how people make decisions, and if the mechanism is wrong, the predictions inherit that error. Unlike forecasting, where a bad model is revealed by poor accuracy, and causal inference, where a bad design is challenged by other researchers, a structural model can produce confident predictions from a misspecified mechanism with no obvious warning sign.
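As a rough illustration of what modelling the mechanism means, here is a toy enrolment decision model. Everything in it is an assumption made for the sketch: the logit form, the income distribution, the parameter values, and the three policy designs. In real work the parameters would be estimated from data rather than typed in.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical population: household income in thousands (assumed distribution).
income = rng.lognormal(mean=4.0, sigma=0.5, size=100_000)

# Assumed structural parameters (illustrative, not estimated).
BASE_UTILITY = 1.0       # value of enrolling when study is free
PRICE_SENSITIVITY = 8.0  # cost matters more for lower-income households
TUITION = 10.0           # in thousands


def enrolment_rate(subsidy):
    """Predicted share enrolling under a subsidy (scalar or one value per student)."""
    net_cost = np.maximum(TUITION - subsidy, 0.0)
    utility = BASE_UTILITY - PRICE_SENSITIVITY * net_cost / income
    prob = 1.0 / (1.0 + np.exp(-utility))  # logit choice probability
    return float(prob.mean())


# Counterfactual policy designs, none of which needs to have been observed.
no_subsidy = enrolment_rate(0.0)
current = enrolment_rate(2.0)
doubled = enrolment_rate(4.0)
means_tested = enrolment_rate(np.where(income < np.median(income), 4.0, 0.0))

print(f"no subsidy:    {no_subsidy:.3f}")
print(f"current:       {current:.3f}")
print(f"doubled:       {doubled:.3f}")
print(f"means-tested:  {means_tested:.3f}")
```

The model will return a number for any policy design it is given. Whether those numbers deserve trust depends entirely on whether the utility function resembles how students actually decide, which is the limitation described above.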

A pattern worth noticing

A common pattern in policy work: leadership asks what would happen if a program were redesigned, and the analysis that comes back measures what the current program did. The question was forward-looking, the answer was backward-looking, and nothing in the process forced anyone to notice the mismatch. Both pieces of work can be technically sound. The problem is that they’re answering different questions.

Timing helps clarify the task

One practical way to get oriented is to ask where the analysis sits relative to a policy change.

| | Before policy change (ex ante) | After policy change (ex post) |
|---|---|---|
| Forecasting | How many students will enrol next year under current settings? | |
| Causal inference | | Did the subsidy increase enrolment? |
| Structural policy analysis | What would happen if we changed the income threshold? | Why did the policy work, and for whom? |

The blank cells are the point. Forecasting only works in a stable environment. Once the rules change, the patterns it learned no longer apply. Causal inference is inherently ex post: observed outcomes under the intervention are required. Structural policy analysis works in both columns: in the ex ante setting it earns its keep by modelling the mechanism rather than the pattern, and in the ex post setting it goes beyond estimating the effect to explaining why the policy worked and for whom.

Putting it together

Most of us move between these three tasks regularly, sometimes within the same project. The value of the taxonomy isn’t to make the work harder; it’s to make it clearer. Knowing which question is being asked, not just which method to use, helps match the analysis to the decision it’s meant to inform. Analysts need to recognise the question type, and the people requesting analysis benefit from naming what kind of answer they need.

Two Kinds of Slow

If you work with large administrative datasets or run simulation models, you’ve probably lost hours to slow code. Usually the fix is obvious in hindsight, but only once you know which kind of slow you’re dealing with.

Two kinds of computing tasks

Most analytical work falls into one of two categories.


A framework for causal thinking in your work

This series was never about learning three frameworks. It was about changing how you engage with the causal claims that flow through your work every day: the ones you write, the ones you review, and the ones you use to justify decisions that affect real people and real budgets.

Theory is easy to nod along to. Application is harder. So this time: no new concepts. Instead, a decision framework you can use the next time you write, review, or commission an analysis that makes a causal claim.

Not every question requires the same approach. The right lens depends on what you’re trying to do.


What do you do when there’s no experiment to run?

Over the past three weeks we’ve built a toolkit. Rubin gave us the language of potential outcomes and the counterfactual. Pearl gave us DAGs to map our assumptions and check whether our analysis can identify a causal effect. Both frameworks assume you have data, either from an experiment or from an observational setting with a credible identification strategy.

But what happens when you don’t? When leadership asks “what would happen if we implemented policy X?” and policy X has never been tried? There’s no treatment group. No control group. No natural experiment. No data to construct a counterfactual from.


The question behind every causal claim you’ve ever made

If your regression coefficient is not a causal effect, what is a causal effect?

Here’s the answer, and it’s deceptively simple.

The causal effect of an additional year of education on a person’s earnings is the difference between what they earn with that year of education and what they would have earned without it. That’s it. Not the difference between educated and uneducated people. The difference between two states of the world for the same person, one that happened and one that didn’t.


Your regression coefficient is not a causal effect

Quick thought experiment. You regress earnings on years of education and get a positive coefficient. More education → higher earnings. Done. Done?

Now add parental income as a control. The coefficient on education shrinks. Add a measure of cognitive ability. Shrinks again. Add motivation, grit, neighbourhood quality. It keeps moving.

So which coefficient is the “real” effect of education?
