The world's biggest regression discontinuity design?

The public school systems in both Malawi and Uganda (the two countries where I recently spent time doing fieldwork) revolve around a set of massively-important exams that determine whether you get to move on from one level of education to another, and often eligibility for jobs as well. One of the people I was working with in Uganda described primary school there as spending seven years studying for a single test.

It’s hard to overstate the importance of these tests. Uganda’s first such exam is the Primary Leaving Examination, or PLE, which you take after Primary 7 (roughly equivalent to 6th grade, as there is no kindergarten). A more-or-less universal practice in the schools I visited in Northern Uganda was to kick out all the poorly-performing P6 pupils before the beginning of P7, leaving just a core of all-stars who spend the year prepping for the test. I’m guessing this is done, in large part, to optimize how good the school looks relative to its competition.

Pressure is high on the pupils as well – it’s common for the names of top performers to be published in newspapers (and hence, by process of elimination, everyone knows who did badly as well). An op-ed I read while in Lira – which I wish I’d cut out and kept, as I can’t find it online – pointed out that this pressure has a cost, and proposed a neat experiment that could be carried out on a grand scale. It’s clear that pupils and their families enjoy the immediate fame of having their names show up in the paper, the author said, but how do they fare down the road? The author proposed that someone should follow up to see how many newspaper-famous PLE success stories end up making it through secondary school.

What’s interesting about this idea is that we could, conceivably, not only do a raw comparison, but actually isolate the causal effect of passing the PLE versus failing it (or of getting a higher grade versus a lower one). The idea is that these exams have hard score cutoffs for passing (or getting a certain grade) and if administered honestly, students can’t control their exact score. Consider a group of exam-takers who are all basically right at the cutoff. Idiosyncratic events on the day of the exam, or random errors, will push them above or below the passing mark. Hence if you look just at that group, you effectively have random assignment to the “pass, name in the paper, life of success and riches” condition or the “fail, no newspaper fame, everybody feels bad for you” condition. You can see how much passing the exam impacts, for example, your wages later in life or the number of kids you have or how many years of school you eventually finish. This approach is called a “regression discontinuity” or “RD” design, and it’s pretty hot in education research these days.
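The RD logic can be sketched with a toy simulation (every number here – the cutoff, the effect size, the noise levels – is an invented assumption, not real PLE data): exam scores are latent ability plus exam-day luck, passing causally adds years of schooling, and we compare the naive passers-vs-failers gap to a comparison restricted to a narrow band around the cutoff.

```python
import random
import statistics

random.seed(0)

CUTOFF = 50.0      # hypothetical pass mark
TRUE_EFFECT = 2.0  # assumed causal effect of passing on years of schooling

scores, years = [], []
for _ in range(200_000):
    ability = random.gauss(50, 10)        # latent ability
    score = ability + random.gauss(0, 3)  # exam-day luck pushes you over or under
    passed = score >= CUTOFF
    # Eventual schooling depends on ability AND on passing itself
    y = 0.1 * ability + TRUE_EFFECT * passed + random.gauss(0, 1)
    scores.append(score)
    years.append(y)

def mean_where(cond):
    return statistics.mean(y for s, y in zip(scores, years) if cond(s))

# Naive comparison: all passers vs all failers -- confounded, passers are abler
naive = mean_where(lambda s: s >= CUTOFF) - mean_where(lambda s: s < CUTOFF)

# RD comparison: only students within a narrow bandwidth of the cutoff
h = 1.0
rd = (mean_where(lambda s: CUTOFF <= s < CUTOFF + h)
      - mean_where(lambda s: CUTOFF - h <= s < CUTOFF))

print(f"naive: {naive:.2f}  RD: {rd:.2f}  truth: {TRUE_EFFECT}")
```

The naive gap comes out well above the truth because passers are abler to begin with, while the near-cutoff comparison lands close to the true effect. A formal analysis would fit local regressions on each side of the cutoff rather than raw band means, but the intuition is the same.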

The cool thing about doing this in Malawi or Uganda is that it’s not just a particular school or program – you could study the impact of passing an exam that basically everyone in the country takes. But you’d need the exam scores, plus follow-up data with a random subset (or all) of the pupils in the country. I can think of ways to do this, but none of them are feasible – unless, to pick one example, some folks at UNEB and the Ugandan Census want to let me at their raw, identifiable data.

The drama of driving on terrible roads

Aine McCarthy has a beautifully-written tale of trying to free her car from a river in Tanzania, as a storm moves in:

So, we get out of the truck and start to push. This includes me, Loi (field assistant, driver, friend), two distributors with babies on back, and the family planning training facilitator, who is a nut. A handful of Witamhiya residents are standing around at the riverbank watching us, washing, watching us, watching their cattle. So we ask someone to go get village leader. They send the traditional healer. He is pretty useless. His friend, however, gets two oxen to pull the truck and one more strong man. We try to push while the oxen pull (literally attached to the tow of the front of the truck). Oxen=Tanzanian AAA? Not exactly. It still doesn’t budge and the front tire is getting deeper into the sand.

Also, there is a lot of cow poop. We are basically standing around in a warm poop-green river.  They send for two more oxen. We sit around the river talking about Obama. Four oxen pulling and eight people pushing does nothing for the truck. No cell service in Witamhiya. More sitting around and looking at the oxen. By now, it is 6pm and as if on cue, a dark cloud appears upstream. Huge and growing. The sun is setting and the awesome yet ominous wind that smell of rain starts to blow in our direction.

The whole thing is great, and will ring true to anyone who has tried to navigate crappy developing-country roads in the face of imminent rain. It joins The Economist’s classic tale of a Guinness delivery truck in Cameroon, “The Road to Hell is Unpaved”, as one of my favorite articles about infrastructure in Africa.

Advances in internet scamming

It used to be that if you ran an internet scam, the game was to lure people in with the possibility of gaining a small fortune. Now scammers are appealing to our moral compasses to make money. Case in point: I got a very confusing email recently, and following the lead of social science blogging super-hero Andrew Gelman, I’m posting a redacted version here.

Hi Jason,

My name is [name removed] and I came across nonparibus.wordpress.com after searching for people that have referenced or mentioned climate change and global warming. I am part of a team of designers and researchers that put together an infographic showing how bad climate change has gotten and how it’s contributing to the destruction of our planet. I thought you might be interested, so I wanted to reach out.

If this is the correct email and you’re interested in using our content, I’d be happy to share it with you. 🙂

Thank you,

[name removed]

I think this is some kind of meta-blogspam: get me to post their infographic, then include Amazon referral links or something to make money.

There’s almost zero chance this is legitimate or sincere. My only post that mentions climate change is one pointing out that it’s overstated as a cause of fluctuations in rainfall in Malawi, and that this is probably a bad thing. However, if the probably-a-scambot climate change activist reader who emailed me wants to explain why this isn’t bogus, I’m all ears.

Do field experiments put the method before the question?

Chris Blattman has another post – his most pointed and strongest yet – telling people to get out of field experiments because the market is crowded:

Most field experiments have the hallmarks of a bad field research project. There are four:

  1. Takes a long time. Anything requiring a panel survey a year or two apart, or a year of setup time, suffers from this problem.
  2. Risky. There are a hundred reasons why any ambitious project may fail, and many do.
  3. Expensive. This is driven by any kind of primary data collection, but especially panel or tracking surveys, and especially any Africa or conflict/post-conflict research.
  4. High exit costs. This is where experiments excel. If your historical data collection, formal theory, or secondary dataset isn’t working for you, you can put it aside. If your field experiment goes poorly, not only are you stuck with it to the bitter end, but it will take more not less time.

These are all important considerations for any research project, but I was more struck by his aside that he is “suspicious whenever someone puts the method before their question.” Do people running field experiments put the method first? I would argue that they do so less than folks who write (credible) non-experimental social science papers.

The procedure for writing a paper based on a field experiment is 1) think of something you’d like to study and 2) try to come up with an experiment that lets you study it. What about non-experimental papers? Academic lore holds that the current process for grad students writing in economics is 1) sit in a room for four years trying to think of a natural experiment* that happened somewhere and 2) write a paper about whatever that natural experiment is. This is why, for example, we know a lot about the financial returns to education for students who would drop out of school if not for rules that force them to stay until age 17, or the benefits of getting a GED for someone who barely passes the necessary exam.

Let’s take a concrete example: the price elasticity of labor supply. Should we care about the labor supply of cab drivers? Trick question – it doesn’t matter whether taxi driver labor supply is interesting! What’s important is that variations in weather mean that their effective wage changes at random, so we can study their labor supply. That’s where the light is.

In contrast, experiments let us pick our topic and then study it. For example, Jessica Goldberg ran an experiment studying the exact same issue (how labor supply responds to changes in wages) but with a representative sample of Malawians, doing the most common kind of paid work in the country (informal agricultural labor). This kind of work is also common in much of sub-Saharan Africa. Her method – a field experiment – let her pick the topic of her research, and as a result what she studied is the most important category of labor across a wide region.

I’m not saying that Camerer et al.’s cab driver paper isn’t good research, or even that Goldberg’s paper is better. My claim is much simpler, and very hard to dispute: the former paper’s topic was much more driven by its method (finding a useful instrument) than was the latter’s by its method of setting up a targeted experiment.

There are exceptions to this pattern – sometimes a government agency or an NGO has an experiment they want to run that falls into your lap, for example, and some IV-driven research is based on a targeted search. In general, however, it’s misleading to claim that the experimental method comes before the topic. Indeed, the reverse is a key advantage of running experiments: an RCT lets us choose where to shine the light instead of constantly standing under streetlamps.

I suspect this isn’t what Blattman was driving at – there are topics where observational research is more appropriate (or even the only option, e.g. almost anything in international trade) and we shouldn’t stop studying them just because we can’t do RCTs on them. Nevertheless, the knee-jerk assumption that RCTs are methods-driven rather than topic-driven is pretty common, and, I think, wholly misguided.

Freakonomics Radio’s Clever Bullshit

I listen to the Freakonomics Radio podcast every week, out of a vague sense of wanting to be on top of how applied microeconomics is seen in the media, and, more important, because I need something to distract my brain as I do all the menial tasks to wrap up my fieldwork here in Malawi; eventually I just run out of other podcasts that interest me. The show is usually a slightly-less-interesting version of This American Life, rather than a real show about microeconomics, which is too bad; I think the world really needs a microeconomics podcast and I wish this one were it but it isn’t, quite. What really bothers me is their pattern of failing to talk about the interesting economics of an issue – there was one short on whether selling beer at sports venues could reduce public drunkenness, which it appears to have done in some cases, but no real explanation of why that may have happened.*

I’ve never been more frustrated with how shallow their coverage is than this week, when they spent the whole episode plugging Freakonomics Experiments, a website that helps you make decisions. The idea is that if you can’t choose whether to, say, change jobs, then you go to the site, take a survey, and then it flips a coin (presumably using a computerized pseudorandom number generator) for you to tell you which option to pick. The heart of the episode is focused on the claimed psychological benefits of flipping a coin to make decisions, and in particular on how it may be preferable to have someone else do the coin flip. I’m not convinced that the Freakonomics team actually believes that claim, but even if they do it’s not at all why they are running this website. They talk very little about the real reason, and that’s unfortunate: unlike hearing people talk about how flipping coins to make choices has improved their lives, the real reason for Freakonomics Experiments is actually interesting, and actually has something to do with economics.

The truth, which everyone has already guessed from the title, is that they are running an experiment. The interesting part is why this is necessary. Take the example of changing jobs. Suppose we want to know what the effect of changing jobs is on your income ten years down the road. We’re trying to estimate b in the equation Y = bC + e, where Y is your eventual income, C is an indicator for whether you changed jobs, and e is an error term. b tells us how much your income goes up if you change jobs versus the case where you stay in the same one. We can’t just compare people who did change jobs to people who did not, because C is not assigned at random. Indeed, headhunters are much more likely to swoop in and hire away people who are going to be worth more in the future, and hence earn more irrespective of where they work. The “coin flip” solves this problem. People who use the site aren’t sure what they want to do, and the random number generator tells them what to do. Now C is assigned at random and we can really measure what happens when you change jobs.
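Here is a toy simulation of that selection problem (all the dollar figures and probabilities are invented for illustration): income depends on a person’s latent earning potential plus a true job-change effect b. When high-potential people are the ones who change jobs, the raw changers-vs-stayers comparison overstates b; randomizing the decision recovers it.

```python
import random
import statistics

random.seed(1)
TRUE_B = 5_000  # assumed causal income gain from changing jobs

def income(potential, changed):
    return potential + TRUE_B * changed + random.gauss(0, 2_000)

def diff_in_means(data):
    changers = [y for c, y in data if c]
    stayers = [y for c, y in data if not c]
    return statistics.mean(changers) - statistics.mean(stayers)

# Observational world: headhunters target high-potential people
obs = []
for _ in range(100_000):
    p = random.gauss(50_000, 10_000)
    changed = random.random() < (0.8 if p > 55_000 else 0.2)
    obs.append((changed, income(p, changed)))

# Coin-flip world: the website randomizes the decision (full compliance)
rct = []
for _ in range(100_000):
    p = random.gauss(50_000, 10_000)
    changed = random.random() < 0.5
    rct.append((changed, income(p, changed)))

naive = diff_in_means(obs)
randomized = diff_in_means(rct)
print(f"naive: {naive:.0f}  randomized: {randomized:.0f}  truth: {TRUE_B}")
```

In the observational world the naive estimate is several thousand dollars too high, because C is correlated with the error term; in the coin-flip world the simple difference in means lands on b.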

This is all very clever. It’s even more clever than it appears: even if people don’t always obey the random number generator, it is still a valid instrumental variable for the choice of whether to change jobs. That is, we can focus on the variation in C that is driven by the Freakonomics Experiments website – to speak very imprecisely, we can look just at those who do what the “coin” says to do – and look at the effects on that group. What disappoints me is the huge missed opportunity here to talk about the difficulties of doing research on economics (and decision-making more broadly) and to help the show’s listeners learn about what kinds of before-and-after comparisons are untrustworthy and why. Going with the cute story, instead of talking about the real reasons for the project, does their audience a disservice.
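The instrumental-variables point can be sketched with a similar toy setup (numbers again invented), now with imperfect compliance: only some people obey the coin. The simple Wald/IV estimator divides the coin’s effect on income by its effect on the decision, and still recovers b even though the coin is not always followed.

```python
import random
import statistics

random.seed(2)
TRUE_B = 5_000  # assumed causal income gain from changing jobs

ys, cs, zs = [], [], []
for _ in range(200_000):
    p = random.gauss(50_000, 10_000)
    z = random.random() < 0.5                       # the randomized "coin flip"
    own_choice = p > 55_000                         # headhunted types change regardless
    c = z if random.random() < 0.6 else own_choice  # only ~60% obey the coin
    y = p + TRUE_B * c + random.gauss(0, 2_000)
    ys.append(y); cs.append(c); zs.append(z)

def mean_by(vals, flag):
    return statistics.mean(v for v, z in zip(vals, zs) if z == flag)

itt = mean_by(ys, True) - mean_by(ys, False)          # coin's effect on income
first_stage = mean_by(cs, True) - mean_by(cs, False)  # coin's effect on changing jobs
wald = itt / first_stage                              # IV (Wald) estimate of b
print(f"first stage: {first_stage:.2f}  IV estimate: {wald:.0f}  truth: {TRUE_B}")
```

Because the coin is random, it only moves income through its effect on the job-change decision, so scaling the intent-to-treat effect by the compliance rate isolates b for the people whose decision the coin actually changed.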

*I can think of several – common practice at Michigan football games, for example, is to get as drunk as possible before the game so you can stay buzzed for the whole game (And by “you” I of course mean undergrads, since we responsible adults would never do such things). This is an interesting example of the substitution effect: beer at home is a very close substitute for beer at the game. Interestingly the timing and volume of consumption have a big effect on how close those two goods are, which is something that I’ve not seen elsewhere.