We can do better than just giving cash to poor people. Here’s why that matters.

Cash transfers are an enormously valuable, and increasingly widespread, development intervention. Their value and popularity have driven a vast literature studying how various kinds of cash transfers (conditional, unconditional, cash-for-work, remittances) affect all sorts of outcomes (finances, health, education, job choice). I work in one small corner of this literature myself: Lasse Brune and I just finished a revision of our paper on how the frequency of cash payouts affects savings behavior, and we are currently studying (along with Eric Chyn) how to use that approach as an actual savings product.

After all the excitement over their potential benefits, a couple of recent results have taken a bit of the luster off of cash transfers. First, the three-year followup of the GiveDirectly evaluation in Kenya showed evidence that many effects had faded out, although asset ownership was still higher. Then came a nine-year (!!) followup of a cash grant program in Uganda, where initial gains in earnings had disappeared (but again, asset ownership remained higher).

One question raised by these results is whether we can do any better than just giving people cash. A new paper by McIntosh and Zeitlin tackles this question head-on, with careful comparisons between a sanitation-focused intervention and a cost-equivalent cash transfer. They actually tried a bunch of cash transfers in a range so that they could get the exact cost-equivalency through regression adjustment. In their study, there’s no clear rank ordering between cost-equivalent cash and the actual program; neither has big impacts, and they change different things (though providing a larger cash transfer does appear to dominate the program across all outcomes).

This is just one program, though – can any program beat cash? It turns out that the answer is yes! At MIEDC this spring, I saw Dean Karlan present results from a “Graduation” program that provided a package of interventions (training, mentoring, cash, and a savings group) in several different countries. The Uganda results, available here, show that the program significantly improved a wide range of poverty metrics, while a cost-equivalent cash transfer “did not appear to have meaningful impacts on poverty outcomes”.

This is a huge deal. The basic neoclassical model predicts that a program can never beat giving people cash: the best you can do is tie.* People know what they need and can use money to buy it. If you spend the same amount of money, you could achieve the same benefits for them if you happen to hit on exactly what they want, but if you pick anything else you would have done better to just hand them money. (This is the logic behind the annual Christmas tradition of journalists trotting out some economist to explain to the world why giving gifts is inefficient. And economists wonder why no one likes us!)

The fact that we can do better than just handing out cash to people is a rejection of that model in favor of models with multiple interlocking market failures – some of which may be psychological or “behavioral” in nature. That’s a validation of our basic understanding of why poor places stay poor. In a standard model, a simple lack of funds, or even the failure of one market, is not enough to drive a permanent poverty trap. You need multiple markets failing at once to keep people from escaping from poverty. For example, a lack of access to credit is bad, and will hurt entrepreneurs’ ability to make investments. But even without credit, they could instead save money to eventually make the same investments. A behavioral or social constraint that keeps them from saving, in contrast, can keep them from making those investments at all.

McIntosh and Zeitlin refer to Das, Do, and Ozler, who point out that “in the absence of external market imperfections, intra-household bargaining concerns, or behavioral inconsistencies, the outcomes moved by cash transfers are by definition those that maximize welfare impacts.” While their study finds that neither cash nor the program was a clear winner, the Graduation intervention package, in contrast, clearly beats an equivalent amount of cash on a whole host of metrics. We can account for this in two ways. One view is that the cash group actually was better off – people would really prefer to spend a windfall quickly than make a set of investments that pay off with longer-term gains. The other, which I subscribe to, is that there are other constraints at work here. Under this model, the cash group just couldn’t make those investments – they didn’t have the access to savings markets, or there is a missing market in training/skill development, etc.

There is an important practical implication as well. The notion of “benchmarking” development interventions by comparing them to handing out cash is growing in popularity, and it’s an important movement. Indeed, the McIntosh and Zeitlin study makes major contributions by figuring out how to do this benchmarking correctly, and by pushing the envelope on getting development agencies to think about cash as a benchmark.** But what do we do when there is no obvious way to benchmark via cash? In particular, when we are studying education interventions, who should we be thinking about making the cash transfers to? McIntosh and Zeitlin talk about a default of targeting the cash to the people targeted by the in-kind program. In many education programs, the teachers are the people targeted directly. In others, it is the school boards that are the direct recipients of an intervention. Neither group of people is really the aim of an education program: we want students to learn. And, perhaps unsurprisingly, direct cash transfers to teachers and school boards don’t do much to improve learning. You could change the targeting in this case, and give the cash to the students, or to their parents, or maybe just to their mothers – there turn out to be many possible ways of doing this.

So it’s really important that we now have an example of a program that clearly did better than a direct cash transfer. From a theoretical perspective, this is akin to Jensen and Miller’s discovery of Giffen goods in their 2008 paper about rice and wheat in China: it validates the way we have been trying to model persistent poverty. From the practical side, it raises our confidence that the other interventions we are doing are worthwhile, in contexts where benchmarking to cash is impractical, overly complicated, or simply hasn’t been tried. Perhaps we haven’t proven that teacher training is better than a cash transfer, but we do at least know that high-quality programs can be more valuable than simply handing out money.

EDIT: Ben Meiselman pointed out a typo in the original version of this post (I was missing “best” in “the best you can do is tie”), which I have corrected.

*I am ignoring spillovers onto people who don’t get the cash here, which, as Berk Ozler has pointed out, can be a big deal – and are often negative.

**Doing this remains controversial in the development sector – so much so that many of the other projects that are trying cash benchmarking are doing it in “stealth mode”.

How to quickly convert Powerpoint slides to Beamer (and indent the code nicely too)

Like most economists, I like to present my research using Beamer. This is in part for costly signaling reasons – doing my slides via TeX proves that I am smart/diligent enough to do that. But it’s also for stylistic reasons: Beamer can automatically put a little index at the top of my slides  so people know where I am going, and I like the default fonts and colors.

Moreover, Beamer forces me to obey the First Law of Slidemaking: get all those extra words off your slides. Powerpoint will happily rescale things and let you put tons of text on the screen at once. Beamer – unless you mess with it heavily – simply won’t, and so forces you to make short, parsimonious bullet points (and limit how many you use).

Not everyone is on the same page about which tool to use all the time, which in the past has occasionally meant I needed to take my coauthor’s Powerpoint slides and copy them into Beamer line-by-line. Fortunately, today I found a solution for automating that process.

StackExchange user Louis has a post where he shares VBA code that can quickly move your Powerpoint slides over to Beamer. His code is great but I wasn’t totally happy with the output so I made a couple of tweaks to simplify it a bit. You can view and download my code here; I provide it for free with no warranties, guarantees, or promises. Use at your own risk.

Here is how to use it:

  1. Convert your slides to .ppt format using “Save As”. (The code won’t work on .pptx files).
  2. Put the file in its own folder that contains nothing else. WARNING: If files with the same names as those used by the code are in this folder they will be overwritten.
  3. Download the VBA code here (use at your own risk).
  4. Open up the Macros menu in Powerpoint (You can add it via “Customize the Ribbon”. Hit “New Group” on the right and rename it “Macros”, then select “Macros” on the left and hit “Add”.)
  5. Type “ConvertToBeamer” under “Macro name”, then hit “Create”
  6. Select all the text in the window that appears and delete it. Paste the VBA code in.
  7. Save, then close the Microsoft Visual Basic for Applications window.
  8. Hit the Macros button again, select “ConvertToBeamer” and run it.
  9. There will now be a .txt file with the Beamer code for your slides in it. (It won’t compile without an appropriate header.) If your file is called “MySlides.ppt” the text file will be “MySlides.txt”
  10. You need to manually fix a few special characters, as always when importing text into TeX. Look out for $, %, carriage returns, and all types of quotation marks and apostrophes. I also found that some tables came through fine while others needed manual tweaking.
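The character clean-up in step 10 can be partly automated. Below is a minimal Python sketch, not part of the VBA macro: the function name `clean_for_tex` is made up for illustration, and it assumes the exported text contains no intentional math (it escapes every unescaped `$`) and no TeX comments (it escapes every unescaped `%`). Tables and anything unusual still need a manual look.

```python
import re

# Curly quotes and apostrophes mapped to their plain TeX equivalents.
CURLY_QUOTES = {
    "\u201c": "``",  # left double quotation mark
    "\u201d": "''",  # right double quotation mark
    "\u2018": "`",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
}


def clean_for_tex(text):
    """Hypothetical helper: tidy text exported from Powerpoint for use in TeX."""
    # Normalize carriage returns to plain newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Escape $ and % unless they are already preceded by a backslash.
    text = re.sub(r"(?<!\\)([$%])", r"\\\1", text)
    # Replace curly quotes and apostrophes.
    for curly, tex in CURLY_QUOTES.items():
        text = text.replace(curly, tex)
    # Turn "straight quoted" spans on one line into ``TeX quoted'' spans.
    text = re.sub(r'"([^"\n]*)"', r"``\1''", text)
    return text
```

Reading the .txt file, passing its contents through `clean_for_tex`, and writing the result back out would be one way to use it before pasting the slides under a Beamer header.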

One issue I had with the output was that it didn’t have any indentations, making it hard to recognize nested bullets. Fortunately I found this page that will indent TeX code automatically.

I found this to be a huge time saver. Even with figuring it out for the first time, tweaking the code, and writing this post, it still probably saved me several hours of work. Hopefully others find this useful as well.

Simon Heß has a brand-new Stata package for randomization inference

After I shared my recent blog post about randomization inference (or RI), I got a number of requests for the Stata code I’ve used for my own RI tests. This sounded like a good idea to me, but also like a hassle for me. And my code isn’t designed to be easily used by other folks, so it would be a hassle for them as well.

Fortunately, a new Stata Journal article – and Stata package – came out the day after my post that does much better than any of my own code I could have shared. The article, by Simon H. Heß, is “Randomization inference with Stata: A guide and software”. It addresses a key problem with how economists typically handle RI currently:

Whenever researchers use randomization inference, they regularly code individual program routines, risking inconsistencies and coding mistakes.

This is a major concern. Another advantage of his new package is that the existence of a simple Stata command to do RI means that more researchers are likely to actually use it.

You can run findit ritest in Stata to get Simon’s package.

I’ve started trying out ritest with the same dataset on the literacy program, and it handles everything we need it to do quite well. Our stratified lottery and clustered sampling are taken care of by basic options for the program. We have multiple treatment arms, which ritest can handle by permuting our multi-valued Study_Arm variable and then using  Stata’s “i.” factor variable notation. We can then run a test for the difference between two different treatment effects by including “(_b[1.Study_Arm]-_b[2.Study_Arm])” in the list of expressions ritest computes. Highly recommended.

Randomization inference vs. bootstrapping for p-values

It’s a common conundrum in applied microeconomics. You ran an experiment on the universe of potential treatment schools in a given region, and you’re looking at school-level outcomes. Alternatively, you look at a policy that was idiosyncratically rolled out across US states, and you have the universe of state outcomes for your sample. What do the standard errors and p-values for your results even mean? After all, there’s no sampling error here, and the inference techniques we normally use in regression analyses are based on sampling error.

The answer is that the correct p-values to use are ones that capture uncertainty in terms of which units in your sample are assigned to the treatment group (instead of to the control group). As Athey and Imbens put it in their new handbook chapter on the econometrics of randomized experiments, “[W]e stress randomization-based inference as opposed to sampling-based inference. In randomization-based inference, uncertainty in estimates arises naturally from the random assignment of the treatments, rather than from hypothesized sampling from a large population.”

Athey and Imbens (2017) is part of an increasing push for economists to use randomization-based methods for doing causal inference. In particular, people looking at the results of field experiments are beginning to ask for p-values from randomization inference. As I have begun using this approach in my own work, and discussing it with my colleagues, I have encountered the common sentiment that “this is just bootstrapping”, or that it is extremely similar (indeed, it feels quite similar to me). While the randomization inference p-values are constructed similarly to bootstrapping-based p-values, there is a key difference that boils down to the distinction between the sampling-based and randomization-based approaches to inference:

Bootstrapped p-values are about uncertainty over the specific sample of the population you drew, while randomization inference p-values are about uncertainty over which units within your sample are assigned to the treatment.

When we bootstrap p-values, we appeal to the notion that we are working with a representative sample of the population to begin with. So we re-sample observations from our actual sample, with replacement, to simulate how sampling variation would affect our results.

In contrast, when we do randomization inference for p-values, this is based on the idea that the specific units in our sample that are treated are random. Thus there is some chance of a treatment-control difference in outcomes of any given magnitude simply based on which units are assigned to the treatment group – even if the treatment has no effect. So we re-assign “treatment” at random, to compute the probability of differences of various magnitudes under the null hypothesis that the treatment does nothing.

To be explicit about what this distinction means, below I lay out the procedure for computing p-values both ways, using my paper with Rebecca Thornton about a school-based literacy intervention in Uganda as an example data-generating process.

Randomization inference p-values

1. Randomly re-assign “treatment” in the same way that it was actually done. This was within strata of three schools (2 treatments and 1 control per cell). As we do this, the sample stays fixed.

2. Use the fake treatments to estimate our regression model:

y_{is}= \beta_0 +\beta_1 T1_s + \beta_2 T2_s + \textbf{L}^\prime_s\gamma +\eta y^{baseline}_{is} + \varepsilon_{is}

\textbf{L} are strata fixed effects.

The fake treatments have no effect (on average) by construction. There is some probability that they appear to have an effect by random chance. Our goal is to see where our point estimates lie within the distribution of “by random chance” point estimates from these simulations.

3. Store the estimates for \beta_1 and \beta_2.

4. Repeat 1000 times.

5. Look up the point estimates for our real data in the distribution of the 1000 fake treatment assignment simulations. Compute the share of the fake estimates that are larger in absolute value than our point estimates. This is our randomization inference p-value.
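The five steps above can be sketched in a few lines of Python. This is a simplified illustration and not the code used in the paper: it works with one outcome per school instead of student-level data, leaves out the baseline score, and runs on simulated data. All function names, and every number in `simulate_schools`, are made up for the example.

```python
import numpy as np


def estimate_betas(y, arm, strata):
    """OLS of y on T1 and T2 dummies plus strata fixed effects; returns (b1, b2)."""
    t1 = (arm == 1).astype(float)
    t2 = (arm == 2).astype(float)
    # A full set of strata dummies stands in for the intercept.
    fe = (strata[:, None] == np.unique(strata)[None, :]).astype(float)
    X = np.column_stack([t1, t2, fe])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:2]


def reassign_within_strata(arm, strata, rng):
    """Step 1: shuffle the arm labels within each stratum; the sample stays fixed."""
    fake = arm.copy()
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        fake[idx] = rng.permutation(arm[idx])
    return fake


def ri_pvalues(y, arm, strata, reps=1000, seed=0):
    """Steps 2-5: share of fake estimates larger in absolute value than the real ones."""
    rng = np.random.default_rng(seed)
    actual = estimate_betas(y, arm, strata)
    fake = np.array([
        estimate_betas(y, reassign_within_strata(arm, strata, rng), strata)
        for _ in range(reps)
    ])
    return (np.abs(fake) > np.abs(actual)).mean(axis=0)


def simulate_schools(n_strata=12, effect_t1=2.0, effect_t2=0.0, seed=1):
    """Made-up data: strata of three schools, one school per arm (0 = control)."""
    rng = np.random.default_rng(seed)
    strata = np.repeat(np.arange(n_strata), 3)
    arm = np.concatenate([rng.permutation(3) for _ in range(n_strata)])
    y = (effect_t1 * (arm == 1) + effect_t2 * (arm == 2)
         + rng.normal(size=n_strata)[strata]
         + rng.normal(scale=0.5, size=strata.size))
    return y, arm, strata


y, arm, strata = simulate_schools()
print(ri_pvalues(y, arm, strata))  # p-values for beta_1 and beta_2
```

Note that the outcome vector `y` never changes across repetitions; only the treatment labels move, which is exactly what separates this procedure from a bootstrap.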

Bootstrapped p-values

1. Randomly re-sample observations in the same way they were actually sampled. This was at the level of a school, which was our sampling unit. In every selected school we keep the original sample of kids.

This re-sampling is done with replacement, with a total sample equal to the number of schools in our actual dataset (38). Therefore almost all re-sampled datasets will have repeated copies of the same school. As we do this, the treatment status of any given school stays fixed.

2. Use the fake sample to estimate our regression model:

y_{is}= \beta_0 +\beta_1 T1_s + \beta_2 T2_s + \textbf{L}^\prime_s\gamma +\eta y^{baseline}_{is} + \varepsilon_{is}

\textbf{L} are strata fixed effects.

The treatments should in principle have the same average effect as they do in our real sample. Our goal is to see how much our point estimates vary as a result of sampling variation, using the re-sampled datasets as a simulation of the actual sampling variation in the population.

3. Store the estimates for \beta_1 and \beta_2.

4. Repeat 1000 times.

5. Compute the standard deviation of the estimates for \beta_1 and \beta_2 across the 1000 point estimates. This is our bootstrapped standard error. Use these, along with the point estimate from the real dataset, to do a two-sided t-test; the p-value from this test is our bootstrapped p-value.*
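For comparison, here is a minimal Python sketch of the bootstrap steps above. Again this is an illustration under assumptions and not the paper's code: the data are simulated, the function names are hypothetical, and the final test uses a normal approximation in place of a t distribution to avoid extra dependencies.

```python
import math
import numpy as np


def estimate_betas(y, baseline, arm, strata, n_strata):
    """OLS of y on T1, T2, the baseline score, and strata fixed effects."""
    t1 = (arm == 1).astype(float)
    t2 = (arm == 2).astype(float)
    fe = (strata[:, None] == np.arange(n_strata)[None, :]).astype(float)
    X = np.column_stack([t1, t2, baseline, fe])
    # lstsq tolerates the all-zero dummy of a stratum missing from a re-sample.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:2]


def bootstrap_pvalues(y, baseline, school, arm, strata, reps=1000, seed=0):
    """Re-sample whole schools with replacement; treatment status stays fixed."""
    rng = np.random.default_rng(seed)
    n_strata = int(strata.max()) + 1
    actual = estimate_betas(y, baseline, arm, strata, n_strata)
    schools = np.unique(school)
    rows_of = {s: np.flatnonzero(school == s) for s in schools}
    draws = np.empty((reps, 2))
    for r in range(reps):
        sampled = rng.choice(schools, size=schools.size, replace=True)
        # Keep every kid from each sampled school (repeats included).
        rows = np.concatenate([rows_of[s] for s in sampled])
        draws[r] = estimate_betas(y[rows], baseline[rows], arm[rows],
                                  strata[rows], n_strata)
    se = draws.std(axis=0, ddof=1)  # bootstrapped standard errors
    z = np.abs(actual) / se
    # Two-sided p-value from a normal approximation (an assumption of this sketch).
    pvals = np.array([math.erfc(v / math.sqrt(2.0)) for v in z])
    return actual, se, pvals


def simulate_students(n_strata=12, kids=20, effect_t1=1.0, effect_t2=0.0, seed=1):
    """Made-up student-level data; arm and strata are constant within a school."""
    rng = np.random.default_rng(seed)
    n_schools = 3 * n_strata
    school_arm = np.concatenate([rng.permutation(3) for _ in range(n_strata)])
    school_strata = np.repeat(np.arange(n_strata), 3)
    school_shock = rng.normal(scale=0.2, size=n_schools)
    school = np.repeat(np.arange(n_schools), kids)
    arm, strata = school_arm[school], school_strata[school]
    baseline = rng.normal(size=school.size)
    y = (effect_t1 * (arm == 1) + effect_t2 * (arm == 2) + 0.5 * baseline
         + 0.3 * strata + school_shock[school] + rng.normal(size=school.size))
    return y, baseline, school, arm, strata


data = simulate_students()
print(bootstrap_pvalues(*data, reps=200))  # estimates, standard errors, p-values
```

Here the rows of the dataset change on every repetition while each school's treatment label travels with it, the mirror image of the randomization inference procedure.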

—–

I found Matthew Blackwell’s lecture notes to be a very helpful guide on how randomization inference works. Lasse Brune and Jeff Smith provided useful feedback and comments on the randomization inference algorithm, but any mistakes in this post are mine alone. If you do spot an error, please let me know so I can fix it!

EDIT: Guido Imbens shared a new version of his paper with Alberto Abadie, Susan Athey, and Jeffrey Wooldridge about the issue of what standard errors mean when your sample includes the entire population of interest (link). Reading an earlier version really helped with my own understanding of this issue, and I have often recommended it to friends who are struggling to understand why they even need standard errors for their estimates if they have all 50 states, every worker at a firm, etc.

*There are a few other methods of getting bootstrapped p-values but the spirit is the same.

Where is Africa’s Economic Growth Coming From?

I recently returned from a two-week* trip to Malawi to oversee a number of research projects, most importantly a study of savings among employees at an agricultural firm in the far south of the country. For the first time in years, however, I also took the time to visit other parts of Malawi. One spot I got back to was the country’s former capital, Zomba, where I spent an extended period in graduate school collecting data for my job market paper. This was my first time back there in over four years.

The break in time between my visits to the city made it possible to see how the city has grown and changed. I was happy to see signs of growth and improvement everywhere:

  • There are far more guest houses than I recall.
  • The prices at my favorite restaurant have gone up, and their menu has expanded by about a factor of five.
  • They finally got rid of the stupid stoplight in the middle of town. I used to complain that traffic flowed better when it was broken or the power was out; things definitely seem to work better without it. (Okay, this might not technically be economic development but it’s a huge improvement.)
  • Whole new buildings full of shops and restaurants have gone up. I was particularly blown away to see a Steers. In 2012, I could count the international fast-food chain restaurants in Malawi on one hand. This Steers is the only fast-food franchise I’ve seen outside of Lilongwe (the seat of government) and Blantyre (the second-largest city and commercial capital).
The new Steers in Zomba

What’s driving this evident economic growth? It’s really hard to say. Zomba is not a boomtown with growth driven by the obvious and massive surge of a major industry. Instead, it seems like everything is just a little bit better than it was before. The rate of change is so gradual that you probably wouldn’t notice it if you were watching the whole time. Here’s a graph that shows snapshots of the consumption distribution for the whole country, in 2010 (blue) and 2013 (red), from the World Bank’s Integrated Household Panel Survey:

Consumption CDFs

For most of the distribution, the red line is just barely to the right of the blue one.** That means that for a given percentile of the consumption distribution (on the y-axis) people are a tiny bit better off. It would be very easy to miss this given the myriad individual shocks and seasonal fluctuations that people in Malawi face. It’s probably an advantage for me to come back after a break of several years – it implicitly smooths out all the fluctuations and lets me see the broader trends.

These steady-but-mysterious improvements in livelihoods are characteristic of Africa as a whole. The conventional wisdom on African economic growth is that it is led by resource booms – discoveries of oil, rises in the oil price, etc. That story is wrong. Even in resource-rich countries, growth is driven as much by other sectors as by natural resources:

Nigeria is known as the largest oil exporter in Africa, but its growth in agriculture, manufacturing, and services is either close to or higher than overall growth in GDP per capita. (Diao and McMillan 2015)

Urbanization is also probably not an explanation. Using panel data to track changes in individuals’ incomes when they move to cities, Hicks et al. (2017) find that “per capita consumption gaps between non-agricultural and agricultural sectors, as well as between urban and rural areas, are also close to zero once individual fixed effects are included.”

So what could be going on? One candidate explanation is the steady diffusion of technology. Internet access is more widely available than ever in Malawi: more people have internet-enabled smartphones, and more cell towers have fiber-optic cables linked to them. While in Malawi I was buying internet access for $2.69 per gigabyte. In the US, I pay AT&T $17.68 per GB (plus phone service but I rarely use that). Unsurprisingly, perhaps, better internet leads to more jobs and better ones. Hjort and Poulsen (2017) show that when new transoceanic fiber-optic cables were installed, the countries serviced by them experienced declines in low-skilled employment and larger increases in high-skilled jobs. Other technologies are steadily diffusing into Africa as well, and presumably also leading to economic growth.

Another explanation that I find compelling is that Africa has seen steady improvements in human capital, led by massive gains in maternal and child health and the rollout of universal primary education. Convincing evidence on the benefits of these things is hard to come by, but one example comes from the long-run followup to the classic “Worms” paper. Ten years after the original randomized de-worming intervention, the authors track down the same people and find that treated kids are working 12% more hours per week and eating 5% more meals.

But the really right answer is that we just don’t know. Economics as a discipline has gotten quite good at determining the effects of causes: how does Y move when I change X? The causes of effects (“Why is Y changing?”) are fundamentally harder to study. Recent research on African economic growth has helped rule out some just-so stories – for example, it’s not just rents from mining, and even agriculture is showing increased productivity – but we still don’t have the whole picture. What we do have, however, is increasing evidence on levers that can be used to help raise incomes, such as investing in children’s health and education, or making it easier for new technologies to diffuse across the continent.

*I spent two weeks on the trip, but the travel times are long enough that that amounted to just under 11 days in the country.
**The blue line is farther to the right at the very highest percentiles, but that’s all based on a very small portion of the data and household surveys usually capture the high end of the income/consumption distribution poorly. Even if we take it literally, this graph implies benefits for many of the poor and a cost for a small share of the rich, which seems like a positive tradeoff.

Foreign exchange and false advertising

I’m currently in Johannesburg, en route to Malawi to work on one project that is close to the end of data collection, and another project still in the field. (Note to self: neither of these is on my “Work in Progress” page! Time to update that.) Malawi recently introduced a $75 visa fee for Americans (which you can’t pay in the local currency, amusingly) that I always forget about until I am already outside the US. So I had to change some Rand into dollars.

Changing currencies is much more complex and expensive than it looks. I went up to four different ForEx counters at JNB, and each one had huge hidden fees for my transaction – 25%, on top of their aggressive exchange rate. This has always been my experience with these counters: these fees are not on their signs, and are often hidden in the transaction amount.
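To see how a percentage fee stacks on top of an unfavorable exchange rate, here is a small Python sketch. The rates in the usage note are made-up placeholders, not the ones quoted at JNB, and the function assumes the fee is charged as a share of the amount paid at the counter's own rate.

```python
def total_markup(mid_rate, counter_rate, fee_share):
    """Extra cost of buying foreign currency, relative to the mid-market rate.

    mid_rate and counter_rate are both in local currency per unit of foreign
    currency; fee_share is the commission as a fraction (0.25 for 25%).
    """
    return (counter_rate / mid_rate) * (1 + fee_share) - 1


def local_cost(foreign_amount, counter_rate, fee_share):
    """Local currency needed to walk away with foreign_amount."""
    return foreign_amount * counter_rate * (1 + fee_share)
```

With a hypothetical mid-market rate of 13 Rand per dollar, a counter rate of 14, and a 25% fee, `total_markup(13.0, 14.0, 0.25)` comes to roughly 0.35: the two markups multiply, so the all-in cost is well above the headline fee.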

I suspect that there is a behavioral-type story here: it’s quite hard to detect how badly you are getting gouged on fees and commissions given all the arithmetic you need to do just to translate currencies. Firms can exploit the difficulty of understanding these fees to extract rents from their less-sophisticated customers. Gabaix and Laibson’s 2006 QJE paper is the classic reference on this idea.

In Gabaix and Laibson, though, sophisticates can dodge the hidden fees. There is no “out” at the ForEx counter – everyone has their own secret fee schedule. But there is a side effect: everyone who walks up is getting gouged. So my solution was to wait for another person who was offered a shockingly terrible rate and beat that price. Worked like a charm, and now I can get into Malawi.

The lesson of this experience, by the way, is the opposite of what you might think: never bring US dollars with you abroad! Or, rather, bring only what you need for the fees and a small emergency fund. Changing cash abroad will get you killed on fees.

Making the Grade: The Trade-off between Efficiency and Effectiveness in Improving Student Learning

Over the past couple of months, I’ve been blogging less than usual – depriving my readers of my valuable opinions, such as why certain statistical methods are terrible. In large part this is because I’ve been working on major revisions to multiple papers, which has eaten up a large fraction of my writing energy.

Rebecca Thornton and I just finished one of those revisions, to a paper that began its life as the third chapter of my dissertation. The revised version is called “Making the Grade: The Trade-off between Efficiency and Effectiveness in Improving Student Learning“. Here is the abstract:

Relatively small changes to the inputs used in education programs can drastically change their effectiveness if there are large trade-offs between effectiveness and efficiency in the production of education. We study these trade-offs using an experimental evaluation of a literacy program in Uganda that provides teachers with professional development, classroom materials, and support. When implemented as designed, and at full cost, the program improves reading by 0.64 SDs and writing by 0.45 SDs. An adapted program with reduced costs instead yields statistically-insignificant effects on reading – and large negative effects on writing. Detailed classroom observations provide some evidence on the mechanisms driving the results, but mediation analyses show that teacher and student behavior can account for only 6 percent of the differences in effectiveness. Machine-learning results suggest that the education production function involves important nonlinearities and complementarities – which could make education programs highly sensitive to small input changes. Given the sensitivity of treatment effects to small changes in inputs, the literature on education interventions – which focuses overwhelmingly on stripped-down programs and individual inputs – could systematically underestimate the total gains from investing in schools.

The latest version of the paper is available on my website here. We have also posted the paper to SSRN (link).

Real and fake good news from the 2015-16 Malawi DHS

Malawi’s National Statistical Office and ICF International recently released a report containing the main findings from the 2015-16 Demographic and Health Survey for Malawi. The DHS is an amazing resource for researchers studying developing countries – it provides nationally representative repeated cross-section data for dozens of countries. Tons of amazing papers have come out of the DHS data. I am a particular fan of Oster (2012), which shows that the limited behavior change in response to HIV risks is explained, at least in part, by other health risks. (A shout-out to IPUMS-DHS here at the University of Minnesota, which lets researchers quickly access standardized data across countries and time).

Another great use of the DHS is to look at trend lines – what health and social indicators are getting better? Are any getting worse? There is some really good news in here for Malawi: the total fertility rate (the number of children a woman can expect to have over her lifetime) has declined significantly in the past 6 years, from 5.7 to 4.4. This is a big drop, and it comes from declines in fertility at all ages:

DHS fertility trends

You can argue about whether lower fertility is necessarily better, but it seems to go hand-in-hand with improving economic and social indicators across the board, and reducing fertility probably causes improvements in women’s empowerment and access to the labor market. Moreover, the average desired family size in Malawi is 3.7 kids, so at least in the aggregate Malawians would also see this decline as an improvement.

There’s another eye-catching headline number in the report, which would be amazingly good news – the prevalence of HIV has fallen from 10.6% to 8.8%. That’s a huge drop! As long as it didn’t happen due to high mortality among the HIV-positive population, that’s really good news. Only it didn’t actually happen. What did happen was that they changed their testing algorithm to reflect the current best practice, which uses an additional test to reduce the false-positive rate. They have the data to compute the prevalence in the old way, too. Here’s what that looks like:

DHS trends in HIV

Almost totally flat. If anything there’s a small improvement for women and a slight increase in prevalence for men, but we can’t rule out zero change. This shouldn’t be a surprise, though – people can increasingly access life-saving ARVs in Malawi, which tends to push up the prevalence of HIV because HIV-positive people stay alive.

To the credit of the people who put together the report, they never present trendlines that compare the old method to the new one. People who assemble cross-country summary stats from the DHS are likely to be misled, however. It would probably be better to mention both numbers anywhere the prevalence is mentioned, since they are both valid summaries of the data computed in slightly different ways – and 10.4% is the number that is more comparable to previous DHS waves.

There is a nugget of good news about HIV buried in here, as well – these findings imply that the prevalence of HIV was never quite as high as we thought it was, in Malawi or anywhere else. Taking this discrepancy as a guide, we have been overstating the number of HIV-positive people worldwide by about 15%. It is definitely a good thing that those people aren’t infected, and that’s a 15% reduction in the cost of reaching the WHO’s goal of providing ARVs to every HIV-positive person in the world.
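The "about 15%" figure can be checked directly from the two prevalence numbers above, 10.4% under the old testing algorithm and 8.8% under the new one. A minimal Python sketch (the function name is made up for illustration):

```python
def prevalence_gap(old_method, new_method):
    """Compare prevalence estimates from the old and new testing algorithms."""
    # Share of old-method positives that drop out under the new algorithm.
    share_removed = (old_method - new_method) / old_method
    # Overstatement measured relative to the new, lower estimate.
    overstatement = old_method / new_method - 1
    return share_removed, overstatement
```

`prevalence_gap(10.4, 8.8)` gives a share removed of about 0.15, which matches the 15% reduction in the number of people needing ARVs. Measured against the corrected count instead, the old figure is about 18% too high, so the exact percentage depends on which number is used as the base.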

You can’t be your own friend, and that’s a big problem for measuring peer effects

Peer effects – the impact of your friends, colleagues, and neighbors on your own behavior – are important in many areas of social science. To analyze peer effects we usually estimate equations of the form

Y_i=\alpha+\beta PeerAbility_i+e_i

We want to know the value of β – how much does an increase in your peers’ performance raise your own performance?

Peer effects are notoriously difficult to measure: if you see that smokers tend to be friends with other smokers, is that because smoking rubs off on your friends? Or because smokers tend to come from similar demographic groups? Even if you can show that these selection problems don’t affect your data, you face an issue that Charles Manski called the “reflection problem”: if having a high-achieving friend raises my test scores, then my higher test scores should in turn raise his scores, and so on, so the magnitude of the peer effects is hard to pin down.

A standard way of addressing these problems is to randomly assign people to peers, and to use a measure of peer performance that is recorded ex ante or is otherwise unaffected by reflection. That fixes the problems, so we get consistent estimates of beta, right?

Wrong. We still have a subtle problem whose importance wasn’t formally raised until 2009 in a paper by Guryan, Kroft, and Notowidigdo: you can’t be your own friend, or your own peer, or your own neighbor. Suppose our setting is assigning students a study partner, and the outcome we are interested in is test scores. We want to know the impact of having a higher-ability peer (as measured by the most recent previous test score) on future test scores. The fact that you can’t be your own peer creates a mechanical negative correlation between each student’s ability and that of their assigned peer. To see why, imagine assigning the peer for the highest-ability student in the class. Any partner she is assigned to – even if we choose entirely at random from the other students – will have a lower score on the most recent test than she does. And for any student who is above-average, their assigned peer will, on average, be lower-ability than them. The reverse applies to students who are below the class average.
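
To put a rough number on that mechanical correlation, here is a back-of-the-envelope derivation. It assumes a single class of N students with prior test scores a_1, ..., a_N and class mean \bar{a}, and that student i's partner p(i) is equally likely to be any of the other N-1 students. The notation is introduced here purely for illustration and is not taken from either paper.

E[a_{p(i)} \mid a_i] = \frac{1}{N-1}\sum_{j \neq i} a_j = \frac{N\bar{a} - a_i}{N-1}

E[a_{p(i)} - \bar{a} \mid a_i] = -\frac{a_i - \bar{a}}{N-1}

So a student who is one point above the class mean should expect a partner who is 1/(N-1) points below it: about 0.05 points in a class of 20, and a full point when the pool contains only two students. The smaller the pool that peers are drawn from, the stronger the negative correlation.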

This is a big problem for estimating beta in the equation above. The error term e_i can be broken up into a part that is driven by student ability, OwnAbility_i, and a remaining component, v_i. Since OwnAbility_i is negatively correlated with PeerAbility_i, so is the overall error term. Hence, even in our random experiment, we have a classic case of omitted-variable bias. The estimated effect of your peers’ ability on your own performance is biased downward – it is an underestimate, and often a very large one.
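
A small simulation makes the bias concrete. This is a minimal sketch under assumptions that are mine rather than the papers': one class of 10 students with fixed prior scores, study partners assigned by a random pairing, and a future score equal to own ability plus noise, so the true peer effect is zero. The function name and parameters are hypothetical.

```python
import numpy as np


def simulate_exclusion_bias(class_size=10, n_draws=20000, true_beta=0.0, seed=0):
    """Average OLS slope of future score on partner's ability over random pairings.

    Illustrative assumptions: a single class with fixed abilities, an even
    class size, and outcome = own ability + true_beta * peer ability + noise.
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=class_size)  # prior test scores, held fixed
    slopes = np.empty(n_draws)
    for d in range(n_draws):
        # Random perfect matching: shuffle students, then pair neighbours.
        order = rng.permutation(class_size)
        partner = np.empty(class_size, dtype=int)
        partner[order[0::2]] = order[1::2]
        partner[order[1::2]] = order[0::2]
        peer_ability = ability[partner]
        # Own ability is left out of the regression, so it sits in the error term.
        outcome = ability + true_beta * peer_ability + rng.normal(size=class_size)
        # OLS slope from regressing outcome on peer ability (with an intercept).
        x = peer_ability - peer_ability.mean()
        slopes[d] = (x @ (outcome - outcome.mean())) / (x @ x)
    return slopes.mean()


mean_slope = simulate_exclusion_bias()
print(f"average estimated beta: {mean_slope:.3f}")
print(f"benchmark -1/(N-1):     {-1 / 9:.3f}")
```

Even though partners are assigned at random and the true effect is zero, the average estimate comes out negative, close to -1/(N-1) for this setup. Passing a nonzero `true_beta` shifts the average estimate by that amount, so a real positive peer effect would be understated by roughly the same margin.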

What this means is that randomized experiments are not enough. If you randomly assign people to peers and estimate β using the equation above, you will get the wrong answer. Fortunately, there are solutions. In a new paper, Bet Caeyers and Marcel Fafchamps describe this “can’t be your own friend” problem in detail, calling it “exclusion bias”. They show that several common econometric approaches actually make the problem worse. For example, controlling for cluster fixed effects often exacerbates the bias because the clusters are often correlated with the groups used to draw the peers. They also show that 2SLS estimates of peer effects do not suffer from exclusion bias – which helps explain why 2SLS estimates of peer effects are often larger than OLS estimates.

They also show how to get unbiased estimates of peer effects for different kinds of network structure. Unfortunately there is no simple answer – the approach that works depends closely on the kind of data that you have. But the paper is a fantastic resource for anyone who wants to get consistent estimates of the effect of people’s peers on their own performance.

The quality of the data depends on people’s incentives

Two recent news stories show how sensitive social science is to issues of data quality. According to John Kennedy and Shi Yaojiang, a large share of the missing women in China actually aren’t missing at all. Instead, their parents and local officials either never registered their births or registered them late. Vincent Geloso reports that Cuba’s remarkably low infant mortality rate is partly attributable to doctors re-coding deaths in the first 28 days of life as deaths in the last few weeks of gestation.

Both of these data problems affect important scientific debates. The cost-effectiveness of Cuban health care is the envy of the world and has prompted research into how they do it and discussions of how we should trade off freedom and health. China’s missing women are an even bigger issue. Amartya Sen’s original book on the topic has over 1000 citations, and there are probably dozens of lines of research studying the causes and consequences of missing women – many of whom may in fact not be missing at all.

I am not sure that either of these reports is totally correct. What I am sure about is that each of these patterns must be going on to some extent. If officials in China can hit a heavily-promoted population target by hiding births, of course some of them will do so. Likewise, if parents can avoid a fine by lying about their kids, they are going to do that. And in a patriarchal culture, registering boys and giving them the associated rights makes more sense than registering girls. The same set of incentives holds in Cuba: doctors can hit their infant mortality targets either by improving health outcomes, by preventing less-healthy fetuses from coming to term, or by making some minor changes to paperwork. It stands to reason that people will do the last of these at least some of the time.

Morten Jerven points out a similar issue in his phenomenal work Poor Numbers. Macroeconomic data for Africa is based on very spotty primary sources, and the resulting public datasets have errors that are driven by various people’s incentives – even the simple incentive to avoid missing data. These errors have real consequences: there is an extensive literature that uses these datasets to estimate cross-country growth regressions, which have played an important role in policy debates.

At my first job after college, my boss, Grecia Marrufo, told me that variables are only recorded correctly if someone is getting paid to get them right. She was referring to the fact that in health insurance data, lots of stuff isn’t important for payments and so it has mistakes. There is a stronger version of this claim, though: if someone is being coerced to get data wrong, the data will be wrong. And anytime people’s incentives aren’t aligned with getting the right answers, you will get systematic mistakes. I’ve seen this myself while running surveys; due to various intrinsic and extrinsic motivations, enumerators try to finish surveys quickly and end up faking data.

I’m not sure there is anything we can do to prevent fake data from corrupting social scientific research, but I have a couple of ideas that I think would help. First, always cross-check data against other sources when you can. Second, use primary data – and understand how it was collected, by whom, and for what reason – whenever possible. Neither of these can perfectly protect us from chasing fake results down rabbit holes, but they will help a lot. In empirical microeconomics, I have seen a lot of progress on both fronts: important results are debated vigorously and challenged using other data, and more people are collecting their own data. But we still have to be vigilant, and aware of the potential data reporting biases that could be driving results we regard as well-established.