Ceteris Non Paribus

Ceteris Non Paribus is my personal blog, formerly hosted at nonparibus.wordpress.com and now found here. This blog is a place for me to put the ideas I have, and the stuff I come across, that I’ve managed to convince myself other people would be interested in seeing. See the About page for more on the reasons why I maintain a blog and the origin of the blog’s name.

My most recent posts can be found below, and a list of my most popular posts (based on recent views) is on the right.


Story on NPR about my paper, “Scared Straight or Scared to Death?”

I was recently interviewed by NPR’s Shankar Vedantam (the host of Hidden Brain) for a story about my paper “Scared Straight or Scared to Death? The Effect of Risk Beliefs on Risky Behaviors”. The story ran today on Morning Edition, and you can find it online here:

How Risk Affects The Way People Think About Their Health

The story does a nice job of summarizing the key finding in my paper, which is that overstating the risks of an activity can backfire, causing people to give up and stop trying to protect themselves. This is the opposite of the usual pattern we observe.

It glosses over an important detail about the context and my empirical findings, which is that the fatalism result holds for some people, not everyone. Anthropologists have observed rationally fatalistic reasoning among some men in Malawi, not all of them. In my sample, I find fatalistic responses for 14% of people – the ones with the highest risk beliefs. I also find that those people are more likely to think they already have (or will inevitably contract) HIV, and that they are at higher risk of contracting and spreading HIV than the rest of the population.

I assume that NPR simply shortened the story for time, and I do think that their takeaway is the right one – we should be cautious when trying to scare people into better behavior by playing up how risky certain activities are.

Mpyupyu Hill in Zomba District, Southern Malawi – I collected the data for the paper nearby

You can find the latest version of the paper on my website or on SSRN. Here’s the abstract:

This paper tests a model in which risk compensation can be “fatalistic”: higher risks lead to more risk-taking, rather than less. If exposure cannot be perfectly controlled, fatalism arises when risks become sufficiently high. To test the model I randomize the provision of information about HIV transmission rates in Malawi, and use a novel method to decompose the risk elasticity of sexual risk-taking by people’s initial risk beliefs. Matching the model’s predictions, this elasticity varies from -2.3 for the lowest initial beliefs to 2.9 for the highest beliefs. Fatalistic people, who have a positive elasticity, comprise 14% of the population.

For more details about the paper, see my previous posts about it on this blog (first post, second post) or on the World Bank’s Development Impact blog (link).

How are wages set for teachers, in theory?

A recent post by Don Boudreaux on the relative wages of actors and teachers has been doing the rounds in the economics blogosphere, garnering favorable mentions by Alex Tabarrok and Ranil Dissanayake. Boudreaux asserts that:

The lower pay of fire fighters and school teachers simply reflects the happy reality that we’re blessed with a much larger supply of superb first-responders and educators than we are of superb jocks and thespians.

There are lots of reasons why superstar actors and actresses earn tons of money. There really is a limited supply of high-end talent there, and that really does, in all likelihood, drive the high wages we observe. And Boudreaux is also right that it is good that the average firefighter or teacher isn’t paid millions of dollars, or we’d never be able to afford enough of them to fight all our fires and teach all our kids.

What Boudreaux gets wrong, however, is his assumption that straightforward supply and demand can explain the pay earned by teachers. He’s also wrong when he asserts that the lack of high pay for awesome teachers is a good thing.

In the standard microeconomic theory of the labor market, workers earn hourly* wages equal to their marginal revenue product of labor. This is the revenue generated for the firm by the worker’s last hour of work. A worker’s wage is pinned down at their marginal revenue product (MRP) by a no-arbitrage condition. (Strictly speaking, workers are assumed to be paid the value of their outside option to their current job.**) In a competitive labor market, a worker who is paid less than her MRP will be stolen away by another firm that is willing to pay her slightly more, and this repeats until wages reach MRP. A worker who is paid more than her MRP is losing money for her firm and will be fired (or the contract will never be offered).
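As a rough sketch of that textbook condition (my notation, and a generic firm rather than a school): a firm selling output at price $p$ with production function $F(L)$ hires labor until

$$ w = \mathrm{MRP}_L \equiv p \cdot \frac{\partial F(L)}{\partial L} $$

If $w$ were below $\mathrm{MRP}_L$, a competing firm could profit by offering slightly more; if $w$ were above it, the marginal hour of work would lose the firm money.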

You can see the problem for setting the wages of teachers. Public-school teachers don’t directly generate any revenue for doing their job, and public schools are not businesses and aren’t trying to maximize profits. Even private-school teachers don’t actually generate more income for their schools by putting in additional hours or being more effective in the classroom.

So how are these wages set? I don’t know the reality, but what the theory actually says is that they get whatever the best competing job is willing to pay them.*** This means the effective floor is not set by the marginal revenue product of labor, or any measure of productivity or effectiveness, but by some alternative job to teaching.

You could imagine competitors paying teachers wages that are based on their classroom effectiveness. Replacing a terrible teacher with an average one is worth $250,000 per classroom just in terms of additional lifetime earnings (Chetty, Friedman, and Rockoff 2014). And great teachers add value in many other ways that don’t directly appear in earnings. So this could lead to parents strongly preferring excellent teachers and paying a premium to private schools to get them. This would bid up wages for great teachers and we’d see very high pay for them.

Rating teachers on classroom effectiveness is hard, though. The Chetty, Friedman, and Rockoff results I linked to are very controversial. There are definitely aspects of teacher ability that cannot be measured through test scores alone. Kane and Staiger (2012) show that expert observations of teachers in the classroom have incremental predictive power for future student performance, on top of Chetty-style value-added measures. Some teachers oppose any use of test scores in evaluating teacher ability.

So back to the question. Is the low pay (and low variance of pay) for teachers a sign that the world is full of awesome teachers? No; they are quite rare. Rather, we have tons of people who could be adequate teachers, some of whom are amazing teachers — and we have little ability to distinguish between them for the purposes of pay, at least in a way that people can agree upon.

And this has consequences. Our limited ability to tie teacher pay to teacher quality means that there are probably lots of potentially-great teachers in other professions. Our limited ability to even measure quality teaching in an agreed-upon way means it’s tough to incentivize improvements in teaching quality among existing teachers – or even for teachers who are motivated by non-pay considerations to know how they should improve. The low and flat pay for teachers is not a blessing. It is a problem that needs to be fixed.

*I picked hourly wages arbitrarily – daily or weekly or annual wages would work just as well, because we use calculus for these models and assume time and effort at work can be varied continuously.
** We assume this is a job, but it could be the value they put on their free time instead. Workers whose labor generates little revenue are said to “choose” “leisure” over work, rather than being driven into unemployment. Again, this is a model – not the truth, and not what I believe to be correct.
***This sets a floor on wages. The ceiling imposed by the typical model of wage-setting doesn’t function here because there is no marginal revenue to compare to the marginal cost of employing the teacher. However, pressure to control public budgets probably keeps wages reasonably near their floor.

The unbanked don’t *want* to be banked

Bank accounts as currently offered appear unappealing to the majority of individuals in our three samples of unbanked, rural households – even when these accounts are completely subsidized.

That’s the punchline of a new paper by Dupas, Karlan, Robinson, and Ubfal, “Banking the Unbanked? Evidence from three countries”. In both developing countries and the rich world there is a lot of justified concern about the unbanked population – households with no formal financial accounts – and a popular theory that if we could just improve the accessibility of bank accounts, it would be a game-changer for people. Unfortunately, like many other silver bullets before it, this one has failed to kill the stalking werewolf of poverty.

Indeed, it almost doesn’t leave the barrel of the gun. 60% of the treatment group in Malawi and Uganda (and 94% in Chile) never touch the bank accounts. The following depressing graph is the CDF of deposits for people who opened an account:

Figure: Deposits – CDF of total deposits among account openers

In Malawi, even among account openers, 80% of people had less money in total deposits than the account would have charged in fees (the fees were covered as part of the study). Just a tiny fraction of people had enough in deposits for the interest to cover the fees – which is the minimum for the account to compete with cash on the nominal interest rate.*
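To put that break-even point in symbols (a back-of-the-envelope sketch, not a calculation from the paper): with annual fees $F$ and a deposit interest rate $r$, interest covers the fees only when the balance $B$ satisfies

$$ rB \geq F \quad \Longrightarrow \quad B \geq F/r $$

so at low interest rates the required balance is many multiples of the annual fees.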

The goal of the study was to look at how accounts impact downstream outcomes like incomes and investments, and the paper dutifully reports a host of null effects on those variables. But the authors instead focus on why uptake was so low. The most popular theory among the treatment group was poverty – that people just don’t have enough money to save – and this lines up with some regression analysis results as well. But confusion also appears to be a factor: 15% of treatment-group households in Malawi say they didn’t use the account because they couldn’t meet the minimum balance, even though there was no minimum balance requirement.

Another issue with the poverty model of low savings is that it contrasts with my mental model of income dynamics in Malawi. The overwhelmingly dominant staple in Malawi is maize, and it is harvested more or less all at once in May, generating a huge burst of (in-kind) income that must be smoothed over the rest of the year. Maize is hard to store and storage has significant scale economies, so people often sell off a lot of their harvest and buy maize later with the cash – effectively “renting” storage. This requires lots of cash to be saved.

I can’t quickly put any numbers on this mental model (although maybe I’ll play around with the LSMS to see what it shows). I am quite confident about the spike in income at the harvest, though – the question is how much people need to save in cash. I’d like to see more discussion of this in future work about financial access in Africa. Dupas et al. may not have the data to do it, but future research projects should definitely collect it.

*This assumes away the problem of cash being lost, stolen, or wasted, which would make the nominal interest rate negative.

New website (and a new home for my blog)

I recently changed my personal website over to WordPress. (I had previously been writing the HTML by hand, which was a pain when I made major updates.) An added benefit of that changeover is that it enabled me to start hosting my blog, Ceteris Non Paribus, on my own server, instead of housing it at WordPress.com.

This transition should look more or less seamless from the perspective of existing blog readers. My old blog, nonparibus.wordpress.com, now redirects to its new home on my website, jasonkerwin.com/nonparibus, and all the old posts are now stored here along with their comments.  All my new posts – including this one –  should automatically cross-post to the old blog, and therefore should show up for existing subscriptions through RSS, email, etc. To avoid missing posts in the future, however, you might want to update your subscription using the email or RSS subscription widgets on the right, or by using this direct link to the new RSS feed.

Scared Straight or Scared to Death? The Effect of Risk Beliefs on Risky Behaviors

Longtime readers of this blog (or anyone who has talked to me in the past few years) know that I have been working on a paper on risk compensation and HIV. Risk compensation typically means that when an activity becomes more dangerous, people do less of it. If the risk becomes sufficiently high, however, the rational response can be to take more risks instead of fewer, a pattern called “rational fatalism”. This happens because increased risks affect not only the danger of additional acts, but also the chance that you have already contracted HIV based on past exposures. While HIV testing appears to mitigate this problem by resolving people’s HIV status, a similar logic applies to unavoidable exposures in the future; HIV testing cannot do anything about the sense that you will definitely contract the virus in the future. I test this model using a randomized field experiment in Malawi, and show that rational fatalism is a real phenomenon.
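One stylized way to see the logic (an illustrative sketch of my own, not the exact model in the paper): suppose each exposure transmits HIV with probability $p$ and that $n$ of your exposures are unavoidable, so the chance of eventually being infected after $k$ total exposures is $P(k) = 1 - (1-p)^k$. The marginal risk from one additional act is then

$$ P(n+1) - P(n) = p(1-p)^{n} $$

When the perceived $p$ is low, raising it makes each additional act look more dangerous, and people cut back. But when $p$ is very high, $(1-p)^{n}$ collapses toward zero: you are nearly certain to be infected through the exposures you cannot avoid, so the marginal risk of one more act – and with it the incentive to protect yourself – shrinks.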

The paper is called “Scared Straight or Scared to Death? The Effect of Risk Beliefs on Risky Behaviors”. I’ve just completed a major overhaul of the paper – here’s a link to the revised version, which is now available on SSRN, and here is the abstract:

This paper tests a model in which risk compensation can be “fatalistic”: higher risks lead to more risk-taking, rather than less. If exposure cannot be perfectly controlled, fatalism arises when risks become sufficiently high. To test the model I randomize the provision of information about HIV transmission rates in Malawi, and use a novel method to decompose the risk elasticity of sexual risk-taking by people’s initial risk beliefs. Matching the model’s predictions, this elasticity varies from -2.3 for the lowest initial beliefs to 2.9 for the highest beliefs. Fatalistic people, who have a positive elasticity, comprise 14% of the population.

I’ve put the paper up on SSRN to try to get it into the hands of people who haven’t seen it yet, and also because I’m making the final edits ahead of submitting it to an academic journal. Feedback and suggestions are therefore warmly welcomed.

I’ve written about previous versions of the paper both on this blog and on the World Bank’s Development Impact Blog.

Randomized evaluations of market goods

Last weekend I was lucky enough to attend IPA’s second annual Researcher Gathering on Advancing Financial Inclusion at Yale University. My friend and collaborator Lasse Brune presented the proposed research design for a study we are planning with Eric Chyn on using deferred wages as a savings technology in Malawi. Our project builds on an earlier paper by Lasse and me that shows that demand for deferred wages is quite high and that there are potentially large benefits.

The conference featured lots of great talks, but I particularly liked the one Emily Breza gave. She was presenting early results from a paper called “Measuring the Average Impacts of Credit: Evidence from the Indian Microfinance Crisis” that she is writing with Cynthia Kinnan. (Here is a link to an older version of the paper.) One of the major results of the RCT revolution in development economics is a robust literature showing that microcredit – small loans targeted at the very poor, at below-market interest rates – has very limited economic benefits. This was a fairly surprising result, and at odds with both the priors of microfinance practitioners and previous non-experimental research.

Randomized evaluations of microcredit generally work as follows. You find an area where microcredit doesn’t exist, and get a firm to expand to that area. But instead of providing it to everyone in the area, you convince the firm to randomly select people to offer the product to.

Breza and Kinnan turn that logic on its head. They instead look at markets that were already actively served by microcredit, where the supply of credit was drastically reduced. This reduction happened because of the 2010 Andhra Pradesh (AP) Microfinance Crisis, where a wave of suicides by indebted people caused the AP state government to ban microcredit in the state. Breza and Kinnan don’t study AP in particular; instead, they exploit the fact that some (but not all) microcredit suppliers had exposure to AP through outstanding loans there, which defaulted en masse. This reduced their liquidity sharply and forced them to cut back on lending. So if you happened to live in an area largely served by microcredit firms that were exposed to AP, you suffered a large decline in the availability of microloans.

This clever research design yields effects on economic outcomes that are much larger than those estimated in traditional RCTs.* Is that surprising? I don’t think so, because we are studying market goods – those provided through reasonably well-functioning markets, as opposed to things like primary education in the developing world where markets are limited or non-existent.** Any randomized evaluation of a market good necessarily targets consumers who are not already served by the market.

In an RCT we can get the estimated effect of receiving a good by randomizing offers of the good and using the offer as an instrument for takeup. It’s well-known that these local average treatment effects are specific to the set of “compliers” in your study – the people induced to receive the good by the randomized offer. But usually the nature of these compliers is somewhat nebulous. In Angrist and Evans’ study of childbearing and labor supply, they are the people who are induced to have a third kid because their first two kids are the same sex (and people tend to want one of each). Are those people similar to everyone else? It’s hard to say.
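For concreteness, that IV recipe looks something like this in Stata (the variable names are hypothetical, just to fix ideas):

* y = outcome of interest, takeup = actually received the good (0/1),
* offer = randomized offer of the good (0/1), used as an instrument for takeup
ivregress 2sls y (takeup = offer), vce(robust)

* The coefficient on takeup is a local average treatment effect (LATE):
* the effect for "compliers", people who take up the good only when offered it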

In the context of market goods, however, the compliers have a specific and clear economic definition. They are the consumers who firms find it unprofitable to serve. Here is a simplistic illustration of this point:

Figure: Priced Out – the consumers whom firms find unprofitable to serve

These are the subset of all consumers with the lowest willingness-to-pay for the good – so we know that they experience the lowest aggregate benefits from it.*** RCTs can only tell us about their benefits, which are plausibly a lower bound on the true benefits across the population. To learn about the actual average treatment effects, in this context, we need a paper like Breza and Kinnan’s.

So were the randomized evaluations of microcredit even worth doing at all? Absolutely. They tell us about what might happen if we expand microcredit to more people, which is policy-relevant for two reasons. First, the priced-out people are the next group we might think about bringing it to. Second, we have a decent sense that markets are not going to serve them, so it’s up to governments and NGOs to decide whether to do so. That decision requires understanding the incremental benefits and costs of expanding microcredit, as compared with all other potential policies.

A broader conclusion is that any study of a good or product that might be provided by markets needs to confront the question of why it is not being provided already – or, if it is, why it is worth studying the benefits of the product for the subset of people whom markets have deemed unprofitable.

*An exception is another paper by Breza and Kinnan, along with Banerjee and Duflo, which finds some long-run benefits of random microcredit offers for people who are prone to entrepreneurship.
**This is distinct from the concept of public vs. private goods. Education is not a public good per se as it is both excludable and rival, but it is typically provided by the state and hence standard profit motives need not apply.
***Leaving aside Chetty’s point about experienced utility vs. decision utility.


Don't control for outcome variables

If you want to study the effect of a variable x on an outcome y, there are two broad strategies. One is to run a randomized experiment or one of its close cousins, like a regression discontinuity or a difference-in-differences. The other is to adjust for observable differences in the data that are related to x and y – a list of variables that I’ll denote as Z. For example, if you want to estimate the effect of education on wages, you typically want to include gender in Z (among many other things). Control for enough observable characteristics and you can sometimes claim that you have isolated the causal effect of x on y – you’ve distilled the causation out of the correlation.

This approach has led to no end of problems in data analysis, especially in social science. It relies on an assumption that many researchers seem to ignore: that there are no factors left out of Z that are related to both y and x.* That’s an assumption that is often violated.

This post is motivated by another problem that I see all too often in empirical work. People seem to have little idea how to select variables for inclusion in Z, and, critically, don’t understand what not to include in Z. A key point in knowing what not to control for is the maxim in the title of this post:

Don’t control for outcome variables.

For example, if you want to know how a student’s grades are affected by their parents’ spending on their college education, you might control for race, high school GPA, age, and gender. What you certainly shouldn’t control for is student employment, which is a direct result of parental financial support.** Unfortunately, a prominent study does exactly that in most of its analyses (and has not, to my knowledge, been corrected or retracted).

Why is it bad to control for variables that are affected by the x you are studying? It leads to biased coefficient estimates – i.e. you get the wrong answer. There is a formal proof of this point in this 2005 paper by Wooldridge. But it’s easy to see the problem using a quick “proof-by-Stata”. *** I’m going to simulate fake data and show that including the outcome variables as controls leads to very wrong answers.

Here is the code to build the fake dataset:

clear all

set obs 1000
set seed 346787

* x is the explanatory variable; e and u are independent error terms
gen x = 2*runiform()
gen e = rnormal()
gen u = rnormal()

* The true model: y depends on x with a coefficient of exactly 1
gen y = x + e
* An outcome of x that does not share y's error term
gen other_outcome = x^2 + ln(x) + u
* An outcome of x that is co-determined with y (it shares the error e)
gen codetermined_outcome = x^2 + ln(x) + e + u

A big advantage here is that I know exactly how x affects y: the correct coefficient is 1. With real data, we can always argue about what the true answer is.

A simple regression of y on x gives us the right answer:

reg y x
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    1.08877   .0564726    19.28   0.000     .9779512    1.199588
       _cons |   -.096074   .0645102    -1.49   0.137    -.2226653    .0305172
------------------------------------------------------------------------------

If we control for a codetermined outcome variable then our answer is way off:

reg y x codetermined_outcome
-----------------------------------------------------------------------------------
                y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
                x |  -.6710671   .0712946    -9.41   0.000    -.8109717   -.5311624
codetermined_ou~e |   .4758925   .0157956    30.13   0.000     .4448961    .5068889
            _cons |   1.192013   .0633118    18.83   0.000     1.067773    1.316252
-----------------------------------------------------------------------------------

Controlling for the other outcome variable doesn’t bias our point estimate, but it widens the confidence interval:

reg y x other_outcome

-------------------------------------------------------------------------------
            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
            x |   1.084171   .1226962     8.84   0.000     .8433988    1.324944
other_outcome |   .0012741   .0301765     0.04   0.966    -.0579426    .0604907
        _cons |  -.0927479   .1018421    -0.91   0.363    -.2925974    .1071017
-------------------------------------------------------------------------------

Both of these controls cause problems. The real problem is co-determined outcomes – things that are driven by the same unobservable factors that also drive y. These will give us the wrong answer on average, and are terrible control variables. (You also shouldn’t control for things that are the direct result of y, for the same reason). Other outcomes are bad too – they blow up our standard errors and confidence intervals, because they are highly collinear with x and add no new information that is not already in x. The safe move is just to avoid controlling for outcomes entirely.

*This is still true even today, despite the credibility revolution that has swept through economics and also reached the other social sciences in recent years.
**The more your parents support you in college, the less you have to work.
***I picked up the term “proof by Stata” in John DiNardo‘s advanced program evaluation course at the University of Michigan.

You still have to test the results of your evolutionary learning strategy

Ricardo Hausmann argues against RCTs as a way to test development interventions, and advocates for the following approach instead of an RCT that distributes tablets to schools:

Consider the following thought experiment: We include some mechanism in the tablet to inform the teacher in real time about how well his or her pupils are absorbing the material being taught. We free all teachers to experiment with different software, different strategies, and different ways of using the new tool. The rapid feedback loop will make teachers adjust their strategies to maximize performance.

Over time, we will observe some teachers who have stumbled onto highly effective strategies. We then share what they have done with other teachers.

This approach has a big advantage over a randomized trial, because the design can adapt to circumstances that are specific to a given classroom. And, Hausmann asserts, it will yield approaches whose effectiveness is unconfounded by reverse causality or selection bias.

Clearly, teachers will be confusing correlation with causation when adjusting their strategies; but these errors will be revealed soon enough as their wrong assumptions do not yield better results.

Is this true? If so, we don’t need to run any more slow, unwieldy randomized trials. We can just ask participants themselves what works! Unfortunately, the idea that participants will automatically understand how well a program works is false. Using data from the Job Training Partnership Act (JTPA), Smith, Whalley, and Wilcox show that JTPA participants’ perceived benefits from the program were unrelated to the actual effects measured in an RCT. Instead, they seem to reflect simple before-after comparisons of outcome variables, which is a common mistake in program evaluation. The problems with before-after comparisons are particularly bad in a school setting, because all student performance indicators naturally trend upward with age and grade level.
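Here is a quick simulated illustration of that problem in Stata – the numbers are made up, but the point is general:

clear all
set obs 1000
set seed 12345

* Test scores improve by 0.5 SD per year for everyone, regardless of the program,
* and the program itself has a true effect of zero
gen baseline = rnormal()
gen endline = baseline + 0.5 + rnormal()

* A naive before-after comparison attributes the time trend to the program
gen gain = endline - baseline
summarize gain   // mean gain is about 0.5, even though the true effect is 0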

An adaptive, evolutionary learning process can generate a high-quality program – but it cannot substitute for rigorously evaluating that program. Responding to Hausmann, Chris Blattman says “In fact, most organizations I know have spent the majority of budgets on programs with no evidence whatsoever.” This is true even of organizations that do high-quality learning using the rapid feedback loops described by Hausmann: many of the ideas generated by those processes don’t have big enough effects to be worth their costs.

That said, organizations doing truly effective interventions do make use of these kinds of rapid feedback loops and nimble, adaptive learning processes. They begin by looking at what we already know from previous research, come up with ideas, get feedback from participants, and do some internal M&E to see how things are going, repeating that process to develop a great program. Then they start doing small-scale evaluations – picking a non-randomized comparison group to rule out simple time trends, for example. If the early results look bad, they go back to the drawing board and change things. If they look good, they move to a bigger sample and a more-rigorous identification strategy.

An example of how this works can be found in Mango Tree Educational Enterprises Uganda, which developed a literacy program over several years of careful testing and piloting. Before moving to an initial RCT, they collected internal data on both their pilot schools and untreated schools nearby, which showed highly encouraging results. The results from the first-stage RCT, available in this paper I wrote with Rebecca Thornton, are very impressive.

My impression is that most organizations wait far too long to even start collecting data on their programs, and when the results don’t look good they are too committed to their approach to really be adaptive. The fundamental issue here is that causal inference is hard. That’s why it took human beings hundreds of thousands of years to discover the existence of germs. Social programs face the same problems of noisy data, omitted variables, strong time trends, and selection bias – and arguably fare even worse on those dimensions. As a result, no matter how convincing a program is, and how excellent its development process, we still need randomized experiments to know how effective it is.


Liquidity constraints can be deadly

My friend and coauthor Lasse Brune sent me FSD Kenya’s report on the Kenya Financial Diaries project. Much of the report, authored by Julie Zollman, confirms that what we already knew to be true for South Africa’s poor (based on the excellent book Portfolios of the Poor) also holds for low-income people in Kenya. I was struck, however, by Zollman’s description of the stark consequences Kenyans face if they are unable to access financial markets:

We observed important delays – in some cases resulting in death – from respondents’ inability to finance emergency health spending. In some cases, the financial barriers were substantial – above KSh 20,000 – but in many cases, important delays in treating things like malaria and other infections stemmed from only very small financial barriers, below KSh 500.

500 shillings is just under five US dollars at market exchange rates. Even in the context of the financial lives of the study’s respondents, this seems to be an entirely manageable sum. For the average respondent, 500 shillings is less than a third of monthly per-person income. Even FSD’s poorest respondents, who earn 221 shillings a month, should still be able to afford this cost over the course of a year; it represents about 19% of their annual budget (500 ÷ (221 × 12)). That is an amount they should be able to borrow and pay back, or possibly even to save up. Certainly this would be achievable in the developed world.

However, Kenya’s poor are desperately liquidity-constrained:

[E]ven if household income and consumption were steady from month to month, families could still be vulnerable if they could not increase spending to accommodate new and lumpy needs. A quarter of all households experienced actual hunger during the study and 9% of them experienced it during at least three interview intervals. Thirty-eight per cent went without a doctor or medicine when needed and 11% of households experienced this across three or more interview intervals.

Some of my own work (cowritten with Lasse) contributes to an extensive literature that documents the economic costs of liquidity constraints in Africa. These are important, but we should bear in mind that an inability to access credit or savings can also literally be the difference between life and death.

Is it ethical to give money to the poor? Almost certainly

GiveDirectly, the awesome charity that gives cash directly to poor people in developing countries, is back in the news. This time, they are getting attention for a study that uses the random variation in individual income provided by their original experiment to study whether increases in the wealth of others harm people’s subjective well-being. That claim underlies a surprisingly popular theory positing that increases in inequality per se have a direct impact on people’s health.* The most popular formulation of that theory says that inequality raises stress levels, which in turn cause poor health. This theory relies on inequality having a direct effect on stress and other measures of subjective well-being. Yet given how popular the stress-based theory is, there is surprisingly little evidence for this core claim.

Enter Haushofer, Reisinger, and Shapiro. Using the fact that the GiveDirectly RCT didn’t target all households in a given village, and randomized the intensity of the treatment, they can estimate the subjective well-being impacts not only of one’s own wealth but also of one’s neighbors’ wealth. Their headline result is that the effect of a $100 increase in village mean wealth is a 0.1 SD decline in life satisfaction – which is four times as big as the effect of own wealth. However, that effect declines substantially over time due to hedonic adaptation. This is an important contribution in a literature that lacked credible evidence on this topic.

However, not everyone is happy that they even looked at this question. Anke Hoeffler raises a concern about the very existence of this study.

I was just overwhelmed by a sense that this type of research should not be done at all. No matter how clever the identification strategy is. Am I the only one to think that is not ethical dishing out large sums of money in small communities and observing how jealous and unhappy this makes the unlucky members of these tight knit communities? In any case, what do we learn from a study like this? The results indicate that we should not single out households and give them large sums of money. So I hope this puts an end to unethical RCTs like this one.

I think this study was fundamentally an ethical one to do, for two distinct reasons: additionality and uncertainty.

By “additionality”, I am referring to the fact that this study uses existing data from an experiment that was run for a totally different reason: to assess the effects of large cash transfers to the poor on their spending, saving, and income. No one was directly exposed to any harm for the purpose of studying this question. Indeed, these results need to be assessed in light of the massive objective benefits of these transfers. One reading of this paper is that we should be more skeptical of subjective well-being measures, since they seem to be understating the welfare benefits of this intervention.

Second, uncertainty. We had limited ex ante knowledge of what this intervention would even do. Assuming that the impacts on subjective well-being and stress could only be negative takes, I think, an excessively pessimistic view of human nature. Surely, some people are happy for their friends. The true effect is probably an average of some positives and some negatives.

Indeed, the study has results that are surprising if one assumes the stress model must hold. What we learned from this study was much more subtle and detailed than a simple summary of the results can do justice to. First off, the authors emphasize that the effects on psychological well-being, positive or negative, fade away fairly quickly (see graph below). Second, the authors can also look at changes in village-level inequality, holding own wealth and village-mean wealth constant. There the effects are zero – directly contradicting the usual statement of the stress theory. If one’s neighbors are richer, it does not matter if the money is spread evenly among them or concentrated at the top.

Figure 2 from Haushofer, Reisinger, and Shapiro (2015)

Third, the treatment effects are more mixed, and probably smaller, than the headline effect quoted above. There’s no ex ante reason to think life satisfaction is the most relevant outcome, and none of the other measures show statistically-significant effects. Indeed, if we take these fairly small and somewhat noisily-estimated point estimates as our best guesses at the true parameter values, we should also take seriously the effects on cortisol, a biomarker of actual changes in stress. Here, the impact is negative: an increase in village-mean wealth makes people (very slightly) less stressed. This result is even stronger when we focus just on untreated households, instead of including those who got cash transfers. One way to improve the signal-to-noise ratio across these estimates is to look at an index that combines several different outcomes. The effects on that index are still negative, but smaller compared with the direct effects of the transfer. If we look just at effects on people who did not get a transfer, that index declines by 0.01 SDs, which is only a third of the magnitude of the effect of one’s own wealth change (a 0.03 SD increase).

My conclusion from the evidence in the paper is twofold. First, there’s nothing here to suggest that the indirect effects of others’ wealth on subjective well-being are large enough to offset the direct benefits of giving cash to the poor. Second, the evidence for direct effects of others’ wealth on subjective well-being is fairly weak. This paper is far and away the best evidence on the topic that I am aware of, but we can barely reject that the effect is zero, if we can reject it at all. That casts doubt on the stress-and-inequality theory, and means that more research is certainly needed on the topic. More experiments that give money to the poor would be a great way to do that.

*As Angus Deaton explains in this fantastic JEL article, inequality has an obvious link to average health outcomes through its effect on the incomes of the poor. The stress-based theory posits that an increase in inequality would make poor people sicker even if their income remained constant.