Peer effects – the impact of your friends, colleagues, and neighbors on your own behavior – are important in many areas of social science. To analyze peer effects we usually estimate equations of the form:

Performance_{i} = α + β × PeerAbility_{i} + e_{i}

We want to know the value of β – how much does an increase in your peers’ performance raise your own performance?

Peer effects are notoriously difficult to measure: if you see that smokers tend to be friends with other smokers, is that because smoking rubs off on your friends? Or because smokers tend to come from similar demographic groups? Even if you can show that these selection problems don’t affect your data, you face an issue that Charles Manski called the “reflection problem”: if having a high-achieving friend raises my test scores, then my higher test scores should in turn raise his scores, and so on, so the magnitude of the peer effects is hard to pin down.

A standard way of addressing these problems is to randomly assign people to peers, and to use a measure of peer performance that is recorded ex ante or is otherwise unaffected by reflection. That fixes the problems, so we get consistent estimates of β, right?

Wrong. We still have a subtle problem whose importance wasn’t formally raised until 2009 in a paper by Guryan, Kroft, and Notowidigdo: you can’t be your own friend, or your own peer, or your own neighbor. Suppose our setting is assigning students a study partner, and the outcome we are interested in is test scores. We want to know the impact of having a higher-ability peer (as measured by the most recent previous test score) on future test scores. The fact that you can’t be your own peer creates a mechanical negative correlation between each student’s ability and that of their assigned peer. To see why, imagine assigning the peer for the highest-ability student in the class. Any partner she is assigned to – even if we choose entirely at random from the other students – will have a lower score on the most recent test than hers. And for any student who is above average, their assigned peer will, on average, be lower-ability than them. The reverse applies to students who are below the class average.
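To make the mechanical correlation concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not code from either paper: a single class of 20 students with fixed, normally distributed prior scores, and study partners drawn as a random perfect matching. The function names, class size, and number of draws are all hypothetical choices.

```python
import random


def random_partner_scores(scores, rng):
    """Pair students at random (class size must be even) and
    return each student's partner's score, in student order."""
    order = list(range(len(scores)))
    rng.shuffle(order)
    peer = [0.0] * len(scores)
    for k in range(0, len(order), 2):
        a, b = order[k], order[k + 1]
        peer[a], peer[b] = scores[b], scores[a]
    return peer


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_own_peer_correlation(class_size=20, n_draws=20000, seed=0):
    """Average own-vs-peer score correlation over many random pairings
    of the same class (assumed setup: scores are fixed, only the
    pairing is redrawn)."""
    rng = random.Random(seed)
    scores = [rng.gauss(0, 1) for _ in range(class_size)]
    total = 0.0
    for _ in range(n_draws):
        total += correlation(scores, random_partner_scores(scores, rng))
    return total / n_draws


if __name__ == "__main__":
    print(mean_own_peer_correlation())
```

In this setup the average correlation should land close to -1/(n - 1), where n is the class size, so roughly -0.05 for a class of 20. The smaller the pool that peers are drawn from, the stronger the negative correlation.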

This is a big problem for estimating β in the equation above. The error term e_{i} can be broken up into a part that is driven by student ability, OwnAbility_{i}, and a remaining component, v_{i}. Since OwnAbility_{i} raises a student’s own performance and is negatively correlated with PeerAbility_{i}, the overall error term is negatively correlated with PeerAbility_{i} too. Hence, even in our random experiment, we have a classic case of omitted-variable bias. The estimated effect of your peers’ ability on your own performance is biased downward – it is an underestimate, and often a very large one.
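The downward bias can be seen in a small simulation. This is a sketch under assumptions chosen for illustration, not the procedure from either paper: one class of 20 students with fixed prior scores, partners drawn as a random perfect matching, and an outcome equal to true_beta × PeerAbility_{i} + OwnAbility_{i} + v_{i}, with a coefficient of one on own ability. All names and parameter values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def mean_estimated_beta(true_beta=0.0, class_size=20, n_draws=20000, seed=1):
    """Average OLS estimate of beta when own ability is left in the
    error term. Class size must be even so everyone gets a partner."""
    rng = random.Random(seed)
    own = [rng.gauss(0, 1) for _ in range(class_size)]  # fixed prior scores
    total = 0.0
    for _ in range(n_draws):
        # Random perfect matching: nobody is paired with themselves.
        order = list(range(class_size))
        rng.shuffle(order)
        peer = [0.0] * class_size
        for k in range(0, class_size, 2):
            a, b = order[k], order[k + 1]
            peer[a], peer[b] = own[b], own[a]
        # Outcome depends on peer ability, own ability, and noise v_i.
        outcome = [true_beta * peer[i] + own[i] + rng.gauss(0, 1)
                   for i in range(class_size)]
        total += ols_slope(peer, outcome)
    return total / n_draws


if __name__ == "__main__":
    print(mean_estimated_beta(true_beta=0.0))
    print(mean_estimated_beta(true_beta=0.5))
```

Under these assumptions the average estimate should sit about 1/(n - 1) below the true value, so with no true peer effect at all the regression still reports a negative one on average.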

What this means is that randomized experiments are not enough. If you randomly assign people to peers and estimate β using the equation above, you will get the wrong answer. Fortunately, there are solutions. In a new paper, Bet Caeyers and Marcel Fafchamps describe this “can’t be your own friend” problem in detail, calling it “exclusion bias”. They show that several common econometric approaches actually make the problem worse. For example, controlling for cluster fixed effects often exacerbates the bias because the clusters are often correlated with the groups used to draw the peers. They also show that 2SLS estimates of peer effects do not suffer from exclusion bias – which helps explain why 2SLS estimates of peer effects are often larger than OLS estimates.

They also show how to get unbiased estimates of peer effects for different kinds of network structure. Unfortunately, there is no simple answer – the approach that works depends closely on the kind of data that you have. But the paper is a fantastic resource for anyone who wants to get consistent estimates of the effect of people’s peers on their own performance.