Can the UNODC's Murder Statistics be Trusted?

My parents came to visit me in Malawi back in December, and this did wonders for my mom’s level of concern about my welfare. She was able to see that Malawi at least looks relatively safe. We got to discussing safety and violence after the horrific murders of 20 kindergarteners that month. I made the off-hand claim that I am physically safer here than in the US. I’ve heard about awful crimes in both places, but I’m convinced in particular that my chances of being murdered are much lower here.

A couple of weeks ago I got around to looking that up to see whether the data confirmed my guess. I quickly found this Wikipedia page listing the intentional homicide rate for every country, which reports murder statistics from the UN Office on Drugs and Crime (UNODC). The UNODC figures assert that Malawi has an intentional homicide rate of 36.0 per 100,000 people, which is the twelfth-highest murder rate in the world. That’s a truly horrific figure, if true: it’s higher than the murder rate of any US city save Detroit and New Orleans, even though just 20% of Malawians live in urban areas.

I cannot possibly square a murder rate that high with my experience here. I collected the data for my survey in Traditional Authority Mwambo, a rural area that conveniently has about 100,000 people in it. I was there for about 4 months, and during that time I befriended all of the local authorities, especially the police. In managing my research team, I was very cognizant of crime and our personal security, and pursued any and all rumors with my friends at the Jali and Kachulu police stations and at the local road traffic police as well. For their part, they were very open about the cases they were dealing with, and at one point the Jali police actually helped us find a different, more-secure place to stay out there. If Mwambo matched the national average, you’d expect 12 murders there over the course of four months. Even if Malawi’s cities, home to 20% of the population, had murder rates of 150 per 100,000, nearly triple that of the US city with the most murders per person, the 36-per-100,000 national average would still imply a rural rate of (36 - 0.2 × 150)/0.8 = 7.5 per 100,000. At that rate Mwambo would see 7.5 murders a year, and at least 1 or 2 over the course of 4 months. I heard about zero. I discussed a wide range of crimes, including some shootings, with local authorities there, but no homicides whatsoever.
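For the skeptical, here is that back-of-the-envelope calculation as a quick Python sketch. The 20% urban share is from the sources above; the 150-per-100,000 urban rate and Mwambo’s round-number population are the deliberately generous assumptions stated in the text:

```python
# Back-of-the-envelope: what rural murder rate does the UNODC national
# figure imply, and how many murders should Mwambo then see?
national_rate = 36.0 / 100_000   # UNODC figure, per person per year
urban_share = 0.20               # share of Malawians in urban areas
city_rate = 150.0 / 100_000      # generous hypothetical urban rate

# The national rate is a population-weighted average of urban and rural.
rural_rate = (national_rate - urban_share * city_rate) / (1 - urban_share)

mwambo_pop = 100_000             # approximate population of TA Mwambo
per_year = rural_rate * mwambo_pop
per_4_months = per_year / 3

print(f"Implied rural rate: {rural_rate * 100_000:.1f} per 100,000")
print(f"Expected in Mwambo: {per_year:.1f} per year, {per_4_months:.1f} per 4 months")
# Implied rural rate: 7.5 per 100,000
# Expected in Mwambo: 7.5 per year, 2.5 per 4 months
```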

Why am I writing about statistics from Wikipedia in the middle of the night? Because the Internet is serious business.

Data nerds like me talk about applying the “smell test” to their results, and frankly this number just stinks every way I sniff it. Another way it smells: scaled up to Malawi’s population of roughly 15 million, 36 murders per 100,000 people works out to about 100 murders per week nationwide. There are definitely murders reported in the Malawian press, but I would venture that I see about 1 or 2 per week, not 100. Alternatively, we can look at the distribution of all causes of death. Malawi has a death rate of 1350 per 100,000 people, so according to the UNODC, murders cause 2.7% of all deaths in the country. That would mean murder ranks above tuberculosis and ischemic heart disease in this ranking of the top ten causes of death in Malawi. Incidentally, it would also mean murder should itself be on that list, knocking off malnutrition.
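The same smell tests, spelled out; the only input not taken from the sources above is Malawi’s population, which I’m rounding to 15 million:

```python
# Two more smell tests on the 36-per-100,000 figure.
malawi_pop = 15_000_000        # rough population at the time (assumption)
murder_rate = 36.0 / 100_000   # UNODC intentional homicide rate
death_rate = 1_350 / 100_000   # all-cause death rate

murders_per_year = murder_rate * malawi_pop
print(f"Murders per week: {murders_per_year / 52:.0f}")       # 104

print(f"Share of all deaths: {murder_rate / death_rate:.1%}")  # 2.7%
```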

The Wikipedia article has numerous caveats and hedges, including the suggestion that the data may include attempted murders as well as successful ones. However, it also has a link to the underlying table from the UN Office on Drugs and Crime. Annoyed by my inability to square the reported murder rate with other facts about Malawi, I decided to see where they were getting it from. In the footnotes, they attribute it to the World Health Organization Global Burden of Disease Mortality Estimates. After digging through the WHO website, I came to this page where one can download the datasets used for the Global Burden of Disease calculations. These are files that contain observations by year, country, gender, and disease, where disease is represented by an ICD code (there are different files for the ICD-7, ICD-8, ICD-9, and ICD-10 codes). If you know the ICD code you want, you can look up total deaths as well as deaths by age bracket.

I didn’t get that far, though: none of the files have any entries for Malawi, and the data availability index doesn’t list Malawi data for any year. There is a country code for Malawi (1270) but it doesn’t actually appear to get used. I can’t say for certain where the claim of 36 murders per 100,000 people comes from, but I can tell you it’s definitely not from the WHO Mortality Database.
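For anyone who wants to repeat the check, here is a minimal sketch, assuming you’ve downloaded one of the raw mortality files as a CSV. The filename and the “Country” column name are my assumptions about the file layout, so adjust them to whatever you actually download:

```python
# Look for any Malawi rows in a downloaded WHO Mortality Database file.
# "Morticd10.csv" and the "Country" column name are assumptions about
# the layout, not guaranteed to match the real files.
import pandas as pd

df = pd.read_csv("Morticd10.csv")
malawi_code = 1270  # WHO country code for Malawi

n_rows = (df["Country"] == malawi_code).sum()
print(f"Rows for Malawi: {n_rows}")  # I found zero, in every file
```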

Now, any number of things could have gone wrong here. Maybe I took a wrong turn as I hunted for the WHO data the UNODC rely on, or overlooked something else obvious. It’s also possible that entries got miscoded, either in the UNODC or the WHO files, leading me astray. Or maybe there was private communication between those two UN offices, and the underlying data actually isn’t public.

Fortunately, there are tricks I can use even when I can’t get my hands on the actual data. Back in 1938, Frank Benford observed that many datasets have the property that the leading digits of numbers (the “7” in “743”, for example) are logarithmically distributed, and death rates were actually one of the examples he leaned on in demonstrating what we now call “Benford’s Law”. If the law holds exactly, the share of numbers with leading digit d should be log10(1 + 1/d): 30.1% of leading digits would be “1”s, 17.6% would be “2”s, and so on down to 4.6% “9”s. And we can run a chi-square goodness-of-fit test to see whether deviations from the expected pattern are large enough to be meaningful, or are just random fluctuations. Using the firstdigit package in Stata, I ran this test on the UNODC spreadsheet’s mortality rates from 2008, which is the most-populated year in the table. As you can see, there are more leading “1”s than we’d expect under Benford’s Law, and across all digits the deviation from Benford is statistically significant at the 5% level – the p-value is 0.011, just barely above the 0.01 cutoff that would earn 3 stars in a journal article.

[Figure: output of Stata’s firstdigit test, showing observed leading-digit frequencies in the 2008 UNODC rates against the Benford’s Law distribution]
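For anyone who wants to poke at this without Stata, here is a minimal Python re-implementation of the same idea: a chi-square comparison of observed leading digits against the Benford distribution. The rates in the example are placeholders, not the UNODC figures, and the digit extraction is the simple-minded kind that assumes positive decimal numbers:

```python
# Chi-square test of a list of rates' leading digits against Benford's Law.
import math
from scipy.stats import chisquare

def benford_test(values):
    """Compare leading-digit counts to Benford's Law via chi-square."""
    # Leading digit of each positive value (strips any leading "0."s).
    digits = [int(str(v).lstrip("0.")[0]) for v in values if v > 0]
    observed = [digits.count(d) for d in range(1, 10)]
    # Benford: P(leading digit = d) = log10(1 + 1/d); shares sum to 1.
    expected = [len(digits) * math.log10(1 + 1 / d) for d in range(1, 10)]
    return chisquare(observed, expected)

rates = [36.0, 1.2, 5.6, 17.2, 2.4, 9.7, 1.0, 3.1]  # placeholder values
stat, p = benford_test(rates)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Running the same function separately on different subsets of the spreadsheet is all it takes to reproduce the source-by-source breakdown below.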

It’s possible to delve deeper: what I’m really curious about is not all the statistics – it would be hard to get the ones for big countries like the US wrong – but specifically the figures attributed to the WHO Global Burden of Disease. If I break the data down into observations that list “WHO” as the source and everything else, only the WHO data looks suspicious (p=0.040), while everything else conforms reasonably well to Benford’s Law (p=0.214).* Or I can use the slightly broader “PH” category for all public health-derived rates. Those look iffy (p=0.025) whereas the non-PH murder rates look alright (p=0.154). What’s more, these aren’t just cases of large samples helping me to find spurious “statistically significant” effects: there are just 61 values coded PH in the data, and 187 values overall.

The takeaway from that is that not just the Malawi murder rate but all the UNODC data supposedly derived from public health sources is questionable. I’m not trying to claim that these statistics were necessarily faked intentionally. I can imagine a number of ways they could have been screwed up by mistake. There might even be some reason why Benford’s Law would hold for some of these murder rates and not for others. Even if there was intent, I have no idea who might have been responsible. What I am trying to claim is that they shouldn’t be taken seriously, or relied on for anything of importance, until someone can verify their source. And I do think this matters. People rely on these numbers, and draw judgments based on them. A glance at the top-ranking countries on Wikipedia’s list would, for example, neatly confirm someone’s preconceived notions about Africa being a violent place. The top three African countries on that list are Zambia, Uganda, and Malawi – all have their statistics attributed to the WHO, and none actually appear in the WHO mortality data.

EDIT: I changed the Wikipedia article to remove the entries that I tried to trace down but could not find, until the source of the UNODC numbers is located or they are replaced with something better (Nameless has a suggestion in this post’s comments).

* I looked at all this a while ago but was just sitting on it until a recent Andrew Gelman post that cites the UNODC statistics prompted me to do something with it. I know Gelman wouldn’t like the fact that I’m leaning on p-values for the Benford’s law analysis, but I just don’t have any intuitive grasp of chi-square values.

Will cash transfers be better than subsidies for India's poor?

Adam Schwartz passes along this article stating that India is going ahead with plans to convert its myriad subsidies for the poor into a single cash distribution scheme tied to its biometric identity card system.

My knee-jerk response is that this is great: subsidies are distortionary, and fairly paternalistic. The premise is that elites or policymakers can better decide what the poor need than the poor themselves, which is a bit grating if you really think about it. While some might be more tolerant of India’s subsidies because they are designed by other Indians rather than by white people/foreigners/etc., I am definitely not.

The other perspective is that when people are given cash instead of an in-kind handout or a subsidy, they will spend it poorly – wasting it on stuff that’s useless or bad for them, like alcohol or tobacco. I’m pretty sympathetic to this view, too, and I don’t see it as contradictory to my dislike of paternalism. People may have reasonable goals that they can’t stick to, and subsidies can be a useful tool for committing their spending. There’s increasing interest in the possibility that these kinds of commitment problems are important in driving persistent poverty. One of my favorite papers, by Banerjee and Mullainathan, develops a very intuitive model that could explain this behavior. A counter-intuitive prediction of that model is that if the poor face the temptation to misspend, large lump sums of money may be preferable to getting the same amount of cash in small installments: rather than “burning a hole in your pocket,” the large sum lets you buy something worthwhile instead of frittering away your cash on trivialities. On the empirical side, Kathleen Beegle, Emanuela Galasso, Jessica Goldberg, Charles Mandala and Tavneet Suri are working on a project that will randomly vary whether workers receive payments in lump sums or small installments. I’m also in the planning phases of a project with Lasse Brune that will look at this issue.

But understanding the fundamental determinants of consumption behavior is a pretty hefty task; even an optimist like myself must admit that it will take economists a long while to sort it out. Looking directly at cash transfer programs, however, we already know quite a bit. Evidence from various African contexts shows that unconditional cash handouts can have big benefits. One potential drawback is that they may reduce labor supply; since people tend to work for money, that’s easy to predict from even very simple economic models.

Tempering the large measured benefits of giving money to the poor, however, is the fact that many of those findings are specific to programs that target women. And lest we assume that, for example, the Zomba Cash Transfers project would have had the same benefits had it targeted men alone, work by my advisor and Hans-Peter Kohler suggests that while a cash windfall accruing to women leads to decreases in sexual risk-taking, the same windfall leads men to engage in more risky sex. The obvious inference is that this has to do with transactional sex: the men are able to buy it, the women able to avoid selling it.

It’s hard to say how this would play out in India. In some ways, gender inequities are far worse there than in Africa – while Africa does have “missing women”, for example, there is no evidence there of the female infanticide or the deep favoritism toward young male children that are famously prevalent in much of India. On the other hand, my impression from visiting India is that transactional sex is not nearly as common there, which may be related to a culture prudish enough that some people still riot over public kissing. But the former issue is still a major concern – if men control this money more so than the subsidies it replaces, and if women show less son preference than men, this program could do harm to girls who are already some of the most disadvantaged children on earth.

On balance, I think the shift to cash will do good on net. But this program is just crying out for a randomized phase-in process. There are legitimate questions about its impact, and the system lacks the capacity to roll it out to everyone at once. This is the textbook example of a case where a government should randomize the phase-in, say by locality, to see what the effects are. Unfortunately I don’t see any evidence that they’re planning to do that. And I’m sure this isn’t for lack of expertise – virtually any microeconomist would love to be able to access data on an experiment like that. I’ll even offer up myself: if Mr. Chidambaram happens to read this, I’ll gladly drop what I’m doing for a couple of days to set up a randomly-ordered list of the remaining districts where the scheme is to be rolled out. And I’ll do it for free!
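To show just how little work this is, here is a sketch of what that randomization amounts to: a reproducible random ordering of the remaining districts. The district names are hypothetical stand-ins, and the fixed seed is there so the assignment could be audited later:

```python
# Randomized phase-in: put the districts still awaiting the scheme in a
# random order, so later comparisons of early vs. late districts can
# identify the program's effects.
import random

remaining_districts = ["District A", "District B", "District C", "District D"]

random.seed(20130101)  # fixed seed makes the assignment reproducible
rollout_order = random.sample(remaining_districts, k=len(remaining_districts))

for phase, district in enumerate(rollout_order, start=1):
    print(f"Phase {phase}: {district}")
```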