Odds ratios are a catastrophe

Adam Larson sent me the following question about a study of obesity and a press release about it from NPR. The claim, made in both the press release and the underlying article, is that weight discrimination makes the already obese 3 times as likely to remain obese, and the non-obese 2.5 times as likely to become obese. Adam writes:

They interpret odds ratios of 2.5 and 3 as “2.5 times as likely” and “3 times as likely”.

Balderdash, yes? I assume what they’re getting at is that in one group something like 85% remained obese; in the other 75%. This gives an odds ratio of (.85/.15)/(.75/.25)=1.89

So common sense would call it a 10 percentage point decrease or a 12% decrease, right?

Adam is spot-on. An odds ratio is the odds of an event happening in one group divided by the odds of the same event happening in another. Odds are summaries of probabilities that get used by sports books and nearly no one else, because they are a counter-intuitive, non-linear transformation of probabilities. If an event has an X% chance of happening, the odds that it happens are (X%)/(100-X%). The basic problem with odds ratios is that long ago someone (we should figure out who and curse their name) realized that for rare outcomes, an OR is approximately a relative risk, or (% chance thing occurs in treatment group)/(% chance thing occurs in control group). That is:

(0.01/0.99)/(0.02/0.98) ≈ 0.5 = 0.01/0.02

That has ever since been taught to applied statisticians working in certain fields (public health is one example) who use odds ratios for the scientifically important reason that they are the default output of many regression packages when you run a logistic regression.*

And so people misinterpret them constantly, presenting odds ratios as relative risks even when the underlying probabilities are not small and the approximation does not hold. This is even before we get into the fact that calling a change from P=0.01 to P=0.02 a “100% increase in risk” is itself fairly absurd and misleading. It’s a one percentage-point increase. There is no intrinsic sense in which “the risk tripled” actually means anything. Did you know that if you go in the ocean you are infinitely more likely to get eaten by a shark than if you stay on land? (You probably did, but it’s a stupid number to think about. What is actually relevant is that the absolute risk went up by some fraction of a percentage point.)
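To see how quickly the approximation falls apart, here is a minimal Python sketch (my own illustration, reusing Adam’s made-up 85%/75% numbers) comparing odds ratios to relative risks for a rare outcome and a common one:

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    return odds(p1) / odds(p2)

def relative_risk(p1, p2):
    return p1 / p2

# Rare outcome: the OR is close to the relative risk
print(odds_ratio(0.01, 0.02), relative_risk(0.01, 0.02))   # ~0.49 vs 0.50

# Common outcome (Adam's example): the OR badly overstates the relative risk
print(odds_ratio(0.85, 0.75), relative_risk(0.85, 0.75))   # ~1.89 vs ~1.13
```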

For this paper, under the assumption that their regression adjustment doesn’t change the estimates too much, we can actually back out what the percentages really are. First, the effect on the not-initially-obese:

Mean outcome = (% discriminated)*(mean for discriminated people) + (1 – % discriminated)*(mean for non-discriminated people). Writing X for the mean among people who report weight discrimination and Y for the mean among those who don’t:
0.058 = 0.08X + 0.92Y
Odds ratio = ((mean for discriminated people)/(1 – mean for discriminated people)) / ((mean for non-discriminated people)/(1 – mean for non-discriminated people)):
(X/(1-X))/(Y/(1-Y)) = 2.54
Solving the odds-ratio equation for Y gives:
Y = (50X)/(127-77X)
Substituting into the mean equation and solving:
0.058 = 0.08*X + 0.92*(50X)/(127-77X)
X = 0.1230
Y = 0.0523
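If you would rather not grind through the algebra, here is a small Python sketch (mine, not from the paper; back_out_risks is just an illustrative helper name) that solves the same two equations numerically:

```python
from scipy.optimize import fsolve

def back_out_risks(overall_mean, share_discriminated, odds_ratio):
    """Solve for X (risk among the discriminated) and Y (risk among the rest)
    given the overall mean, the share discriminated against, and the OR."""
    def equations(v):
        x, y = v
        return [share_discriminated * x + (1 - share_discriminated) * y - overall_mean,
                (x / (1 - x)) / (y / (1 - y)) - odds_ratio]
    return fsolve(equations, [overall_mean, overall_mean])

x, y = back_out_risks(0.058, 0.08, 2.54)
print(x, y, x - y, x / y)   # ~0.123, ~0.052, ~0.071 (7.1 points), ~2.35 (risk ratio)
```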

So the change is about 7.1 percentage points. Put less clearly, P(became obese) has gone up by a factor of 2.35 for those who experienced weight discrimination, relative to those who did not. That is different from the OR of 2.54, but their figure isn’t too far from the relative risk.

Repeating the process for their other analysis, however, reveals how misleading ORs can be:
0.263= 0.08X + 0.92Y
(X/(1-X))/(Y/(1-Y)) = 3.20
Solving these equations for X and Y gives us:
X = 0.505
Y = 0.242
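The same hypothetical back_out_risks sketch from above reproduces these figures:

```python
x, y = back_out_risks(0.263, 0.08, 3.20)
print(x, y, x - y, x / y)   # ~0.505, ~0.242, ~0.263 (26.3 points), ~2.08 (risk ratio)
```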

Here the risk ratio is 2.08, not 3.20. The percentage-point change of 26.3 remains completely comprehensible, as it always is. Misusing odds ratios here allowed them to overstate the size of their effect by about 50%! I suspect, but am not sure how to prove, that with regression adjustment these figures could look even more misleading.

As most people who read this already know, even if presented correctly the figures wouldn’t mean anything. There’s no reason to believe the relationship being studied is causal in nature. Indeed, it probably suffers from classic reverse causality: people who are gaining weight (or failing to lose weight) are likely to perceive a greater degree of weight discrimination.

But presentation matters too. First, clear presentation can help us make use of studies, even when they are as limited as this one is. As the above derivation illustrates, figuring out what an odds ratio actually means involves 1) the annoying process of scrounging through the paper for all the variables you need and 2) solving a system of two equations for two unknowns, which most people can’t do in their head. This detracts very substantially from a paper’s clarity: in general, when I see odds ratios presented in a paper, I have no idea what they mean. An OR of 2 could mean that the risk went from 1% to 2% or (to use a variation on Adam’s example) from 75% to 86%, or a whole host of other things.
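To make that last point concrete, here is a quick sketch (my own; risk_from_or is just an illustrative helper) of what an OR of 2 implies at a few different baseline risks:

```python
def risk_from_or(baseline_risk, odds_ratio):
    """Risk implied by a baseline risk and an odds ratio."""
    new_odds = (baseline_risk / (1 - baseline_risk)) * odds_ratio
    return new_odds / (1 + new_odds)

for baseline in [0.01, 0.10, 0.50, 0.75]:
    print(baseline, round(risk_from_or(baseline, 2.0), 3))
# An OR of 2 means 1% -> ~2%, 10% -> ~18%, 50% -> ~67%, or 75% -> ~86%
```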

Second, poor presentation has consequences. Health risks are often reported using relative risks, or, worse yet, using ORs that are presented as relative risks. This is often extremely misleading, since a doubling of risk could mean that the chance went from 0.001% to 0.002%, or from 50% to 100%. Misleading and confusing people about risks undermines the basic goal of presenting health risks in the first place: to help people make better decisions about their health.

*I honestly believe that if we made mean marginal effects the default, and forced people to compute ORs and adjusted ORs (AORs) manually, they would disappear within 10 years. Being forced to construct ORs by hand would also force people to understand what they are, which would stop people from using them.
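For what it’s worth, here is a minimal statsmodels sketch (simulated data, illustrative coefficients) showing both presentations from the same logistic regression:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data, purely illustrative
rng = np.random.default_rng(0)
x = rng.normal(size=5_000)
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * x)))   # true model on the logit scale
y = rng.binomial(1, p)

res = sm.Logit(y, sm.add_constant(x)).fit(disp=0)

print(np.exp(res.params))            # the usual presentation: odds ratios
print(res.get_margeff().summary())   # mean marginal effects, on the probability scale
```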

8 thoughts on “Odds ratios are a catastrophe”

  1. One advantage of odds ratios is that they are invariant to labeling. Let’s consider 75% versus 86% probabilities of being accepted, a risk ratio of about 1.15, or a 15% increase. We could equivalently write these numbers as 25% and 14% probabilities of being rejected. The risk ratio is now 0.56, a 44% decrease, seemingly quite different! In contrast, the odds ratio is approximately 2 for acceptance and 1/2 for rejection, perfectly symmetric.

    I don’t see that you’ve stated any particular benefit of risk ratios, only asserted that odds ratios and risk ratios are different. To the extent you have a valid complaint, it can be addressed by stating a baseline percentage (e.g., stating that the odds double from a baseline of 14%).

    Finally, you argue that odds ratios can be misleading when probabilities are small, such as an odds ratio of 2 from a baseline of 0.1%. But anything can be misleading without the proper context. For example, an increase of 0.1 percentage points sounds small without any context, but there are many cases where this could be big and important, such as a 0.1 percentage point increase in the weekly probability of death. I’d say it is much better to describe this as doubling the odds of dying.

    1. I do not advocate the use of risk ratios either. Rather, I oppose the presentation of odds ratios as if they are risk ratios. In my experience this is done almost universally in press releases, and the vast preponderance of the time in articles themselves. My strong preference is for the presentation of mean marginal effects – the actual change in the risk in absolute terms – along with the control-group mean risk.

      If you say the risk doubles from a baseline of 14%, I know the new risk is 28% immediately. If you say the odds double from a baseline of 14%, then to compute the new risk I need to compute the baseline odds (0.14/(1-0.14) = 0.1628), compute the new odds (2*0.1628 = 0.3256), and solve for the new risk (x/(1-x) = 0.3256 ==> x = 0.2456, so about 25%). This has the problem of not being the same number as the above, in addition to being a cumbersome calculation. In contrast, anyone who actually *wants* the odds ratio can quickly compute it from the baseline risk and the risk ratio.

      It seems to me that the relevant context in your example is captured by a) what risk we are talking about and b) the base rate, rather than c) expressing it as an odds ratio. Strictly speaking, your example is a doubling of the *risk* of dying. The odds of dying go up by slightly more than a factor of 2.
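      Spelling out the odds-doubling arithmetic above in code (a quick illustrative check):

```python
# Doubling the *risk* from a 14% baseline: trivially 28%
print(2 * 0.14)                     # 0.28

# Doubling the *odds* from a 14% baseline
odds = 0.14 / (1 - 0.14)            # baseline odds ~0.1628
new_odds = 2 * odds                 # ~0.3256
print(new_odds / (1 + new_odds))    # ~0.2456, i.e. about 25%, not 28%
```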

  2. I see that you’re an economist, but you’re giving an example from medicine (about obesity). Medical studies are rarely based on a random sample, so the calculation you advocate is typically not possible or is not especially relevant. In case-control studies (e.g., studying the causes of a disease by sampling 100 people with the disease and 100 people without), a control-group mean cannot be calculated, at least not without outside information. In randomized clinical trials that recruit from a particular population, such as patients in one hospital who meet certain criteria, a control group mean is of some interest, but isn’t especially relevant for studying a treatment that will be applied in a variety of populations, each with a different mean.

    More generally, I agree that marginal effects and control group means are more appropriate for a non-technical audience than odds ratios, since most people don’t know what odds are. I also agree that many studies would be improved by reporting some kind of baseline mean along with the odds ratios, marginal effects, or other effect. However, for a technical audience, I’m less convinced that it makes sense to convert odds ratios, which are the direct output of a logit model, to marginal probabilities, which are farther removed from the model.

  3. You can definitely define a control-group mean for a non-random sample, or in a non-experimental design. In any case, you could replace “control group mean” with “population mean” or “sample mean” and it would serve the same purpose, which is to put the mean marginal effect in context.

    1. I wrote, “In case-control studies … a control-group mean cannot be calculated, at least not without outside information.” And you can replace “control group mean” with comparison group mean, sample mean, etc., but none of these statistics would be at all meaningful. That’s because when sampling is conditioned on the dependent variable, the sample mean is whatever the researcher chose when designing the study. Surely you’re not denying that?

      1. I’m not sure I follow. Those are definitely means that you can compute. Or, if you’d like, you could use the population mean.

        I don’t see how an odds ratio avoids any potential problems with a control-group mean – it’s just [(mean in treatment group)/(1-mean in treatment group)]/[(mean in control group)/(1-mean in control group)].

  4. A case-control study works by sampling conditional on the outcome. For example, if you study the causes of a disease by sampling equal numbers of people with and without the disease, the sample percentage is 50%. But obviously this has nothing at all to do with the population percentage. Similarly, the treatment and comparison group means with this kind of sampling are meaningless. Nonetheless, case-control studies can be used to estimate unbiased treatment/comparison odds ratios.

    The fact that odds ratios and logit coefficients are invariant to this kind of sampling, which is extremely common in medical research, is a major reason why odds ratios are used so commonly.
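    A small simulation sketch of that invariance (my own illustrative numbers, not from any particular study):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Illustrative population: 20% exposed; disease risk 2% if unexposed, 6% if exposed
exposed = rng.random(n) < 0.20
disease = rng.random(n) < np.where(exposed, 0.06, 0.02)

def odds_ratio(d, e):
    a, b = np.sum(d & e), np.sum(~d & e)
    c, f = np.sum(d & ~e), np.sum(~d & ~e)
    return (a / b) / (c / f)

# Full population: the OR reflects the true exposure-disease association (~3.1 here)
print(odds_ratio(disease, exposed))

# Case-control sample: every case plus an equal number of randomly chosen controls
cases = np.flatnonzero(disease)
controls = rng.choice(np.flatnonzero(~disease), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])
print(odds_ratio(disease[idx], exposed[idx]))   # roughly the same OR
print(disease[idx].mean())                      # exactly 0.5 by design; says nothing about the population
```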

  5. Nice post. I, too, strongly dislike odds ratios. A year or so ago, I read a paper with lots of odds ratios, and realizing I didn’t really have strong (or even weak) intuitions about what they meant, I made a Jupyter notebook (html version) to build up my intuitions and visualize the weird non-linear relationships between odds ratios and probabilities. I managed to convince myself that odds ratios are all but useless, if not actively misleading.
