Don’t pick controls based on stars in your balance table

I just saw someone make this mistake yet again, and realized that this bit of applied econometric wisdom is not widely known. In papers based on RCTs, it is standard for Table 1 to be a balance table that shows the means of baseline variables by study arm and tests for the equality of those means. (People often also show joint balance tests across all variables; I have a recent working paper about how to run joint tests of equality correctly.)

A very common and incorrect use of the balance table is to pick controls for the regression analysis of the treatment effect. Specifically, the process that I often see is to look for variables whose balance-test t-statistics exceed 1.96 (or 1.65) in absolute value and use those as controls. That is, people control for anything with stars in their balance table.

This approach is wrong.

In their classic paper on running and analyzing randomized trials, Bruhn and McKenzie cite earlier work by Permutt showing the problems that this approach can cause. Specifically, the actual significance level of the test for a treatment effect is lower than the nominal level, meaning that your CIs will be too wide and you will under-reject the null:

[Image from Bruhn and McKenzie]
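To make the under-rejection concrete, here is a rough Monte Carlo sketch in plain Python. It is an illustration under assumptions I picked for convenience, not a replication of Permutt or Bruhn and McKenzie: one baseline covariate x, an outcome y = 2x + noise, a true treatment effect of zero, and n = 100 split evenly across two arms. The function names (`t_stat`, `simulate`) are hypothetical helpers. It compares the "control only if the balance test has stars" rule against always controlling for x.

```python
import math
import random


def demean(v):
    """Subtract the sample mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def residualize(v, x):
    """Residuals from an OLS regression of v on a constant and x."""
    vd, xd = demean(v), demean(x)
    b = sum(a * c for a, c in zip(vd, xd)) / sum(c * c for c in xd)
    return [a - b * c for a, c in zip(vd, xd)]


def t_stat(y, t, x=None):
    """t-statistic on t in OLS of y on (1, t) or (1, t, x), via Frisch-Waugh."""
    if x is None:
        ry, rt, k = demean(y), demean(t), 2
    else:
        ry, rt, k = residualize(y, x), residualize(t, x), 3
    stt = sum(a * a for a in rt)
    beta = sum(a * b for a, b in zip(rt, ry)) / stt
    ssr = sum((a - beta * b) ** 2 for a, b in zip(ry, rt))
    se = math.sqrt(ssr / (len(y) - k) / stt)
    return beta / se


def simulate(n_sims=4000, n=100, b=2.0, crit=1.96, seed=0):
    """Rejection rates of a true null under two rules for choosing controls."""
    rng = random.Random(seed)
    # x and the noise are i.i.d., so a fixed half/half split is equivalent
    # to random assignment here.
    t = [1.0] * (n // 2) + [0.0] * (n - n // 2)
    rej_stars = rej_always = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [b * xi + rng.gauss(0, 1) for xi in x]  # true effect is zero
        has_stars = abs(t_stat(x, t)) > crit        # balance test on x
        t_adjusted = t_stat(y, t, x)                # always control for x
        # "Stars" rule: control for x only when the balance test rejects.
        t_stars = t_adjusted if has_stars else t_stat(y, t)
        rej_stars += abs(t_stars) > crit
        rej_always += abs(t_adjusted) > crit
    return rej_stars / n_sims, rej_always / n_sims


stars_rate, always_rate = simulate()
print(f"reject rate, control only if stars: {stars_rate:.3f}")
print(f"reject rate, always control:        {always_rate:.3f}")
```

In this setup the stars-based rule should reject a true null noticeably less often than the always-control rule, whose rejection rate should sit near the nominal 5%. The size of the gap depends on how strongly x predicts y, which is set by the assumed coefficient `b`.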

In addition to giving too-wide CIs, this approach can also lead to incorrect point estimates. Appendix E of my job-market paper (now forthcoming at the Economic Journal) shows that failing to control for baseline values of the outcome variable induces finite-sample bias in the estimates, even if the baseline test for the equality of means is insignificant:

[Screenshot from Appendix E]

(The “optimal” estimator here is just an ANCOVA specification that controls for Y measured at baseline.)

What should you do instead? The best practice is to control for:

1) stratification cell indicators

2) anything else that was used in the randomization procedure

3) baseline values of the outcome variable

4) additional variables that are selected via the double lasso (-pdslasso- in Stata), although typically this procedure will not select very many variables
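Written as a single regression, the list above corresponds to something like the following specification. The notation is mine and purely illustrative: $Y_{i1}$ is the endline outcome, $T_i$ the treatment indicator, $Y_{i0}$ the baseline value of the outcome, $\delta_{s(i)}$ a fixed effect for unit $i$'s stratification cell, $R_i$ any other variables used in the randomization, and $X_i^{DL}$ the (often empty) set of controls picked by the double lasso.

```latex
Y_{i1} = \beta T_i + \gamma Y_{i0} + \delta_{s(i)} + R_i'\lambda + X_i^{DL\prime}\theta + \varepsilon_i
```

Note that nothing on the right-hand side is chosen by looking at the stars in the balance table: items 1 to 3 are fixed by the design, and item 4 is chosen by a data-driven procedure built for valid inference after selection.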
