The MatchIt package manual says that:
The standardized mean differences are computed both before and after matching or subclassification as the difference in treatment group means divided by a standardization factor computed in the unmatched (original) sample. The standardization factor depends on the argument supplied to estimand in matchit(): for "ATT", it is the standard deviation in the treated group; for "ATC", it is the standard deviation in the control group; for "ATE", it is the square root of the average of the variances within each treatment group. The post-matching mean difference is computed with weighted means in the treatment groups using the matching or subclassification weights.
Why is the denominator computed only from the unmatched sample? It is generally assumed that the standard deviation should come from the same sample that is used to compute the means and the mean difference. That is to say, if the mean difference is the difference between the treated and control groups in the matched sample, the denominator should be the pooled standard deviation of the treated and control groups in the matched sample (the square root of the average of the within-group variances computed in the matched sample).
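In symbols, with $\bar X_t$ and $\bar X_c$ the (weighted) treated- and control-group means of a covariate and $s$ the standardization factor, the manual's definition and the pooled-matched-sample alternative I have in mind are:

$$
\text{SMD} = \frac{\bar X_t - \bar X_c}{s},
\qquad
s_{\text{MatchIt}} =
\begin{cases}
s_t^{\text{unmatched}} & \text{for } \texttt{"ATT"}\\[2pt]
s_c^{\text{unmatched}} & \text{for } \texttt{"ATC"}\\[2pt]
\sqrt{\tfrac12\!\left[(s_t^{\text{unmatched}})^2 + (s_c^{\text{unmatched}})^2\right]} & \text{for } \texttt{"ATE"}
\end{cases}
$$

whereas what I am proposing for the post-matching SMD is

$$
s_{\text{proposed}} = \sqrt{\tfrac12\!\left[(s_t^{\text{matched}})^2 + (s_c^{\text{matched}})^2\right]}.
$$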
CodePudding user response:
This is explained in Stuart (2008) and in the cobalt vignette. The problem is that if the standard deviation of the matched sample is used as the standardization factor after matching, the SMD is affected not only by changes in balance but also by changes in the standard deviation of the covariate. Comparing balance before and after matching then conflates two things when we only care about one. Holding the standardization factor constant prevents this, isolating the effect of matching on balance alone.
Consider the following example. Let's say the mean of a covariate X (e.g., age) in the treated group is 44 and the mean in the control group is 46, and the pooled standard deviation is 9. Let's say that after matching, the control group mean is now 45 and the pooled standard deviation is now 4. Was there better balance before matching or after matching?
It should be clear that the covariate means are closer together, which indicates an improvement in balance and therefore a reduction in bias. Which method of computing the SMD reflects this?
Prior to matching, the SMD is (46 - 44)/9 = 0.22. By the standard criterion that SMDs should be less than 0.1, this would be considered imbalanced.
Using the formula for the SMD that uses the standard deviation in the unmatched sample, the matched SMD is (45 - 44)/9 = 0.11, indicating better balance.
Using the formula for the SMD that uses the standard deviation in the matched sample, the matched SMD is (45 - 44)/4 = 0.25, indicating that balance got worse after matching!
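A minimal R sketch of this arithmetic, using the hypothetical means and standard deviations from the example above (nothing here depends on MatchIt itself):

```r
# Hypothetical covariate summaries from the example above
mean_treated      <- 44
mean_control_pre  <- 46   # control-group mean before matching
mean_control_post <- 45   # control-group mean after matching
sd_unmatched <- 9         # pooled SD in the unmatched sample
sd_matched   <- 4         # pooled SD in the matched sample

smd <- function(m1, m0, s) abs(m1 - m0) / s

smd(mean_treated, mean_control_pre,  sd_unmatched)  # 0.22, before matching
smd(mean_treated, mean_control_post, sd_unmatched)  # 0.11, after matching, unmatched SD (MatchIt's choice)
smd(mean_treated, mean_control_post, sd_matched)    # 0.25, after matching, matched SD
```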
Remember that the bias of the effect estimate is a function of the mean differences; standardizing them to produce the SMD is just a way to simplify balance statistics for users. The choice of standardization factor is somewhat arbitrary, but using the unmatched standard deviation correctly isolates changes in balance from changes in variability, and the latter is unrelated to bias.
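For completeness, here is a small sketch of how these SMDs show up in practice with MatchIt (using the lalonde dataset that ships with the package; the particular covariates and matching method are purely illustrative):

```r
library(MatchIt)

data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbor matching on the propensity score, targeting the ATT,
# so the standardization factor is the treated-group SD from the unmatched sample
m.out <- matchit(treat ~ age + educ + re74 + re75, data = lalonde,
                 method = "nearest", estimand = "ATT")

# The "Std. Mean Diff." columns for the unmatched and matched data use the same
# (unmatched-sample) standardization factor, so they are directly comparable
summary(m.out)
```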