Home > Net >  How to create table comparing control group and treatment group after propensity score matching?
How to create table comparing control group and treatment group after propensity score matching?

Time:03-21

I ran a propensity matching to test the effect of a medical treatment. I want to compare the demographics of the control group with the treatment group. How do I create a basic table to compare them?

this is my code so far

    #defining variables
Tr <- cbind(VAECMO)
Y <- cbind (DIED)
X <- cbind(AGE, RACE, CKD, RSPTFL, ZIPINC_QRTL, CLUNGD)

#descriptive statistics
summary(Tr)
summary(Y)
summary(X)

#Propensity score model
glm1 <- glm(Tr ~ X, family = binomial, data = comor)
summary(glm1)

#making sure there are no missing values in the necessary observations
length(Tr)
length(glm1$fitted)

#Average Treatment on the treated effect
rr1 <- Match(Y = Y, Tr = Tr, X = glm1$fitted)
summary(rr1)

I want to create a table that compares the average age, composition of race, etc of those who received the Tr and those who didn't.

CodePudding user response:

What you want is to check the balancing you achieved with PS matching, i.e., whether the descriptives of the treated and the control have reasonably converged.

Matching::MatchBalance provides a rich set of comparative statistics, which helps to draw the respective conclusions.

Here an example using the lalonde data. Note, that you shouldn't apply one-to-one matching when using propensity scores, i.e. the number of matches should not be one, use e.g. M=5. Also a caliper= of e.g. 0.25 standard deviations of maximal acceptable distance for matches should be used to take some account of the common support in both groups.

library(Matching)
data(lalonde)

## Estimate propensity score
fit <- glm(treat ~ age   I(age^2)   educ   I(educ^2)   black  
             hisp   married   nodegr   re74    I(re74^2)   re75   I(re75^2)  
             u74   u75, family=binomial, data=lalonde)

lalonde$ps <- fit$fitted.values  ## store ps in data set

## matching
rr <- with(lalonde, Match(Y=re78, Tr=treat, X=ps, M=5, caliper=.25))

To compare the groups, define a set of comparison variables to easily subset later.

## define comparison variables
(comp <- c('ps', all.vars(fit$call)[1:11]))
# [1] "ps"      "treat"   "age"     "educ"    "black"   "hisp"    "married"
# [8] "nodegr"  "re74"    "re75"    "u74"     "u75"  

To just compare the groups without any matching, simply aggregate the mean or something else.

aggregate(. ~ treat, lalonde[comp], mean)
#   treat        ps      age     educ     black       hisp   married    nodegr     re74     re75       u74       u75
# 1     0 0.3936490 25.05385 10.08846 0.8269231 0.10769231 0.1538462 0.8346154 2107.027 1266.909 0.7500000 0.6846154
# 2     1 0.4467636 25.81622 10.34595 0.8432432 0.05945946 0.1891892 0.7081081 2095.574 1532.056 0.7081081 0.6000000

To compare the groups before and after matching use Matching::MatchBalance. You may easily use reformulate in the function's formula interface. For the before-after comparison, provide your result rr in match.out=. (To omit console output, use print.level=0.)

bal <- MatchBalance(formul=reformulate(comp[-2], 'treat'), data=lalonde, 
                    match.out=rr)
# ***** (V1) ps *****
#                        Before Matching     After Matching
# mean treatment........    0.44676             0.43271 
# mean control..........    0.39365              0.4327 
# std mean diff.........     45.329            0.010439 
# 
# mean raw eQQ diff.....   0.054029           0.0017398 
# med  raw eQQ diff.....   0.060548          0.00085473 
# max  raw eQQ diff.....    0.10393            0.024826 
# 
# mean eCDF diff........    0.13402           0.0043977 
# med  eCDF diff........    0.15743           0.0032538 
# max  eCDF diff........    0.22443            0.029284 
# 
# var ratio (Tr/Co).....     1.2101             0.99967 
# T-test p-value........ 1.4842e-06             0.97518 
# KS Bootstrap p-value.. < 2.22e-16               0.782 
# KS Naive p-value...... 3.7341e-05             0.82407 
# KS Statistic..........    0.22443            0.029284 
#
# [...]
#
# Before Matching Minimum p.value: < 2.22e-16 
# Variable Name(s): ps  Number(s): 1 
# 
# After Matching Minimum p.value: 0.02 
# Variable Name(s): re75  Number(s): 9 

Basically, for each variable the function compares the treatment groups before and after matching; mean and standardized mean differences are provided as well as p-values of t-tests and KS tests. At the very bottom, the output states the worst variables according to the t-test p-values.

However, rather than t-tests, in the literature commonly provided are the standardized absolute differences, which in the ideal case should of course be close to zero and may be extracted from the balancing like so:

(sad <- cbind(sapply(bal$BeforeMatching, '[[', 'sdiff.pooled'),
              sapply(bal$AfterMatching, '[[', 'sdiff.pooled')) |>
  abs() |>
  `dimnames<-`(list(comp[-2], c('before', 'after'))))
# ps      47.4345458 0.0104392
# age     10.7277121 7.1516980
# educ    14.1219821 5.5883637
# black    4.3886611 0.0177414
# hisp    17.4561071 0.4161040
# married  9.3640701 2.6733063
# nodegr  30.3986439 3.4347952
# re74     0.2159921 6.2271032
# re75     8.3863254 4.7320098
# u74      9.4140477 6.5848971
# u75     17.6809436 8.5502059

They may easily be plotted using matplot.

matplot(sad, pch=20, xaxt='n', main='Matching success', 
        ylab='Std. abs. diff.')
arrows(seq_along(comp[-2]), sad[, 1], seq_along(comp[-2]), sad[, 2],
       length=.1, col=8)
axis(1, seq_along(comp[-2]), labels=comp[-2])
legend('topright', pch=20, col=1:2, leg=c('before matching', 'after matching'))

enter image description here

Note: R >= 4.1 used.

CodePudding user response:

Use the cobalt package, which was specifically designed for this purpose. If you decide to use a package other than Matching to perform the matching, for example, MatchIt or optmatch, cobalt works with them, too. See the cobalt website for information.

Importantly, cobalt requires minimal coding from the user and is extremely flexible producing whichever balance statistics you want and plots to display balance for covariates individually or overall. See below for some examples. I'll use the same example as jay.sf.

library(Matching)
#> Loading required package: MASS
#> ## 
#> ##  Matching (Version 4.9-11, Build Date: 2021-10-18)
#> ##  See http://sekhon.berkeley.edu/matching for additional documentation.
#> ##  Please cite software as:
#> ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
#> ##   Software with Automated Balance Optimization: The Matching package for R.''
#> ##   Journal of Statistical Software, 42(7): 1-52. 
#> ##
library(cobalt)
#>  cobalt (Version 4.3.2.9000, Build Date: 2022-02-28)
data(lalonde, package = "Matching")

## Estimate propensity score
fit <- glm(treat ~ age   I(age^2)   educ   I(educ^2)   black  
             hisp   married   nodegr   re74    I(re74^2)   re75   I(re75^2)  
             u74   u75, family=binomial, data=lalonde)

lalonde$ps <- fit$fitted.values  ## store ps in data set

## matching
rr <- with(lalonde, Match(Y=re78, Tr=treat, X=ps, M=5, caliper=.25))

Below we use bal.tab() to display balance statistics, including the treated group and control group means and the standardized mean differences before and after matching.

bal.tab(rr, formula = treat ~ age   educ   black   hisp   married   nodegr   re74   re75,
        data = lalonde, un = TRUE, disp.means = TRUE)
#> Balance Measures
#>            Type    M.0.Un    M.1.Un Diff.Un   M.0.Adj   M.1.Adj Diff.Adj
#> age     Contin.   25.0538   25.8162  0.1066   25.0932   25.6023   0.0712
#> educ    Contin.   10.0885   10.3459  0.1281   10.4271   10.3295  -0.0485
#> black    Binary    0.8269    0.8432  0.0163    0.8523    0.8523  -0.0001
#> hisp     Binary    0.1077    0.0595 -0.0482    0.0615    0.0625   0.0010
#> married  Binary    0.1538    0.1892  0.0353    0.1770    0.1875   0.0105
#> nodegr   Binary    0.8346    0.7081 -0.1265    0.7119    0.7273   0.0153
#> re74    Contin. 2107.0268 2095.5740 -0.0023 1755.7096 2062.6252   0.0628
#> re75    Contin. 1266.9092 1532.0556  0.0824 1377.3531 1530.7851   0.0477
#> 
#> Sample sizes
#>                      Control Treated
#> All                   260.       185
#> Matched (ESS)         160.99     176
#> Matched (Unweighted)  242.       176
#> Unmatched              18.         0
#> Discarded               0.         9

Below we use bal.plot() to examine distributional balance for a single covariate.

bal.plot(rr, formula = treat ~ age   educ   black   hisp   married   nodegr   re74   re75,
         data = lalonde, var.name = "educ", which = "both")

Below we create a Love plot using love.plot() to summarize balance in the sample. The plot is highly customizable; here I add a threshold line for standardized mean differences of .05.

love.plot(rr, formula = treat ~ age   educ   black   hisp   married   nodegr   re74   re75,
          data = lalonde, abs = TRUE, var.order = "unadjusted", thresholds = c(m = .05))

Created on 2022-03-20 by the reprex package (v2.0.1)

There are many examples on the cobalt website and many ways to customize the tables and plots. It is meant to be easy to use for non-programmers.

  • Related