How to change the baseline with non-factor variables in a regression in R?

I'm running a regression where NJ is the treated group and PA is the control group. However, when I run the regression, the output reports a coefficient for PA (treatedPA), as if PA were the treated group. PA is actually the untreated group and should be the baseline. How do I change this?

cardkruger = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cardkruger.csv')
reg = lm(fte ~ t*treated, cardkruger)
summary(reg)

output:

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  17.0652     0.4986  34.224   <2e-16 ***
t             0.5075     0.7085   0.716   0.4740    
treatedPA     2.8835     1.1348   2.541   0.0112 *  
t:treatedPA  -2.9140     1.6105  -1.809   0.0708 .

Answer:

There are a variety of ways to do this but

cardkruger$treated <- relevel(factor(cardkruger$treated), "PA")

is the easiest way to change the baseline (do this before running your regression). From ?relevel:

The levels of a factor are re-ordered so that the level specified by ‘ref’ is first and the others are moved down.
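
As a minimal sketch of what relevel() does, the snippet below uses a small made-up character vector (state, a hypothetical stand-in for cardkruger$treated) rather than the real data. By default factor() sorts the levels alphabetically, which is why NJ ends up as the baseline in the question.

```r
# Toy character vector standing in for cardkruger$treated (hypothetical values)
state <- c("NJ", "PA", "NJ", "PA", "PA")

f <- factor(state)
levels(f)                      # "NJ" "PA": alphabetical, so NJ is the baseline

f_pa <- relevel(f, ref = "PA")
levels(f_pa)                   # "PA" "NJ": PA is now the baseline

# The observations are unchanged; only the order of the levels differs
identical(as.character(f), as.character(f_pa))   # TRUE
```

The same result could be obtained with factor(state, levels = c("PA", "NJ")), which sets the level order explicitly.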

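The factor() call is there to convert the variable from a character vector to an unordered factor; there's no reason here for it to be ordered. (The terminology of "ordered" vs. "unordered" is confusing: an unordered factor still stores its levels in a definite order, and the first level is the baseline, while "ordered" means the levels are treated as ranked. See e.g. the question "labelling of ordered factor variable".)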
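
To illustrate the ordered/unordered distinction, here is a small sketch on a made-up vector (state is hypothetical, not the real data). It assumes R's default contrasts option, under which unordered factors get treatment (dummy) coding and ordered factors get polynomial coding.

```r
# Hypothetical toy vector
state <- c("NJ", "PA", "NJ", "PA")

unord <- factor(state)
ord   <- factor(state, ordered = TRUE)

is.ordered(unord)   # FALSE
is.ordered(ord)     # TRUE

# Unordered: dummy coding against the first level (NJ), so the one column is "PA"
colnames(contrasts(unord))   # "PA"

# Ordered: polynomial coding, so the column is a linear trend term, not a level
colnames(contrasts(ord))     # ".L"
```

With an ordered factor the regression output would show a term such as treated.L rather than a comparison against a baseline group, which is why an unordered factor is the right choice here.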

If you prefer the tidyverse, you can do

library(readr)
library(forcats)
library(dplyr)
cardkruger <- read_csv('https://raw.githubusercontent.com/bandcar/Examples/main/cardkruger.csv') |>
  # convert to a factor, then move "PA" to the front so it becomes the baseline
  mutate(treated = fct_relevel(factor(treated), "PA"))
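
Whichever approach you use, one way to check it is to refit the model and look at the coefficient names. The sketch below does this on simulated data, since the real file needs a network connection. The toy data frame and its values are made up, and only the column names fte, t and treated are taken from the question.

```r
set.seed(1)

# Simulated stand-in for the cardkruger data (values are made up)
toy <- data.frame(
  fte     = rnorm(40, mean = 20, sd = 5),
  t       = rep(c(0, 1), times = 20),
  treated = rep(c("NJ", "PA"), each = 20)
)

# Default factor: NJ is the baseline, so the coefficients refer to PA
fit_default <- lm(fte ~ t * factor(treated), data = toy)
names(coef(fit_default))

# Releveled factor: PA is the baseline, so the coefficients refer to NJ
toy$treated <- relevel(factor(toy$treated), ref = "PA")
fit_pa <- lm(fte ~ t * treated, data = toy)
names(coef(fit_pa))   # "(Intercept)" "t" "treatedNJ" "t:treatedNJ"
```

After releveling, the interaction term t:treatedNJ is the difference-in-differences estimate for NJ relative to PA. The fitted values are the same under either baseline, so changing the reference level changes only how the coefficients are labelled and interpreted.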