I'm running a regression where NJ is the treated group and PA is the control group. However, when I run the regression, PA shows up as the treated level (the coefficients are labeled treatedPA). PA is actually the untreated group and should be the baseline. How do I change this?
cardkruger = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/cardkruger.csv')
reg = lm(fte ~ t*treated, cardkruger)
summary(reg)
output:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   17.0652     0.4986  34.224   <2e-16 ***
t              0.5075     0.7085   0.716   0.4740
treatedPA      2.8835     1.1348   2.541   0.0112 *
t:treatedPA   -2.9140     1.6105  -1.809   0.0708 .
CodePudding user response:
There are a variety of ways to do this but
cardkruger$treated <- relevel(factor(cardkruger$treated), "PA")
is the easiest way to change the baseline (do this before running your regression). From ?relevel:
The levels of a factor are re-ordered so that the level specified by ‘ref’ is first and the others are moved down.
The factor() call is there to convert the variable from a character vector to an unordered factor; there's no reason for it to be ordered here (the terminology of "ordered" vs. "unordered" is very confusing: see e.g. the question "labelling of ordered factor variable").
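As a quick check of what relevel() does, here is a minimal sketch on a made-up character vector (the `state` vector is hypothetical and not taken from cardkruger.csv):

```r
# Hypothetical toy vector standing in for the 'treated' column
state <- c("NJ", "PA", "NJ", "PA")

# factor() sorts the levels alphabetically, so "NJ" comes first
f <- factor(state)
levels(f)       # "NJ" "PA"

# relevel() moves the requested level to the front; lm() treats
# the first level as the baseline under default treatment contrasts
f2 <- relevel(f, ref = "PA")
levels(f2)      # "PA" "NJ"

# The result is still an unordered factor
is.ordered(f2)  # FALSE
```

Only the level order changes; the underlying values in the column stay the same.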
If you like tidyverse you can do
library(readr)
library(forcats)
library(dplyr)
cardkruger <- (read_csv('https://raw.githubusercontent.com/bandcar/Examples/main/cardkruger.csv')
  # fct_relevel() moves "PA" to the front, making it the reference level
  |> mutate(treated = fct_relevel(treated, "PA"))
)
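Whichever approach you use, the coefficient labels should switch from treatedPA to treatedNJ once PA is the reference level. A minimal sketch on a small made-up data frame (the `toy` values are invented for illustration and are not the Card and Krueger data):

```r
# Hypothetical toy data: two states, two periods (t = 0 before, t = 1 after)
toy <- data.frame(
  fte     = c(20, 21, 19, 22, 17, 18, 18, 19),
  t       = c(0, 0, 1, 1, 0, 0, 1, 1),
  treated = c("PA", "PA", "PA", "PA", "NJ", "NJ", "NJ", "NJ")
)

# Make PA the baseline before fitting the model
toy$treated <- relevel(factor(toy$treated), ref = "PA")

fit <- lm(fte ~ t * treated, data = toy)

# The non-baseline level (NJ) now appears in the coefficient names
names(coef(fit))
# "(Intercept)" "t" "treatedNJ" "t:treatedNJ"
```

With only two levels, swapping the baseline flips the sign of the treated and interaction terms, so expect t:treatedNJ to have the opposite sign of the t:treatedPA estimate from the original fit.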