I have a dataset in R:
library(dplyr)  # provides tibble(), %>%, group_by() and summarise()
vec = c(200,300,400,500,600,100)
char1 = c("a","a","a","b","b","a")
char2 = c("c","c","d","c","d","d")
df2 = tibble(vec,char1,char2);df2
# A tibble: 6 × 3
    vec char1 char2
  <dbl> <chr> <chr>
1   200 a     c
2   300 a     c
3   400 a     d
4   500 b     c
5   600 b     d
6   100 a     d
To calculate the mean of vec for each level of char1, I can use either a grouped summary or a linear model without an intercept:
df2%>%group_by(char1)%>%
summarise(mean(vec))
lm(df2$vec~df2$char1-1)
and likewise for the char2 variable:
df2%>%group_by(char2)%>%
summarise(mean(vec))
lm(df2$vec~df2$char2-1)
In each of these two cases, the group means match the linear regression coefficients.
But to calculate the mean for each combination of char1 and char2, I do:
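As a rough illustration of why the two approaches agree, the sketch below rebuilds the same data as a base data.frame (an assumption made only to avoid package dependencies) and inspects the design matrix. The names X, fit1 and means1 are hypothetical helpers, not part of the original code.

```r
# Same data as above, rebuilt as a base data.frame so no packages are needed.
df2 <- data.frame(
  vec   = c(200, 300, 400, 500, 600, 100),
  char1 = c("a", "a", "a", "b", "b", "a"),
  char2 = c("c", "c", "d", "c", "d", "d")
)

# Without an intercept, each level of char1 gets its own 0/1 indicator column.
X <- model.matrix(~ char1 - 1, data = df2)
print(X)

# Least squares on non-overlapping indicator columns gives one mean per level.
fit1   <- lm(vec ~ char1 - 1, data = df2)
means1 <- tapply(df2$vec, df2$char1, mean)

# Compare coefficients and group means by value (names are dropped).
print(all.equal(unname(coef(fit1)), as.vector(means1)))
```

Because every row belongs to exactly one indicator column, each coefficient is fitted only to the rows of its own group, which is why it equals that group's mean.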
df2%>%group_by(char1,char2)%>%
summarise(mean(vec))
What is the linear regression equivalent when grouping by these two variables together?
Any help?
Answer:
Specify the interaction between char1 and char2 as char1:char2, and drop the intercept with + 0, to get:
lm(vec ~ char1:char2 + 0, data=df2)
#Call:
#lm(formula = vec ~ char1:char2 + 0, data = df2)
#
#Coefficients:
#char1a:char2c  char1b:char2c  char1a:char2d  char1b:char2d
#          250            500            250            600
This matches the intended result (the summary column is named mv here so that the code agrees with the printed output):
df2 %>%
group_by(char1,char2) %>%
summarise(mv = mean(vec))
## A tibble: 4 × 3
## Groups: char1 [2]
#  char1 char2    mv
#  <chr> <chr> <dbl>
#1 a     c       250
#2 a     d       250
#3 b     c       500
#4 b     d       600
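To check the match programmatically rather than by eye, something like the following sketch could be used. It assumes a base data.frame copy of the data and uses tapply() in place of the dplyr pipeline. The names fit and cell_means are illustrative only.

```r
# Base-R copy of the example data (no packages required).
df2 <- data.frame(
  vec   = c(200, 300, 400, 500, 600, 100),
  char1 = c("a", "a", "a", "b", "b", "a"),
  char2 = c("c", "c", "d", "c", "d", "d")
)

# Interaction-only model without an intercept: one coefficient per cell.
fit <- lm(vec ~ char1:char2 + 0, data = df2)

# Mean of vec in each char1 x char2 cell (rows: char1, columns: char2).
cell_means <- tapply(df2$vec, list(df2$char1, df2$char2), mean)
print(cell_means)

# as.vector() reads the matrix column by column (a.c, b.c, a.d, b.d),
# which is the same order in which lm() reports the coefficients here.
print(all.equal(unname(coef(fit)), as.vector(cell_means)))
```

Note that the coefficient order (char1 varying fastest) differs from the row order of the grouped summary, so the comparison should be made cell by cell, not by position in the tibble.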