How can I match the result of linear regression in R to be the same output as group_by dplyr?

Time:01-27

I have a dataset in R:

library(dplyr)  # provides tibble(), %>%, group_by() and summarise()

vec = c(200,300,400,500,600,100)
char1 = c("a","a","a","b","b","a")
char2 = c("c","c","d","c","d","d")
df2 = tibble(vec,char1,char2);df2

# A tibble: 6 × 3
    vec char1 char2
  <dbl> <chr> <chr>
1   200 a     c    
2   300 a     c    
3   400 a     d    
4   500 b     c    
5   600 b     d    
6   100 a     d    

To calculate the mean of the vec column for each level of char1, I can use either group_by() with summarise() or a linear regression without an intercept:

df2%>%group_by(char1)%>%
  summarise(mean(vec))
lm(df2$vec~df2$char1-1)

And likewise for the char2 variable:

df2%>%group_by(char2)%>%
  summarise(mean(vec))
lm(df2$vec~df2$char2-1)

The results match the linear regression coefficients for these two cases separately.

But to calculate the mean for each combination of char1 and char2, I do:

df2%>%group_by(char1,char2)%>%
  summarise(mean(vec))

What is the linear regression equivalent for these two variables?

Any help?

CodePudding user response:

Specify the interaction between char1 and char2 as char1:char2, and drop the intercept with + 0, to get:

lm(vec ~ char1:char2 + 0, data=df2)

#Call:
#lm(formula = vec ~ char1:char2 + 0, data = df2)
#
#Coefficients:
#char1a:char2c  char1b:char2c  char1a:char2d  char1b:char2d  
#          250            500            250            600  
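
As a rough illustration of why this works, the sketch below inspects the design matrix of that formula using base R only. It rebuilds the question's data as a plain data.frame (an assumption made here so that no packages are needed); the name X is just illustrative. With no intercept and only the interaction term, each column is an indicator for one char1/char2 cell, so the least-squares fit for each column is that cell's mean.

```r
# Rebuild the example data from the question (base R data.frame instead of a tibble).
df2 <- data.frame(
  vec   = c(200, 300, 400, 500, 600, 100),
  char1 = c("a", "a", "a", "b", "b", "a"),
  char2 = c("c", "c", "d", "c", "d", "d")
)

# Design matrix of the interaction-only, no-intercept model.
X <- model.matrix(vec ~ char1:char2 + 0, data = df2)

# One indicator column per char1/char2 combination.
colnames(X)

# Every row contains exactly one 1: each observation falls in exactly one cell.
rowSums(X)

# Number of observations in each cell.
colSums(X)
```

Because the columns do not overlap, the regression reduces to averaging vec within each cell.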

This matches the intended result:

df2 %>% 
  group_by(char1,char2) %>%
  summarise(mv = mean(vec))

## A tibble: 4 × 3
## Groups:   char1 [2]
#  char1 char2    mv
#  <chr> <chr> <dbl>
#1 a     c       250
#2 a     d       250
#3 b     c       500
#4 b     d       600
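
The match can also be checked programmatically rather than by eye. The following is a minimal sketch in base R, assuming the same data as in the question; the helper column term and the object names fit and means are made up for this illustration. Note that lm() and group_by() list the cells in a different order, so the comparison looks coefficients up by name instead of by position.

```r
# Rebuild the example data from the question (base R data.frame instead of a tibble).
df2 <- data.frame(
  vec   = c(200, 300, 400, 500, 600, 100),
  char1 = c("a", "a", "a", "b", "b", "a"),
  char2 = c("c", "c", "d", "c", "d", "d")
)

# Interaction-only model without an intercept.
fit <- lm(vec ~ char1:char2 + 0, data = df2)

# Group means per char1/char2 combination, computed without dplyr.
means <- aggregate(vec ~ char1 + char2, data = df2, FUN = mean)

# Build the coefficient name that corresponds to each group, e.g. "char1a:char2c".
means$term <- paste0("char1", means$char1, ":char2", means$char2)

# Compare each coefficient with its group mean, matching by name.
all.equal(unname(coef(fit)[means$term]), means$vec)
```

If one of the char1/char2 combinations had no observations, lm() would report NA for that coefficient, so the comparison would need to handle that case separately.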