Home > other >  How to pull out only one beta coefficient for each group separately from the regression equation and
How to pull out only one beta coefficient for each group separately from the regression equation and

Time:12-03

here little example of my data.

sales.data=structure(list(MDM_Key = c(370L, 370L, 370L, 370L, 370L, 370L, 
370L, 371L, 371L, 371L, 371L, 371L, 371L, 371L), sale_count = c(30L, 
32L, 32L, 24L, 20L, 15L, 23L, 30L, 32L, 32L, 24L, 20L, 15L, 23L
), iek_disc_price = c(38227.08, 38227.08, 33739.7, 38227.08, 
38227.08, 28844.16, 31649.255, 38227.08, 38227.08, 33739.7, 38227.08, 
38227.08, 28844.16, 31649.255)), class = "data.frame", row.names = c(NA, 
-14L))

i perform regression analysis

str(sales.data)
m1<-lm(formula=sale_count~iek_disc_price,data=sales.data)
summary(m1)

But the main difficulty is that for each group (MDM_Key) I don't need all the regression results from the summary, but only one beta coefficient.

here

B=0.0008559.

but then i need calculate mean value for sale_count and the mean for iek_disc_price (also for each mdm key group)

so the desired result would be like this

MDM_Key    beta    mean(sale_count) mean(iek_disc_price)
370    0.0008559         25.14                 35305
371    0.0008559         25.14                 35305

How to take only beta (nor intercept)regression coefficient for each group mdm_key and also for each group, calculate the mean values for sale_count and iek_disc_price to get the summary table indicated above.

Thank you for your help.

CodePudding user response:

If I understood correctly, you want to apply one regression per MDM_Key.

library(dplyr)
library(purrr)
library(broom)

sales.data %>% 
  group_by(MDM_Key) %>% 
  mutate(
    mean_sale_count = mean(sale_count),
    mean_iek_disc_price = mean(iek_disc_price)
  ) %>% 
  nest(-MDM_Key,-mean_sale_count,-mean_iek_disc_price) %>% 
  mutate(
    coefs = map(.x = data,.f = ~tidy(lm(formula=.$sale_count~.$iek_disc_price,data=.)))
    ) %>%
  unnest(coefs) %>% 
  filter(term != "(Intercept)") %>% 
  select(MDM_Key,beta = estimate,mean_sale_count,mean_iek_disc_price)


# A tibble: 2 x 4
# Groups:   MDM_Key [2]
  MDM_Key     beta mean_sale_count mean_iek_disc_price
    <int>    <dbl>           <dbl>               <dbl>
1     370 0.000856            25.1              35306.
2     371 0.000856            25.1              35306.

CodePudding user response:

Using R base and the split apply combine strategy:

do.call(rbind, lapply(split(sales.data, sales.data$MDM_Key), function(i) {
    c(beta=coef(lm(sale_count~iek_disc_price, data=i))[2],
      sale_count_mean=mean(i$sale_count), 
      iek_disc_price_mean=mean(i$iek_disc_price))
} ))

    beta.iek_disc_price sale_count_mean iek_disc_price_mean
370        0.0008558854        25.14286            35305.92
371        0.0008558854        25.14286            35305.92

CodePudding user response:

Get the means using aggregate and the beta values using lmList and then put them together and rearrange the columns in the order shown in the question. Omit [, c(2:1, 3:4)] if the column order doesn't matter. Note that nlme comes with R and does not have to be installed.

library(nlme) # lmList

means <- aggregate(. ~ MDM_Key, sales.data, mean)
fm <- lmList(sale_count ~ iek_disc_price | MDM_Key, sales.data)
cbind(beta = coef(fm)[, 2], means)[, c(2:1, 3:4)]

##   MDM_Key         beta sale_count iek_disc_price
## 1     370 0.0008558854   25.14286       35305.92
## 2     371 0.0008558854   25.14286       35305.92
  •  Tags:  
  • r
  • Related