Combine many regression coefficients into one dataframe

Time:02-16

I have multiple regression lines. I want to combine the coefficients into one dataframe for easy visualization.

However, not all regressions have the same coefficients, so I was not able to use a for loop looking for the coefficient name.

Here is an example with some sample data and the desired output.

df=structure(list(x1 = c(-0.689814979498939, -0.509885025360363, 
-0.20248689168896, -1.79535329549682, 1.60447678701814, -0.408696703105769, 
0.97243696942363, -0.688339413750959, -0.359380427396309, 1.11638856659614
), x2 = c(0.775426469430265, 0.367906637531888, 0.965721516497862, 
-0.601113535090469, -0.655567870650469, 1.45494263752806, 0.187276141272287, 
-0.659949502938592, -0.481763339717836, -0.581132345668067), 
    x3 = c(-0.17202393327554, 0.022376822081548, -1.05069599269781, 
    -0.631926480864125, 1.76178640615702, -1.60488439781703, 
    0.172936842119056, 0.750091896988, -1.60900096983098, 0.443223570706679
    ), x4 = c(-0.117822668731567, -0.645150368596604, -1.58642572549226, 
    0.3630617077837, -1.00866095836508, 0.696818785571135, 0.978471598076335, 
    -0.315392158997475, 1.37594860146428, 0.0574562910914235), 
    y = c(-1.07067139899979, -0.360297366336307, 0.0328023505398295, 
    1.07908579247402, 0.185603676169661, 0.384858869675533, 0.62179479088495, 
    1.44265090318836, 0.340526158232088, -1.20387054108186)), class = "data.frame", row.names = c(NA, 
-10L))

model1=lm(y~x1, data=df)
model2=lm(y~x2, data=df)
model3=lm(y~x2 + x4, data=df)
model4=lm(y~x2 + x3 + x4, data=df)

coefs_x1=c(-0.2749230,NA,NA,NA)
coefs_x2=c(NA,-0.2795309,-0.2599686,-0.40977455)
coefs_x3=c(NA,NA,NA,-0.18740855)
coefs_x4=c(NA,NA,0.1568399,0.04981574)

output_df=data.frame(coefs_x1,coefs_x2,coefs_x3,coefs_x4)
> output_df
   coefs_x1   coefs_x2   coefs_x3   coefs_x4
1 -0.274923         NA         NA         NA
2        NA -0.2795309         NA         NA
3        NA -0.2599686         NA 0.15683990
4        NA -0.4097746 -0.1874086 0.04981574

CodePudding user response:

You could do:

library(tidyverse)

forms <- list(x1 = y ~ x1, x2 = y ~ x2, x3 = y ~ x2 + x4, x4 = y ~ x2 + x3 + x4)

map(forms, ~t(coef(lm(.x, data = df)))) %>%
  plyr::rbind.fill.matrix() %>%
  as.data.frame() %>%
  select(-1)

         x1         x2         x4         x3
1 -0.274923         NA         NA         NA
2        NA -0.2795309         NA         NA
3        NA -0.2599686 0.15683990         NA
4        NA -0.4097745 0.04981574 -0.1874086

CodePudding user response:

There are many ways to do that, here is what I would typically do using dplyr.

You can access the coefficients of each model directly. They are stored inside the model objects: model1$coefficients returns the coefficients, including the intercept. Since you don't want the intercept (at least you didn't mention it in your question), I'm removing it in base R with [-1], which drops the first element of the coefficient vector.

Then I combine the coefficient vectors with bind_rows() and order the columns with select(). bind_rows() stacks each vector as a row, adds a column for every coefficient name it encounters, and fills in NA where a model lacks that coefficient, which solves your problem.

Solution

library(dplyr)

bind_rows(model1$coefficients[-1],
          model2$coefficients[-1],
          model3$coefficients[-1],
          model4$coefficients[-1]) %>% 
  select(x1, x2, x3, x4)

Output

# A tibble: 4 x 4
      x1     x2     x3      x4
   <dbl>  <dbl>  <dbl>   <dbl>
1 -0.275 NA     NA     NA     
2 NA     -0.280 NA     NA     
3 NA     -0.260 NA      0.157 
4 NA     -0.410 -0.187  0.0498

FYI, the output is the same as yours. Tibbles round the values for display, but the full precision is still stored.
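
If the models are kept in a list, the same idea extends without typing each model by hand. This is only a minimal sketch: it assumes dplyr is installed, uses the built-in mtcars data as a stand-in for df, and the formulas and the names models, slopes and coef_df are illustrative rather than taken from the question.

```r
library(dplyr)

# Stand-in models on built-in data; replace with your own lm() fits
models <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(mpg ~ hp, data = mtcars),
  lm(mpg ~ hp + qsec, data = mtcars),
  lm(mpg ~ hp + wt + qsec, data = mtcars)
)

# Drop the intercept (first element) from each coefficient vector
slopes <- lapply(models, function(m) coef(m)[-1])

# One row per model; terms that a model does not contain become NA
coef_df <- bind_rows(slopes)
coef_df
```

Columns appear in the order in which the terms are first encountered, so a final select() can still be used to reorder them.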

CodePudding user response:

Using base R (the vectors are collected into a named list simply to make the column names easy; the original idea was to rbind with do.call):

# assumes the vectors are named coefs_x1, coefs_x2, ...
coefs <- mget(ls(pattern = "^coefs_x"))
as.data.frame(coefs)

   coefs_x1   coefs_x2   coefs_x3   coefs_x4
1 -0.274923         NA         NA         NA
2        NA -0.2795309         NA         NA
3        NA -0.2599686         NA 0.15683990
4        NA -0.4097746 -0.1874086 0.04981574
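
A base R variant can also start from the fitted models rather than from hand-typed vectors, combining the rows with do.call(rbind, ...). The sketch below rests on assumptions: it uses the built-in mtcars data as a stand-in for df, and the formulas and the names models, slopes, all_terms and coef_df are hypothetical.

```r
# Stand-in models on built-in data; replace with your own lm() fits
models <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(mpg ~ hp, data = mtcars),
  lm(mpg ~ hp + qsec, data = mtcars),
  lm(mpg ~ hp + wt + qsec, data = mtcars)
)

# Coefficients without the intercept
slopes <- lapply(models, function(m) coef(m)[-1])

# Every term that appears in at least one model
all_terms <- unique(unlist(lapply(slopes, names)))

# Indexing a named vector by a missing name yields NA,
# so each row is padded to the full set of terms
rows <- lapply(slopes, function(s) setNames(s[all_terms], all_terms))

coef_df <- as.data.frame(do.call(rbind, rows))
coef_df
```

Because every row is aligned to all_terms before binding, the approach does not depend on the models sharing the same coefficients.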