Capturing the residuals from felm in the original dataframe and passing it to Python

Time:02-11

I am running R via Python using %load_ext rpy2.ipython.

#Prepare R in Python
%load_ext rpy2.ipython
%R install.packages("pacman")
%R pacman::p_load(pacman, tidyverse, lfe)

In R, I am running a model using felm on a dataframe. I want to capture the residuals in the original dataframe and then pass that dataframe back into Python. I can do this for regular lm as:

%%R -o test
dat <- mtcars
test <- mutate(res=residuals(lm(mpg~disp, data = dat)), dat)

Then in Python I have a nice dataframe with res included:

test.describe()

Unfortunately, this approach doesn't seem to work when I use felm instead of lm:

%%R -o test2
dat <- mtcars
test2 <- mutate(res=residuals(felm(mpg~disp, data = dat)), dat)

I get the following error: "ValueError: Data must be 1-dimensional". Any ideas why this works for lm and not for felm? I did notice that when I run:

%%R
dat <- mtcars
test2 <- mutate(res=residuals(felm(mpg~disp, data = dat)), dat)
head(test2, 10)

the residual is calculated correctly and appears to be added to the dataframe, but the column is named "mpg" instead of "res". I'm not sure what is going on here. Any help is appreciated.

Answer:

The issue is that residuals() on an lm fit returns a named vector, whereas on a felm fit it returns a matrix with a single column:

str(residuals(felm(mpg~disp, data = dat)) )
 num [1:32, 1] -2.01 -2.01 -2.35 2.43 3.94 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "mpg"

Consider removing the dimensions by converting to a vector with as.vector, or by wrapping the call in c:

c(residuals(felm(mpg~disp, data = dat)))

With mutate, it would be

library(dplyr)
test <-  dat %>%
    mutate(res = c(residuals(felm(mpg~disp, data = .))))

-output

> str(test)
'data.frame':   32 obs. of  12 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
 $ res : num  -2.01 -2.01 -2.35 2.43 3.94 ...

With the OP's approach, the column remains a matrix, which is likely what causes the "Data must be 1-dimensional" error when the dataframe is converted to pandas on the Python side:

> test <-  dat %>%
      mutate(res = residuals(felm(mpg~disp, data = .)))
> 
> str(test)
'data.frame':   32 obs. of  12 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
 $ res : num [1:32, 1] -2.01 -2.01 -2.35 2.43 3.94 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr "mpg"
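
To see why a one-column matrix trips up the Python side, here is a minimal sketch in plain NumPy and pandas, with no rpy2 involved. It assumes the converter ends up handing pandas a 32 x 1 array for the res column; that is a guess about the conversion internals, not something confirmed above, and the residual values are made up because only the shape matters.

```python
import numpy as np
import pandas as pd

# Stand-in for the felm residuals: a 32 x 1 array (a one-column matrix in R terms).
# The values are invented for illustration; only the shape matters here.
res_matrix = np.linspace(-2.0, 2.0, 32).reshape(32, 1)

# pandas refuses to build a single column from 2-D data.
try:
    pd.Series(res_matrix)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Dropping the extra dimension (the analogue of c() or as.vector() in R)
# gives a plain 1-D array that pandas accepts.
res_vector = res_matrix.ravel()
res_series = pd.Series(res_vector, name="res")
print(res_series.shape)
```

Flattening on the R side, as shown above with c(), is probably the simpler fix, since the dataframe then arrives in Python with an ordinary numeric column.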