I am running R via Python using %load_ext rpy2.ipython.
#Prepare R in Python
%load_ext rpy2.ipython
%R install.packages("pacman")
%R pacman::p_load(pacman, tidyverse, lfe)
In R, I am running a model using felm on a dataframe. I want to capture the residuals in the original dataframe and then pass that dataframe back into Python. I can do this for regular lm as:
%%R -o test
dat <- mtcars
test <- mutate(res=residuals(lm(mpg~disp, data = dat)), dat)
Then in Python I have a nice dataframe with res included:
test.describe()
Unfortunately, this approach doesn't seem to work when I use felm instead of lm:
%%R -o test2
dat <- mtcars
test2 <- mutate(res=residuals(felm(mpg~disp, data = dat)), dat)
I get the following: "ValueError: Data must be 1-dimensional". Any ideas why this works for lm and not for felm? I did notice that when I run:
%%R
dat <- mtcars
test2 <- mutate(res=residuals(felm(mpg~disp, data = dat)), dat)
head(test2, 10)
the residual is calculated correctly and appears to be added to the dataframe, but the column is named "mpg" instead of "res". I'm not sure what is going on here. Any help is appreciated.
CodePudding user response:
The issue is that with lm, residuals() returns a named vector, whereas with felm it returns a matrix with a single column:
str(residuals(felm(mpg~disp, data = dat)) )
num [1:32, 1] -2.01 -2.01 -2.35 2.43 3.94 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "mpg"
Consider dropping the dimensions by converting the result to a vector with as.vector, or by wrapping it in c:
c(residuals(felm(mpg~disp, data = dat)))
With mutate, it would be:
library(dplyr)
test <- dat %>%
mutate(res = c(residuals(felm(mpg~disp, data = .))))
Output:
> str(test)
'data.frame': 32 obs. of 12 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
$ res : num -2.01 -2.01 -2.35 2.43 3.94 ...
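With the OP's approach, the res column remains a one-column matrix. That is the likely cause of the ValueError when the data frame is handed over to Python, and it would also cause trouble for methods such as describe. The matrix carries its own column name, mpg (see the dimnames), which is probably why the printed column is labelled mpg rather than res: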
> test <- dat %>%
mutate(res = residuals(felm(mpg~disp, data = .)))
>
> str(test)
'data.frame': 32 obs. of 12 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
$ res : num [1:32, 1] -2.01 -2.01 -2.35 2.43 3.94 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mpg"
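To see the same failure on the Python side, here is a minimal sketch using only numpy and pandas. It assumes the matrix column reaches pandas as an n x 1 array; whether rpy2 takes exactly this path during the -o conversion is an assumption. The residual values and the helper try_make_column are illustrative, not taken from the original code.

```python
import numpy as np
import pandas as pd

# Stand-in for the felm residuals: a 2-D array with a single column (n x 1).
# The values are illustrative only, not real model output.
res_matrix = np.array([[-2.01], [-2.01], [-2.35], [2.43], [3.94]])


def try_make_column(values):
    """Return a pandas Series, or the error message if pandas rejects the input."""
    try:
        return pd.Series(values)
    except ValueError as exc:
        return f"ValueError: {exc}"


# A (5, 1) array is rejected because a Series must be 1-dimensional.
failed = try_make_column(res_matrix)

# Dropping the extra dimension (the analogue of c() / as.vector() in R) works.
flat = try_make_column(res_matrix.ravel())

print(res_matrix.shape, "->", failed)
print(res_matrix.ravel().shape, "->", flat.tolist())
```

Flattening in R before the data frame leaves the %%R cell, as shown in the answer, is usually the simpler fix, since the conversion then never sees a matrix column.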