I am trying to write a function to return regression coefficient and standard errors since I need run a large number of regressions. The data could look like this
library(tidyverse)
library(fixest)
library(broom)
data<-tibble(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
Card = c(1,2,3,4,1,2,3,4),
A = rnorm(8),
B = rnorm(8),
C = rnorm(8)
)
My current code is as following
estimation_fun <- function(col1,col2,df) {
regression<-feols(df[[col1]] ~ df[[col2]] | Card Date, df)
est =tidy(regression)$estimate
se = tidy(regression)$std.error
output <- list(est,se)
return(output)
}
estimation_fun("A","B",example)
However, it does not work. I guess it is related to column name in feols because I can make it work for lm()
.
CodePudding user response:
feols
function needs a formula object. You can create it using paste0
/sprintf
.
estimation_fun <- function(col1,col2,df) {
regression<-feols(as.formula(sprintf('%s ~ %s | Card Date', col1, col2)), df)
est =tidy(regression)$estimate
se = tidy(regression)$std.error
output <- list(est,se)
return(output)
}
estimation_fun("A","B",data)
#[[1]]
#[1] -0.1173276
#attr(,"type")
#[1] "Clustered (Card)"
#[[2]]
#[1] 1.083011
#attr(,"type")
#[1] "Clustered (Card)"
To apply this for every pair of variables you may do -
cols <- names(data)[-(1:2)]
do.call(rbind, combn(cols, 2, function(x) {
data.frame(cols = paste0(x, collapse = '-'),
t(estimation_fun(x[1],x[2],data)))
}, simplify = FALSE))
cols X1 X2
#1 A-B -0.1173276 1.083011
#2 A-C -0.1117691 0.5648162
#3 B-C -0.3771884 0.1656587
CodePudding user response:
Ronak's right: only formulas made of variable names can be used.
Since fixest
0.10.0, you can use the dot square bracket operator to do just that. See the help page for formula manipulation in xpd.
Just change one line in your code to make it work:
estimation_fun <- function(lhs, rhs, df) {
# lhs must be of length 1 (otherwise => not what you'd want)
# rhs can be a vector of variables
regression <- feols(.[lhs] ~ .[rhs] | Card Date, df)
# etc...
}
# Example of how ".[]" works:
lhs = "A"
rhs = c("B", "C")
feols(.[lhs] ~ .[rhs], data)
#> OLS estimation, Dep. Var.: A
#> Observations: 8
#> Standard-errors: IID
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.375548 0.428293 0.876849 0.42069
#> B -0.670476 0.394592 -1.699164 0.15004
#> C 0.177647 0.537452 0.330536 0.75440
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.737925 Adj. R2: 0.183702
By the way, I recommend to use the built-in multiple estimation facility (see help here) since estimation speed will be substantially improved.