R: how to call feols regression within a function-CodePudding

I am trying to write a function to return regression coefficient and standard errors since I need run a large number of regressions. The data could look like this

library(tidyverse)
library(fixest)
library(broom)
data<-tibble(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
         Card = c(1,2,3,4,1,2,3,4),
         A = rnorm(8),
         B = rnorm(8),
         C = rnorm(8)
         )

My current code is as following

estimation_fun <- function(col1,col2,df) {

  regression<-feols(df[[col1]] ~ df[[col2]] | Card   Date, df)

  est =tidy(regression)$estimate
  se = tidy(regression)$std.error

  output <- list(est,se)
  return(output)

}
estimation_fun("A","B",example)

However, it does not work. I guess it is related to column name in feols because I can make it work for lm().

CodePudding user response：

feols function needs a formula object. You can create it using paste0/sprintf.

estimation_fun <- function(col1,col2,df) {
  
  regression<-feols(as.formula(sprintf('%s ~ %s | Card   Date', col1, col2)), df)
  
  est =tidy(regression)$estimate
  se = tidy(regression)$std.error
  
  output <- list(est,se)
  return(output)
  
}

estimation_fun("A","B",data)

#[[1]]
#[1] -0.1173276
#attr(,"type")
#[1] "Clustered (Card)"

#[[2]]
#[1] 1.083011
#attr(,"type")
#[1] "Clustered (Card)"

To apply this for every pair of variables you may do -

cols <- names(data)[-(1:2)]

do.call(rbind, combn(cols, 2, function(x) {
  data.frame(cols = paste0(x, collapse = '-'), 
             t(estimation_fun(x[1],x[2],data)))
}, simplify = FALSE))

   cols         X1        X2
#1  A-B -0.1173276  1.083011
#2  A-C -0.1117691 0.5648162
#3  B-C -0.3771884 0.1656587

CodePudding user response：

Ronak's right: only formulas made of variable names can be used.

Since fixest 0.10.0, you can use the dot square bracket operator to do just that. See the help page for formula manipulation in xpd.

Just change one line in your code to make it work:

estimation_fun <- function(lhs, rhs, df) {
  # lhs must be of length 1 (otherwise => not what you'd want)
  # rhs can be a vector of variables
  regression <- feols(.[lhs] ~ .[rhs] | Card   Date, df)
  
  # etc...
}

# Example of how ".[]" works: 

lhs = "A"
rhs = c("B", "C")
feols(.[lhs] ~ .[rhs], data)
#> OLS estimation, Dep. Var.: A
#> Observations: 8 
#> Standard-errors: IID 
#>              Estimate Std. Error   t value Pr(>|t|) 
#> (Intercept)  0.375548   0.428293  0.876849  0.42069 
#> B           -0.670476   0.394592 -1.699164  0.15004 
#> C            0.177647   0.537452  0.330536  0.75440 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.737925   Adj. R2: 0.183702

By the way, I recommend to use the built-in multiple estimation facility (see help here) since estimation speed will be substantially improved.