How to deal with special characters in the variable names when calling feols regression?

Time:10-28

I am trying to write a function that returns the fixed-effects regression coefficients and standard errors, since I need to run a large number of regressions. The data could look like this. Many of the column names contain special characters, such as spaces, -, &, and leading digits.

library(data.table)
library(fixest)
library(broom)
data<-data.table(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
         Card = c(1,2,3,4,1,2,3,4),
         A = rnorm(8),
         B = rnorm(8),
         C = rnorm(8),
         D = rnorm(8)
         )
setnames(data, old = "A", new = "A-A")
setnames(data, old = "B", new = "B B")
setnames(data, old = "C", new = "C&C")
setnames(data, old = "D", new = "1-D")

Thanks to @Ronak Shah and @Laurent Bergé, who provided the following two great candidates:

estimation_fun <- function(col1,col2,df) {
  regression<-feols(as.formula(sprintf('%s ~ %s | Card + Date', col1, col2)), df)
  est =tidy(regression)$estimate
  se = tidy(regression)$std.error
  output <- list(est,se)
  return(output)
}

Or

estimation_fun <- function(lhs, rhs, df) {
regression<-feols(.[lhs] ~ .[rhs] | Card + Date, df)
est =tidy(regression)$estimate
se = tidy(regression)$std.error
output <- list(est,se)
return(output)
}

Both work when the column names are plain, such as "A", "B", "C". However, calling the function with the names above fails:

estimation_fun("A-A","B B",data)

Error in feols(as.formula(sprintf("%s ~ %s | Card + Date", col1, col2)), : 
Argument 'fml' could not be evaluated: <text>:1:9: unexpected symbol
1: A-A ~ B B
           ^
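
The message comes from R's parser rather than from the regression itself: sprintf pastes the raw names into the formula text, so A-A ~ B B is read as "A minus A" followed by a stray symbol. The base-R sketch below shows only what backticks change at the parsing stage. It makes no claim about whether a given fixest version then accepts the backticked names.

```r
# Raw names pasted into formula text: `A-A` parses as A minus A,
# and the second `B` is an unexpected symbol, so parsing fails.
bad <- tryCatch(
  as.formula("A-A ~ B B"),
  error = function(e) conditionMessage(e)
)
print(bad)  # the parser's error message, captured as a string

# Backtick-quoting turns each name into a single symbol, so the text parses.
good <- as.formula("`A-A` ~ `B B`")
print(all.vars(good))  # the variable names, without the backticks
```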

I am looking for a feols formula format that can handle this situation. Any other suggestions are welcome, e.g. directly removing the special characters from the column names (though that would be second-best).

Thanks to the great community here!

Answer:

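Consider replacing the special characters with _. The gsub call handles -, &, and spaces; make.names then prefixes any name that starts with a digit with X, so that it becomes a syntactically valid name: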

 setnames(data, gsub("[-& ]", "_", names(data)))
 setnames(data, make.names(names(data)))

Check the data:

> data
         Date Card         A_A        B_B         C_C        X1_D
1: 2020-01-01    1  0.19083908  0.4835800 -0.08755933  1.01311944
2: 2020-01-01    2 -0.57726617  0.6421043  1.12987445 -0.52168711
3: 2020-01-01    3  2.02653159 -1.4505543 -0.43367868 -0.04474157
4: 2020-01-01    4 -0.20575821  0.4691786 -1.58562690  0.49362528
5: 2020-02-01    1 -0.03461155 -0.2913712 -0.16457341 -0.07701185
6: 2020-02-01    2 -0.50734472 -0.7545768 -0.53227356  0.46468144
7: 2020-02-01    3  0.76653913 -0.1634451  1.00350319  0.25886312
8: 2020-02-01    4  0.33414436  0.6395322  1.10383819 -1.08479631
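
Renaming in place discards the original column names. If the results later need to be reported under the original labels, one option is to keep a lookup from cleaned to original names. This is a minimal base-R sketch; old_names, new_names and name_map are illustrative names and not part of the original answer.

```r
# Original column names from the question.
old_names <- c("Date", "Card", "A-A", "B B", "C&C", "1-D")

# Same cleaning rule as above: replace -, & and spaces with _,
# then let make.names fix names that start with a digit.
# unique = TRUE guards against two names collapsing to the same cleaned name.
new_names <- make.names(gsub("[-& ]", "_", old_names), unique = TRUE)

# Named vector: cleaned name -> original name.
name_map <- setNames(old_names, new_names)
print(name_map)

# Look up the original label for a cleaned column name.
name_map[["X1_D"]]
```

With a data.table, setnames(data, old_names, new_names) would apply the same renaming while name_map keeps the way back.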

Testing:

 estimation_fun('A_A', 'B_B', data)
[[1]]
[1] -0.3915516
attr(,"type")
[1] "Clustered (Card)"

[[2]]
[1] 0.2658773
attr(,"type")
[1] "Clustered (Card)"

Backticks usually work for non-syntactic names in formulas, but with feols they were breaking here. So the safe option is to replace the special characters with _, either with clean_names from janitor or with gsub.
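
If the caller should still be able to pass the original column names, the cleaning can be moved inside the function. The sketch below is one possible way to do that and is not part of the original answer. clean_name, build_fml and estimation_fun_clean are hypothetical helpers, and the fixed effects Card + Date are hard-coded as in the question. It uses coef() and fixest::se() instead of broom::tidy() and assumes fixest is installed.

```r
# Hypothetical helper: the same cleaning rule as in the answer above.
# Note: two different originals (e.g. "A-A" and "A A") would map to the same name.
clean_name <- function(x) make.names(gsub("[-& ]", "_", x))

# Hypothetical helper: build the formula from cleaned names.
build_fml <- function(lhs, rhs) {
  as.formula(sprintf("%s ~ %s | Card + Date", clean_name(lhs), clean_name(rhs)))
}

# Hypothetical wrapper: works on a renamed copy, so the caller's data is untouched.
estimation_fun_clean <- function(lhs, rhs, df) {
  df <- as.data.frame(df)
  names(df) <- clean_name(names(df))
  fit <- fixest::feols(build_fml(lhs, rhs), df)
  list(est = coef(fit), se = fixest::se(fit))
}

# Usage, guarded so the definitions above still load without fixest.
if (requireNamespace("fixest", quietly = TRUE)) {
  set.seed(1)
  d <- data.frame(
    Date = rep(c("2020-01-01", "2020-02-01"), each = 4),
    Card = rep(1:4, times = 2),
    a = rnorm(8),
    b = rnorm(8)
  )
  names(d)[3:4] <- c("A-A", "B B")
  print(estimation_fun_clean("A-A", "B B", d))
}
```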
