I am trying to write a function that returns fixed-effects (FE) regression coefficients and standard errors, since I need to run a large number of regressions. The data could look like this. The column names contain many special characters, such as spaces, -, &, and leading digits.
library(data.table)
library(fixest)
library(broom)
data<-data.table(Date = c("2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-02-01","2020-02-01","2020-02-01","2020-02-01"),
Card = c(1,2,3,4,1,2,3,4),
A = rnorm(8),
B = rnorm(8),
C = rnorm(8),
D = rnorm(8)
)
setnames(data, old = "A", new = "A-A")
setnames(data, old = "B", new = "B B")
setnames(data, old = "C", new = "C&C")
setnames(data, old = "D", new = "1-D")
Thanks to @Ronak Shah and @Laurent Bergé, who provided two great candidates, as follows:
estimation_fun <- function(col1, col2, df) {
  # Build the formula from strings; Card and Date are the fixed effects
  regression <- feols(as.formula(sprintf('%s ~ %s | Card + Date', col1, col2)), df)
  est <- tidy(regression)$estimate
  se <- tidy(regression)$std.error
  output <- list(est, se)
  return(output)
}
Or
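estimation_fun <- function(col1, col2, df) {
  # .[x] is fixest's dot-square-bracket operator: it inserts the value of x into the formula
  regression <- feols(.[col1] ~ .[col2] | Card + Date, df)
  est <- tidy(regression)$estimate
  se <- tidy(regression)$std.error
  output <- list(est, se)
  return(output)
}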
They both work if the column names are just "A", "B", "C", etc. However, calling the function with the special-character names fails:
estimation_fun("A-A","B B",data)
Error in feols(as.formula(sprintf("%s ~ %s | Card + Date", col1, col2)), :
  Argument 'fml' could not be evaluated: <text>:1:9: unexpected symbol
1: A-A ~ B B
           ^
I am looking for a feols formula format that can handle such column names. Other suggestions are also welcome, e.g., directly removing the special characters from the column names (though that would be second-best).
Thanks to the great community here!
CodePudding user response:
Consider changing the special characters to _
setnames(data, gsub("[-& ]", "_", names(data)))
setnames(data, make.names(names(data)))
-check the data
> data
         Date Card         A_A        B_B         C_C        X1_D
1: 2020-01-01    1  0.19083908  0.4835800 -0.08755933  1.01311944
2: 2020-01-01    2 -0.57726617  0.6421043  1.12987445 -0.52168711
3: 2020-01-01    3  2.02653159 -1.4505543 -0.43367868 -0.04474157
4: 2020-01-01    4 -0.20575821  0.4691786 -1.58562690  0.49362528
5: 2020-02-01    1 -0.03461155 -0.2913712 -0.16457341 -0.07701185
6: 2020-02-01    2 -0.50734472 -0.7545768 -0.53227356  0.46468144
7: 2020-02-01    3  0.76653913 -0.1634451  1.00350319  0.25886312
8: 2020-02-01    4  0.33414436  0.6395322  1.10383819 -1.08479631
-testing
estimation_fun('A_A', 'B_B', data)
[[1]]
[1] -0.3915516
attr(,"type")
[1] "Clustered (Card)"
[[2]]
[1] 0.2658773
attr(,"type")
[1] "Clustered (Card)"
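The renaming step above can be wrapped in a small helper that also remembers which cleaned name belongs to which original column, so results can be reported under the original labels. This is only a minimal sketch in base R; clean_col_names and lookup are hypothetical names introduced here for illustration, not part of data.table or fixest.

```r
# Hypothetical helper: clean column names and keep a lookup back to the originals.
clean_col_names <- function(nms) {
  # Replace anything that is not a letter, digit, underscore or dot with "_"
  cleaned <- gsub("[^A-Za-z0-9_.]", "_", nms)
  # Prefix names that start with a digit and make the result unique
  cleaned <- make.names(cleaned, unique = TRUE)
  # Named vector: names are the original columns, values are the cleaned ones
  setNames(cleaned, nms)
}

original <- c("Date", "Card", "A-A", "B B", "C&C", "1-D")
lookup <- clean_col_names(original)
print(lookup)

# With the data.table from the question, the renaming could then be applied
# by reference and the cleaned names passed on (assumed usage, not run here):
# setnames(data, old = names(lookup), new = unname(lookup))
# estimation_fun(lookup[["A-A"]], lookup[["B B"]], data)
```

Keeping the lookup avoids losing track of the original column labels when many regressions are run in a loop.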
Backticks usually work for non-syntactic column names, but with feols they break. So the safe option is to clean the names, either with clean_names from janitor or with gsub, replacing the special characters with _.
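For reference, here is what "backticks usually work" looks like in base R, using lm rather than feols. This is a sketch on toy data; bt is a hypothetical helper name, and the example says nothing about how a given fixest version treats backticked names.

```r
# Toy data with non-syntactic column names (check.names = FALSE keeps them as-is).
set.seed(1)
df <- data.frame(`A-A` = rnorm(8), `B B` = rnorm(8), check.names = FALSE)

# Hypothetical helper: wrap a column name in backticks so the parser accepts it.
bt <- function(x) paste0("`", x, "`")

# Without backticks, "A-A ~ B B" fails to parse; with them it is a valid formula.
fml <- as.formula(sprintf("%s ~ %s", bt("A-A"), bt("B B")))

# Base R's lm evaluates the backticked names against the data frame columns.
fit <- lm(fml, data = df)
print(coef(fit))
```

If the same backticked formula fails in feols, renaming the columns as described above remains the fallback.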