I ran a fixed-effects model on the following data set:
> data
# A tibble: 13,646 × 7
# Groups: age [16]
account_id time default_rate r12_gdp_bl r12_gdp_st age lifecycle
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 400293 2005 NA 0.848 0.848 17 0.00238
2 400293 2006 NA 3.81 3.81 16 NaN
3 400293 2007 NA 3.34 3.34 15 0.00694
4 400293 2008 0.0058 0.806 0.806 14 0.00897
5 400293 2009 0.0165 -5.22 -5.22 13 0.0325
6 400293 2010 0.0001 3.79 3.79 12 0.0115
7 400293 2011 0.0165 3.34 3.34 11 0.0148
8 400293 2012 0.0136 0.892 0.892 10 0.0126
9 400293 2013 0.0201 0.531 0.531 9 0.0144
10 400293 2014 NA 1.74 -0.867 8 NaN
# … with 13,636 more rows
where the credit default_rate is the response variable, and age and GDP are the fixed effects. The model is the following:
> library(fixest)
> mod_bl <- feols(default_rate ~ account_id | age^r12_gdp_bl, data = data)
NOTE: 11,274 observations removed because of NA values (LHS: 11,274).
> summary(mod_bl)
OLS estimation, Dep. Var.: default_rate
Observations: 2,372
Fixed-effects: age^r12_gdp_bl: 8
Standard-errors: Clustered (age^r12_gdp_bl)
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.033706 Adj. R2: 0.031225
Within R2: 0.002527
The model seems meaningless, but that's what I was asked to do. The original default_rate is a probability; therefore it is bounded between 0 and 1.
> summary(data$default_rate)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 0.002 0.006 0.015 0.015 0.510 11274
However, the fitted values of the model are not confined to [0, 1] as they should be; the minimum is negative:
> summary(mod_bl$fitted.values)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.04275 0.01210 0.01375 0.01497 0.01502 0.03273
How can I fix the model in order to obtain responses bounded between 0 and 1?
CodePudding user response:
The most natural (IMO) thing to do would be to fit a logistic regression instead of an OLS model. You can do this with fixest::feglm. Since your data are not binary (0/1) values, this is actually a fractional logistic regression model. The example below uses binary data because that's what I have handy, but you should be able to modify your example by substituting feglm for feols and adding family = "quasibinomial":
library(fixest)
data(Contraception, package = "mlmRev")
## 'use' is initially a factor ("N", "Y"); convert to 0/1
cc <- transform(Contraception, nUse = as.numeric(use) - 1)
## livch, age and urban are covariates; district is the fixed effect
m1 <- feglm(nUse ~ livch + age + urban | district, data = cc,
            family = "quasibinomial")
summary(predict(m1))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.02414 0.25616 0.38540 0.39696 0.52402 0.82794
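Applied to the model in the question, the substitution might look like the sketch below. I don't have your data, so the data frame `sim` is simulated purely for illustration: its values, and the names `sim`, `mod_frac` and `p`, are assumptions. Only the formula and the `family` argument carry over from the discussion above.

```r
library(fixest)

set.seed(42)
n <- 600
# Hypothetical stand-in for the question's data (not the real accounts)
sim <- data.frame(
  account_id = sample(1:50, n, replace = TRUE),
  age        = sample(8:11, n, replace = TRUE),
  r12_gdp_bl = sample(c(-5.22, 0.806, 3.34), n, replace = TRUE)
)
# Simulated fractional response, strictly inside (0, 1)
sim$default_rate <- plogis(-4 - 0.1 * sim$r12_gdp_bl + rnorm(n, sd = 0.8))

# Same formula as the feols() call, fitted with a logit link instead
mod_frac <- feglm(default_rate ~ account_id | age^r12_gdp_bl,
                  data = sim, family = "quasibinomial")

# Predictions on the response (probability) scale
p <- predict(mod_frac, type = "response")
range(p)
```

Because the logit link maps the linear predictor through the inverse-logit function, the predictions on the response scale cannot leave the unit interval, whatever the coefficients turn out to be.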
If your supervisor insists on an OLS fit, they are asking for a linear probability model and should be willing to live with the predictions outside of [0,1] ...
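If the OLS fit stays, one way to judge how much this matters is to count the fitted values that fall outside [0, 1]. A minimal sketch, again on simulated data (the data frame `sim`, its values, and the names `mod_lpm` and `share_outside` are hypothetical, not taken from the question):

```r
library(fixest)

set.seed(42)
n <- 600
# Hypothetical stand-in for the question's data
sim <- data.frame(
  account_id = sample(1:50, n, replace = TRUE),
  age        = sample(8:11, n, replace = TRUE),
  r12_gdp_bl = sample(c(-5.22, 0.806, 3.34), n, replace = TRUE)
)
sim$default_rate <- plogis(-4 - 0.1 * sim$r12_gdp_bl + rnorm(n, sd = 0.8))

# Linear probability model, same formula as in the question
mod_lpm <- feols(default_rate ~ account_id | age^r12_gdp_bl, data = sim)

# Share of fitted values that are not valid probabilities
f <- fitted(mod_lpm)
share_outside <- mean(f < 0 | f > 1)
share_outside
```

A small share may be tolerable if the model is only used for ranking or for average effects; a large one is an argument for the fractional logit fit.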