I need to fit a GAM to the variable "Life_expectancy" using three predictors: "Adult_Mortality", "HIV_AIDS" and "Schooling". To tune the model, I need to find the best combination of degrees of freedom for the three variables. To do that, I want to nest three for loops (one for i, one for j and one for k) and run the following command inside them:
gam.fit <- gam(Life_expectancy ~ s(Adult_Mortality, df = i) + s(HIV_AIDS, df = j) + s(Schooling, df = k), data = train)
for each combination of i,j,k and calculate the test error each time. In the end, choose the model with the lowest test error. I tried doing this with this code:
test.err <- rep(0, 8)
for (i in 3:10) {
  for (j in 3:10) {
    for (k in 3:10) {
      gam.fit <- gam(Life_expectancy ~ s(Adult_Mortality, df = i) +
                       s(HIV_AIDS, df = j) +
                       s(Schooling, df = k),
                     data = train)
      gam.pred <- predict(gam.fit, test)
      test.err[i-2] <- mean((test$Life_expectancy - gam.pred)^2)
    }
  }
}
but this only yields 8 test errors, one for each value of i from 3 to 10. How can I store the test error for every combination of i, j, k?
Thank you in advance!
CodePudding user response:
The code can be modified to:
test.err <- array(0, c(8,8,8))
for (i in 3:10) {
  for (j in 3:10) {
    for (k in 3:10) {
      gam.fit <- gam(Life_expectancy ~ s(Adult_Mortality, df = i) +
                       s(HIV_AIDS, df = j) +
                       s(Schooling, df = k),
                     data = train)
      gam.pred <- predict(gam.fit, test)
      test.err[i-2, j-2, k-2] <- mean((test$Life_expectancy - gam.pred)^2)
    }
  }
}
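Once the three-dimensional array is filled, the combination with the lowest test error can be read back from it. The following is a minimal sketch in base R; the array here is filled with random numbers purely as a stand-in so the snippet runs on its own, and the names best and best.df are made up for illustration.

```r
# Hypothetical stand-in for the 8 x 8 x 8 array filled by the loops above
# (random values, only so this snippet is runnable by itself).
set.seed(1)
test.err <- array(runif(8^3), c(8, 8, 8))

# Array position(s) of the smallest error; each row of `best` is one minimiser.
best <- which(test.err == min(test.err), arr.ind = TRUE)

# Array indices 1..8 correspond to df 3..10, so add 2 to recover i, j, k.
best.df <- best[1, ] + 2
names(best.df) <- c("Adult_Mortality", "HIV_AIDS", "Schooling")
best.df
```

Using arr.ind = TRUE makes which() return one index per array dimension rather than a single linear position, which is what allows mapping back to i, j and k.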
A couple of notes about the method:
- You haven't said which gam function you've used; there are functions in packages gam and mgcv and probably others. The latter can estimate appropriate degrees of freedom based on the training set.
- You seem to be estimating degrees of freedom based on the fit to the test dataset, which to some extent goes against the idea of having a separate training and test dataset.
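To illustrate the first note, here is a rough sketch of letting mgcv choose the smoothness from the training data alone, with no grid search over df. The data frame below is simulated and purely hypothetical (only the column names match the question), and the choice of method = "REML" is an assumption rather than something stated in the question.

```r
library(mgcv)

# Simulated stand-in data (hypothetical; replace with the real training set).
set.seed(42)
n <- 200
train <- data.frame(
  Adult_Mortality = runif(n, 50, 400),
  HIV_AIDS        = runif(n, 0, 10),
  Schooling       = runif(n, 5, 18)
)
train$Life_expectancy <- 80 - 0.05 * train$Adult_Mortality -
  sqrt(train$HIV_AIDS) + 0.5 * train$Schooling + rnorm(n)

# s() without a fixed df: the smoothness of each term is estimated
# from the training data (here by REML).
gam.fit <- gam(Life_expectancy ~ s(Adult_Mortality) + s(HIV_AIDS) + s(Schooling),
               data = train, method = "REML")

# Effective degrees of freedom estimated for each smooth term.
summary(gam.fit)$edf
```

With this approach the test set is only used once, at the end, to report the error of the fitted model, which keeps it independent of the tuning step.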