I have an enormous data set. I want to sample it K times, run a linear regression and extract the RMSE each time to store in a data frame.
pseudo code:
rmse <- emptyDataFrame{}
for (i in 1:100)
sample_n(df, n, replace=True)
model <- lm(y ~ ., data = df)
rmse <- sqrt(mean(y_pred - y)^2))
Can anyone give me the missing details?
CodePudding user response:
You can add [i] after rmse to store each value at its own index. Also, I don't know how your function sample_n works, but you probably need to save its output in a new variable and pass that to lm; otherwise the model is fit on the full df every time.
Also, the formula for RMSE is sqrt(mean((y_pred - y)^2)): the errors are squared before averaging, not after.
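To see why the placement of the parentheses matters, here is a tiny sketch with made-up residuals (the vector err is purely hypothetical, not from your data). Squaring the mean lets positive and negative errors cancel, while averaging the squares does not:

```r
# Hypothetical residuals (y_pred - y), made up for illustration only
err <- c(-2, 2, -1, 1)

# Squares the mean of the errors: this is just abs(mean(err))
wrong <- sqrt(mean(err)^2)

# Averages the squared errors, then takes the root: the actual RMSE
right <- sqrt(mean(err^2))

print(c(wrong = wrong, right = right))
```

Here the errors average to zero, so the first expression reports no error at all even though every prediction is off.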
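Putting it together:
library(dplyr)  # assuming sample_n() is the dplyr function
rmse <- numeric(100)  # one slot per resample
for (i in 1:100) {
  # draw n rows with replacement (n must be defined beforehand)
  df_sampled <- sample_n(df, n, replace = TRUE)
  model <- lm(y ~ ., data = df_sampled)
  # in-sample predictions for the resampled rows
  y_pred <- predict(model, newdata = df_sampled)
  rmse[i] <- sqrt(mean((y_pred - df_sampled$y)^2))
}
as.data.frame(rmse)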
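If you want to try the whole thing end to end without your real data, here is a minimal self-contained sketch. It makes a few assumptions that are not in the question: the data frame is simulated, it uses base R's sample() on row indices instead of sample_n, and the names K, n, one_rmse and rmse_df are only illustrative.

```r
set.seed(1)  # make the resampling reproducible

# Simulated stand-in for the real data (hypothetical columns x1, x2, y)
N <- 1000
df <- data.frame(x1 = rnorm(N), x2 = rnorm(N))
df$y <- 1 + 2 * df$x1 - df$x2 + rnorm(N)

K <- 100  # number of resamples
n <- 200  # rows drawn per resample

# Fit on one resample and return its in-sample RMSE
one_rmse <- function(data, n) {
  idx <- sample(nrow(data), n, replace = TRUE)  # row indices, with replacement
  df_sampled <- data[idx, ]
  model <- lm(y ~ ., data = df_sampled)
  y_pred <- predict(model, newdata = df_sampled)
  sqrt(mean((y_pred - df_sampled$y)^2))
}

# vapply checks that each iteration returns exactly one number
rmse <- vapply(seq_len(K), function(i) one_rmse(df, n), numeric(1))

# One row per resample
rmse_df <- data.frame(iteration = seq_len(K), rmse = rmse)
head(rmse_df)
```

Wrapping one iteration in a function keeps the loop body small, and vapply fills the result vector without growing it element by element. Note that this RMSE is computed on the same rows the model was fit on.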