I can successfully rebuild a linear regression in R using optim, but I get a wrong result when I also use a weight.
df <- mtcars
df$weight <- df$disp / mean(df$disp) #just an example
Here are the correct results using lm:
summary(lm(mpg ~ cyl, data = df))$r.squared
[1] 0.72618
summary(lm(mpg ~ cyl, data = df, weights = weight))$r.squared
[1] 0.6794141
Here the function and optim for the unweighted version that works:
minX.RSQ <- function(par) {
esti <- par[1] par[2]*df[,'cyl']
resi <- (esti - df[,'mpg'])**2
RSS <- sum(resi)
SST <- sum((df[,'mpg'] - mean(df[,'mpg']))**2)
-(1 - RSS / SST)
}
optim(par = c(0,0), minX.RSQ)$value
[1] -0.72618 #which is correct, cp. above
Here the function and optim for the weighted version, I tried weighted sums, but the result is NOT correct:
minW.RSQ <- function(par) {
esti <- par[1] par[2]*df[,'cyl']
resi <- (esti - df[,'mpg'])**2
RSS <- sum(resi * df[,'weight'])
SST <- sum((df[,'mpg'] - mean(df[,'mpg']))**2 * df[,'weight'])
-(1 - RSS / SST)
}
optim(par = c(0,0), minW.RSQ)$value
[1] -0.7521823 #which is NOT correct, cp. above
How can I fix the function for the weighted version?
CodePudding user response:
You are really close. You just need to use a weighted mean in your SST calc:
minW.RSQ <- function(par) {
esti <- par[1] par[2]*df[,'cyl']
resi <- (esti - df[,'mpg'])**2
RSS <- sum(resi * df[,'weight'])
SST <- sum((df[,'mpg'] - weighted.mean(df[,'mpg'], w = df[,'weight']))**2 * df[,'weight'])
-(1 - RSS / SST)
}
optim(par = c(0,0), minW.RSQ)$value
# [1] -0.6794141