I have trouble looping through a regression model dropping one observation each time to estimate the effect of influential observations.
I would like to run the model several times, each time dropping the ith observation and extracting the relevant coefficient estimate and store it in a vector. I think this could quite easily be done with a fairly straight forward loop, however, I'm stuck at the specifics.
I want to be left with a vector containing n coefficient estimates from n iterations of the same model. Any help would be beneficial!
Below I provide some dummy data and example code.
#Dummy data:
set.seed(489)
patientn <- rep(1:400)
gender <- rbinom(400, 1, 0.5)
productid <- rep(c("Product A","Product B"), times=200)
country <- rep(c("USA","UK","Canada","Mexico"), each=50)
baselarea <- rnorm(400,400,60) #baseline area
baselarea2 <- rnorm(400,400,65) #baseline area2
sfactor <- c(
rep(c(0.3,0.9), times = 25),
rep(c(0.4,0.5), times = 25),
rep(c(0.2,0.4), times = 25),
rep(c(0.3,0.7), times = 25)
)
rashdummy2a <- data.frame(patientn,gender,productid,country,baselarea,baselarea2,sfactor)
Data <- rashdummy2a %>% mutate(rashleft = baselarea2*sfactor/baselarea*100) ```
## Example of how this can be done manually:
# model
m1<-lm(rashleft ~ gender baselarea sfactor, data = data)
# extracting relevant coefficient estimates, each time dropping a different "patient" ("patientn")
betas <- c(lm(rashleft ~ gender baselarea sfactor, data = rashdummy2b, patientn !=1)$coefficients[2],
lm(rashleft ~ gender baselarea sfactor, data = rashdummy2b, patientn !=2)$coefficients[2],
lm(rashleft ~ gender baselarea sfactor, data = rashdummy2b, patientn !=3)$coefficients[2])
# the betas vector now stores the relevant coefficient estimates (coefficient nr 2, for gender) for three different variations of the model.
CodePudding user response:
We can use a for loop. In your question you use an object rashdummy2b
which is not defined. Now I used data
but you can replace that by an object of choice.
#create list to bind results to
result <- list()
#loop through patients and extract betas
for(i in unique(data$patientn)){
#construct linear model
lm.model <- lm(rashleft ~ gender baselarea sfactor, data = subset(data, data$patientn != i))
#create data.frame containing patient left out and coefficient
result.dt <- data.frame(beta = lm.model$coefficients[[2]],
patient_left_out = i)
#bind to list
result[[i]] <- result.dt
}
#bind to data.frame
result <- do.call(rbind, result)
Result
head(result)
beta patient_left_out
1 1.381248 1
2 1.345188 2
3 1.427784 3
4 1.361674 4
5 1.420417 5
6 1.454196 6
CodePudding user response:
You can drop a particular row (or column) by using a negative index. In your case, you proceed as follows:
betas <- numeric(nrow(rashdummy2b)) # memory preallocation
for (i in 1:nrow(rashdummy2b)) {
betas[i] <- lm(rashleft ~ gender baselarea sfactor, data=rashdummy2b[-i,])$coefficients[2]
}