Home > Back-end >  R: How to loop through models dropping one observation at a time?
R: How to loop through models dropping one observation at a time?

Time:06-03

I have trouble looping through a regression model dropping one observation each time to estimate the effect of influential observations.

I would like to run the model several times, each time dropping the ith observation and extracting the relevant coefficient estimate and store it in a vector. I think this could quite easily be done with a fairly straight forward loop, however, I'm stuck at the specifics.

I want to be left with a vector containing n coefficient estimates from n iterations of the same model. Any help would be beneficial!

Below I provide some dummy data and example code.

#Dummy data:

set.seed(489)

patientn <- rep(1:400)

gender <- rbinom(400, 1, 0.5)

productid <- rep(c("Product A","Product B"), times=200)

country <- rep(c("USA","UK","Canada","Mexico"), each=50)

baselarea <- rnorm(400,400,60) #baseline area
baselarea2 <- rnorm(400,400,65) #baseline area2

sfactor  <- c(
  rep(c(0.3,0.9), times = 25),
  rep(c(0.4,0.5), times = 25),
  rep(c(0.2,0.4), times = 25),
  rep(c(0.3,0.7), times = 25)
)

rashdummy2a <- data.frame(patientn,gender,productid,country,baselarea,baselarea2,sfactor)

Data <- rashdummy2a %>% mutate(rashleft = baselarea2*sfactor/baselarea*100) ```


## Example of how this can be done manually: 

# model
m1<-lm(rashleft ~ gender   baselarea   sfactor, data = data)

# extracting relevant coefficient estimates, each time dropping a different "patient" ("patientn")

betas <- c(lm(rashleft ~ gender   baselarea   sfactor, data = rashdummy2b, patientn !=1)$coefficients[2],
           lm(rashleft ~ gender   baselarea   sfactor, data = rashdummy2b, patientn !=2)$coefficients[2],
           lm(rashleft ~ gender   baselarea   sfactor, data = rashdummy2b, patientn !=3)$coefficients[2])

# the betas vector now stores the relevant coefficient estimates (coefficient nr 2, for gender) for three different variations of the model.  

CodePudding user response:

We can use a for loop. In your question you use an object rashdummy2b which is not defined. Now I used data but you can replace that by an object of choice.

#create list to bind results to
result <- list()

#loop through patients and extract betas
for(i in unique(data$patientn)){

  #construct linear model
  lm.model <- lm(rashleft ~ gender   baselarea   sfactor, data = subset(data, data$patientn != i))
  
  #create data.frame containing patient left out and coefficient
  result.dt <- data.frame(beta = lm.model$coefficients[[2]],
                          patient_left_out = i)
  
  #bind to list
  result[[i]] <- result.dt
}

#bind to data.frame
result <- do.call(rbind, result)

Result

head(result) 
      beta patient_left_out
1 1.381248                1
2 1.345188                2
3 1.427784                3
4 1.361674                4
5 1.420417                5
6 1.454196                6

CodePudding user response:

You can drop a particular row (or column) by using a negative index. In your case, you proceed as follows:

betas <- numeric(nrow(rashdummy2b))  # memory preallocation
for (i in 1:nrow(rashdummy2b)) {
  betas[i] <- lm(rashleft ~ gender   baselarea   sfactor, data=rashdummy2b[-i,])$coefficients[2]
}
  • Related