Linear regression in R, loop through csv files

Time:06-20

I apologize if there is already a solution available in another question. I have 500 csv files, all with sequential names ("name1", "name2", etc.). I need to run the same simple linear regression on each file and save the coefficient outputs. The column names are the same in every file, and there is only a single x and a single y variable.

I only know how to use

lm(tablename$columnY~tablename$columnX)

on individual files. I'm not sure how to set up a loop to run through each file.

Any help is appreciated

CodePudding user response:

Here is how I would run it using a function (you can also use a for loop). The principle is to list all the files in question and go through them one by one, saving the results of each model as we go.

#put all your files in one folder and point to the folder path
path <- "C:/Users/xxx/Desktop"

#list all the csv files, with directory attached (the pattern skips any non-csv files in the folder)
lst <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

#make a function or loop (I like functions to get structured output)
fun <- function(i){
  
  #read each csv one at a time
  dat <- read.csv(lst[i])
  
  #make the model (with data = dat the slope coefficient is named "columnX", not "dat$columnX")
  mod <- lm(columnY ~ columnX, data = dat)
  
  #extract the information from the model (press View on any model, choose the desired values and just copy that code)
  intcpt <- mod[["coefficients"]][["(Intercept)"]]
  slope <- mod[["coefficients"]][["columnX"]]
  
  #put into a data frame, with the name of the file, and return it
  data.frame(file = lst[i], intercept = intcpt, slope = slope)
}
temp <- lapply(seq_along(lst), fun) #run the model (lapply takes the value returned by the function and makes a list element for each "loop")
results <- do.call("rbind", temp) #from list to data frame
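For the for-loop alternative mentioned above, here is a minimal sketch. So that it runs on its own, it first writes three small made-up csv files to a temporary folder; the folder, file names, simulated data and output file name are assumptions for illustration only, and columnX / columnY are the placeholder column names from the question.

```r
# Hypothetical demo data: three small csv files in a temporary folder
demo_dir <- file.path(tempdir(), "lm_demo")
dir.create(demo_dir, showWarnings = FALSE)
set.seed(1)
for (k in 1:3) {
  x <- 1:20
  demo <- data.frame(columnX = x, columnY = 2 + 3 * x + rnorm(20))
  write.csv(demo, file.path(demo_dir, paste0("name", k, ".csv")), row.names = FALSE)
}

# List only the csv files, with the directory attached
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Pre-allocate a list, fit one model per file, keep one row per file
rows <- vector("list", length(files))
for (i in seq_along(files)) {
  dat <- read.csv(files[i])
  mod <- lm(columnY ~ columnX, data = dat)
  cf <- coef(mod)  # named vector with "(Intercept)" and "columnX"
  rows[[i]] <- data.frame(
    file = basename(files[i]),
    intercept = cf[["(Intercept)"]],
    slope = cf[["columnX"]]
  )
}

# From list to data frame: one row of coefficients per file
results_loop <- do.call(rbind, rows)

# Save the coefficient table outside the data folder,
# so it is not picked up as an input file on a later run
write.csv(results_loop, file.path(tempdir(), "lm_coefficients.csv"), row.names = FALSE)
```

Note that list.files() returns names in alphabetical order, so "name10.csv" sorts before "name2.csv". Keeping the file name in each row, as done here, makes it possible to match coefficients to files regardless of order.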