Home > Mobile >  Load and merge multiple CSV files based on common variables in R
Load and merge multiple CSV files based on common variables in R

Time:06-22

I have a folder which contains multiple CSV files that need to be merged together, which may take too long to do one at a time. The nature of the files are that one variable may appear in 2 or more CSV files, but not in all of them. However, the ones that don't have it may have common variables elsewhere.

Below I have provided an illustrative example showing this:-

#these files lie within the same folder on my machine
testdataframe_1<-data.frame(Column_1=c(1,2,3),
                            Column_2=c(4,5,6),
                            Column_3=c(7,8,9))
write.csv(testdataframe_1, file="testdataframe_1.csv")
testdataframe_2<-data.frame(Column_1=c(1,2,3),
                            Column_4=c(10,11,12),
                            Column_5=c(13,14,15))
write.csv(testdataframe_2, file="testdataframe_2.csv")

testdataframe_3<-data.frame(Column_6=c(16,17,18),
                            Column_7=c(19,20,21),
                            Column_4=c(10,11,12))
write.csv(testdataframe_3, file="testdataframe_3.csv")
testdataframe_4<-data.frame(Column_9=c(22,23,24),
                            Column_10=c(25,26,27),
                            Column_6=c(16,17,18))
write.csv(testdataframe_4, file="testdataframe_4.csv")


As you can see, Column_1 appears in testdataframe_1 and testdataframe_2, but not in the remaining other dataframes. However, testdataframe_2 and testdataframe_3 have Column_4 in common, and testdataframe_3 and testdataframe_4 have Column_6 in common.

Is there a way to read these CSV files into R and merge them on the common variables based on the manner above, particularly in an automated manner?

Many thanks!

CodePudding user response:

You can read all the files with lapply and then merge them with Reduce.

paste0("testdataframe_", 1:4, ".csv")  |>
  lapply(read.csv) |>
  Reduce(merge, x=_)

Note this uses the |> _ syntax from R 4.2 but you can use non-piped functions if you prefer like with

Reduce(merge, lapply(paste0("testdataframe_", 1:4, ".csv"), read.csv))
  • Related