Load and merge multiple CSV files based on common variables in R-CodePudding

I have a folder which contains multiple CSV files that need to be merged together, which may take too long to do one at a time. The nature of the files are that one variable may appear in 2 or more CSV files, but not in all of them. However, the ones that don't have it may have common variables elsewhere.

Below I have provided an illustrative example showing this:-

#these files lie within the same folder on my machine
testdataframe_1<-data.frame(Column_1=c(1,2,3),
                            Column_2=c(4,5,6),
                            Column_3=c(7,8,9))
write.csv(testdataframe_1, file="testdataframe_1.csv")
testdataframe_2<-data.frame(Column_1=c(1,2,3),
                            Column_4=c(10,11,12),
                            Column_5=c(13,14,15))
write.csv(testdataframe_2, file="testdataframe_2.csv")

testdataframe_3<-data.frame(Column_6=c(16,17,18),
                            Column_7=c(19,20,21),
                            Column_4=c(10,11,12))
write.csv(testdataframe_3, file="testdataframe_3.csv")
testdataframe_4<-data.frame(Column_9=c(22,23,24),
                            Column_10=c(25,26,27),
                            Column_6=c(16,17,18))
write.csv(testdataframe_4, file="testdataframe_4.csv")

As you can see, Column_1 appears in testdataframe_1 and testdataframe_2, but not in the remaining other dataframes. However, testdataframe_2 and testdataframe_3 have Column_4 in common, and testdataframe_3 and testdataframe_4 have Column_6 in common.

Is there a way to read these CSV files into R and merge them on the common variables based on the manner above, particularly in an automated manner?

Many thanks!

CodePudding user response：

You can read all the files with lapply and then merge them with Reduce.

paste0("testdataframe_", 1:4, ".csv")  |>
  lapply(read.csv) |>
  Reduce(merge, x=_)

Note this uses the |> _ syntax from R 4.2 but you can use non-piped functions if you prefer like with

Reduce(merge, lapply(paste0("testdataframe_", 1:4, ".csv"), read.csv))