I have a folder which contains multiple CSV files that need to be merged together, which may take too long to do one at a time. The nature of the files are that one variable may appear in 2 or more CSV files, but not in all of them. However, the ones that don't have it may have common variables elsewhere.
Below I have provided an illustrative example showing this:-
#these files lie within the same folder on my machine
testdataframe_1<-data.frame(Column_1=c(1,2,3),
Column_2=c(4,5,6),
Column_3=c(7,8,9))
write.csv(testdataframe_1, file="testdataframe_1.csv")
testdataframe_2<-data.frame(Column_1=c(1,2,3),
Column_4=c(10,11,12),
Column_5=c(13,14,15))
write.csv(testdataframe_2, file="testdataframe_2.csv")
testdataframe_3<-data.frame(Column_6=c(16,17,18),
Column_7=c(19,20,21),
Column_4=c(10,11,12))
write.csv(testdataframe_3, file="testdataframe_3.csv")
testdataframe_4<-data.frame(Column_9=c(22,23,24),
Column_10=c(25,26,27),
Column_6=c(16,17,18))
write.csv(testdataframe_4, file="testdataframe_4.csv")
As you can see, Column_1 appears in testdataframe_1 and testdataframe_2, but not in the remaining other dataframes. However, testdataframe_2 and testdataframe_3 have Column_4 in common, and testdataframe_3 and testdataframe_4 have Column_6 in common.
Is there a way to read these CSV files into R and merge them on the common variables based on the manner above, particularly in an automated manner?
Many thanks!
CodePudding user response:
You can read all the files with lapply
and then merge them with Reduce
.
paste0("testdataframe_", 1:4, ".csv") |>
lapply(read.csv) |>
Reduce(merge, x=_)
Note this uses the |>
_
syntax from R 4.2 but you can use non-piped functions if you prefer like with
Reduce(merge, lapply(paste0("testdataframe_", 1:4, ".csv"), read.csv))