I have a number of csv files in the working directory. Some of these files share a string (ex. ny, nj, etc.) at the beginning of their name. Below is a screenshot:
What I want to do is to import and merge the csv files that share a string. I have searched and seen people suggesting regex, however I am not sure if that is best way to go. I appreciate any help with this.
Best, Kaveh
CodePudding user response:
Here's a function that may be more efficient than for
loops, though there may be more elegant solutions.
Since I dont know what your excel files contain, I created several dummy files with a few columns ("A", "B", and "C"). I dont know what you would merge by; in this example I merged by column "A".
Given the ambiguity in the files, I have edited this to include both merge and bind approaches, depending on what is needed.
To test these functions, create a few CSV files in a folder (I created NJ_1.csv, NJ_2.csv, NJ_3.csv, NY_1.csv, NY_2.csv, each with columns A, B, and C.)
For all options, this code needs to be run.
setwd("insert path where folder with csv files is located")
library(dplyr)
OPTION 1:
If you want to merge files containing different data with a unique identifier. Example: one file contains temperature and one file contains precipitation for a given geographic location
importMerge <- function(x, mergeby){
temp <- list.files(pattern = paste0("*",x))
files <- lapply(temp, read.csv)
merge <- files %>% Reduce(function(dtf1, dtf2) left_join(dtf1, dtf2, by = mergeby), .)
return(merge)
}
NJmerge <- importMerge("NJ", "A")
NYmerge <- importMerge("NY", "A")
OPTION 2:
If you want to bind files containing the same columns. Example: Files contain both temperature and precipitation, and each file is a given geographic location. Note: All columns need to be the same name in each file
importBind <- function(x){
temp <- list.files(pattern = paste0("*",x))
files <- lapply(temp, read.csv)
bind <- do.call("rbind", files)
return(bind)
}
NJbind <- importBind("NJ")
NYbind <- importBind("NY")
OPTION 3
If you want to bind only certain columns from files containing the same column names
Example: Files contain temperature and precipitation, along with other columns that aren't needed, and each file is a given geographic location. Note: All columns need to be the same name in each file. Since default is NULL
, leaving keeps
out will default to option 2 above.
importBindKeep <- function(x, keeps = NULL){ # default is to keep all columns
temp <- list.files(pattern = paste0("*",x))
files <- lapply(temp, read.csv)
# if you wanted to only keep a few columns, use the following.
if(!is.null(keeps)) files <- lapply(files, "[", , keeps)
bind <- do.call("rbind", files)
return(bind)
}
NJbind.keeps <- importBindKeep("NJ", keeps = c("A","B")) # keep only columns A and B
NYbind.keeps <- importBindKeep("NY", keeps = c("A","B"))
See How to import multiple .csv files at once? and Simultaneously merge multiple data.frames in a list, for more information.