Merging csv files that their name starts with the same string using R-CodePudding

I have a number of csv files in the working directory. Some of these files share a string (ex. ny, nj, etc.) at the beginning of their name. Below is a screenshot:

What I want to do is to import and merge the csv files that share a string. I have searched and seen people suggesting regex, however I am not sure if that is best way to go. I appreciate any help with this.

Best, Kaveh

CodePudding user response：

Here's a function that may be more efficient than for loops, though there may be more elegant solutions.

Since I dont know what your excel files contain, I created several dummy files with a few columns ("A", "B", and "C"). I dont know what you would merge by; in this example I merged by column "A".

Given the ambiguity in the files, I have edited this to include both merge and bind approaches, depending on what is needed.

To test these functions, create a few CSV files in a folder (I created NJ_1.csv, NJ_2.csv, NJ_3.csv, NY_1.csv, NY_2.csv, each with columns A, B, and C.)

For all options, this code needs to be run.

setwd("insert path where folder with csv files is located")
library(dplyr)

OPTION 1:

If you want to merge files containing different data with a unique identifier. Example: one file contains temperature and one file contains precipitation for a given geographic location

importMerge <- function(x, mergeby){
  temp <- list.files(pattern = paste0("*",x))
  files <- lapply(temp, read.csv)
  merge <- files %>% Reduce(function(dtf1, dtf2) left_join(dtf1, dtf2, by = mergeby), .)
  return(merge)
}

NJmerge <- importMerge("NJ", "A")
NYmerge <- importMerge("NY", "A")

OPTION 2:

If you want to bind files containing the same columns. Example: Files contain both temperature and precipitation, and each file is a given geographic location. Note: All columns need to be the same name in each file

importBind <- function(x){
  temp <- list.files(pattern = paste0("*",x))
  files <- lapply(temp, read.csv)
  bind <- do.call("rbind", files)
  return(bind)
}

NJbind <- importBind("NJ")
NYbind <- importBind("NY")

OPTION 3

If you want to bind only certain columns from files containing the same column names Example: Files contain temperature and precipitation, along with other columns that aren't needed, and each file is a given geographic location. Note: All columns need to be the same name in each file. Since default is NULL, leaving keeps out will default to option 2 above.

importBindKeep <- function(x, keeps = NULL){ # default is to keep all columns
  temp <- list.files(pattern = paste0("*",x))
  files <- lapply(temp, read.csv)
  # if you wanted to only keep a few columns, use the following. 
  if(!is.null(keeps)) files <- lapply(files, "[", , keeps)
  bind <- do.call("rbind", files)
  return(bind)
}

NJbind.keeps <- importBindKeep("NJ", keeps = c("A","B")) # keep only columns A and B
NYbind.keeps <- importBindKeep("NY", keeps = c("A","B"))

See How to import multiple .csv files at once? and Simultaneously merge multiple data.frames in a list, for more information.