Home > Enterprise >  Row bind then resave particular CSV files in a directory based on matching index column values in R
Row bind then resave particular CSV files in a directory based on matching index column values in R

Time:07-29

I am working with a large number of CSV files housed in the same directory. This time series data comes from sensors deployed at various locations over multiple time periods. Each file has a siteID column with name of the location. I want to iteratively row bind (in chronological order) files that come from the same location, then resave them such that there is only one file per location, rather than multiple files with the same siteID covering different time periods. Here is a (hopefully) reproducible set up:

# Reproducible set up:
library(tidyverse)

# making fake folder where all these CSVs are contained 
dir.create("test_dir_1")

# create first of two datetime columns covering different time periods
datetime <- as.POSIXct(c("2022-07-15 15:25:00", "2022-07-16 15:30:00", "2022-07-17 15:35:00"))

# making fake column for data from the site
temperature <- 1:3

# making data frames for first time period and adding siteID column
sensor_SFM01_2_20220715_20220717 <- data.frame(datetime, temperature) %>% 
  mutate(siteID = "SFM01_2")
sensor_04M06_1_20220715_20220717 <- data.frame(datetime, temperature) %>% 
  mutate(siteID = "04M06_1")
sensor_20M04_2_20220715_20220717 <- data.frame(datetime, temperature) %>% 
  mutate(siteID = "20M04_2")

# now reassigning datetime to make data frames from the second time period 
datetime<- as.POSIXct(c("2022-07-17 15:45:00", "2022-07-18 15:50:00", "2022-07-19 15:55:00"))

# making data frames for first time period and adding siteID column
sensor_SFM01_2_20220717_20220719 <- data.frame(datetime, temperature) %>% 
  mutate(siteID = "SFM01_2")
sensor_04M06_1_20220717_20220719 <- data.frame(datetime, temperature) %>% 
  mutate(siteID = "04M06_1")
sensor_20M04_2_20220717_20220719 <- data.frame(datetime, temperature) %>% 
  mutate(siteID = "20M04_2")

# saving files to directory 
write_csv(sensor_SFM01_2_20220715_20220717, "test_dir_1/sensor_SFM01_2_20220715_20220717.csv")
write_csv(sensor_SFM01_2_20220717_20220719, "test_dir_1/sensor_SFM01_2_20220717_20220719.csv")
write_csv(sensor_04M06_1_20220715_20220717, "test_dir_1/sensor_04M06_1_20220715_20220717.csv")
write_csv(sensor_04M06_1_20220717_20220719, "test_dir_1/sensor_04M06_1_20220717_20220719.csv")
write_csv(sensor_20M04_2_20220715_20220717, "test_dir_1/sensor_20M04_2_20220715_20220717.csv")
write_csv(sensor_20M04_2_20220717_20220719, "test_dir_1/sensor_20M04_2_20220717_20220719.csv")

CodePudding user response:

The following code worked for me using your sample data, though assumes that all the data frames in the global environment has the site ID in the name (as in the example data). Good luck!

# Define site IDs
ssites <- c("04M06", "20M04", "SFM01")

#' Put all files in the global directory with 
#' the same site id in the name in a list

llist <- list()
for(i in ssites){
  llist[[i]] <- mget(ls(pattern = ssites[i]))
}

#row bind all the same site IDs
bind_list <- lapply(llist, function(x) do.call(rbind, x))

# export to a CSV
for(i in seq_along(bind_list)){
  write.csv(bind_list[[i]], paste0("dir_1/site_",i,".csv"))
}
  • Related