Writing files by group into subfolders

I'm working with a function that breaks a large dataset down into grouped files, for example:

State	col1	col2
MI	a	e
MI	b	f
OH	c	g
OH	d	h
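To make the snippets in this thread reproducible, the sample table can be built as a data frame like this. Storing the columns as character (not factor) is an assumption; the question doesn't say how df was read in.

```r
# Sample data from the question; character columns are an assumption
df <- data.frame(
  State = c("MI", "MI", "OH", "OH"),
  col1  = c("a", "b", "c", "d"),
  col2  = c("e", "f", "g", "h"),
  stringsAsFactors = FALSE
)
str(df)
```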

This currently works and writes the groups out as MI.csv and OH.csv:

by(df, df$State, FUN=function(i) 
write.csv(i, paste0(i$State[1], ".csv"), na = "", row.names = FALSE))

How can I run this function again (or run it on MI.csv) so that each group of values in col1 is written to its own file inside a per-state folder? I.e. a goes to ~/MI/a.csv and b goes to ~/MI/b.csv.

I've tried different variations of the block below:

by(df, df$State, FUN=function(i) 
write.csv(i, paste0(i$State[1], "~/*.csv"), na = "", row.names = FALSE))
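A quick way to see why the attempt above doesn't write into ~/MI is to print the path it builds. paste0() just glues strings together and write.csv() does not expand the "*" wildcard, so the call tries to open a file literally named "*.csv" inside a relative folder called "MI~", which typically fails with a "cannot open file" error. The sketch below contrasts that with a path built by file.path(); the variable names are illustrative only.

```r
state <- "MI"
value <- "a"
# What the attempted call builds: the state name glued onto a literal pattern
bad <- paste0(state, "~/*.csv")
print(bad)   # "MI~/*.csv"
# A per-state folder path built with file.path()
good <- file.path("~", state, paste0(value, ".csv"))
print(good)  # "~/MI/a.csv"
```

Note that even a correct path like "~/MI/a.csv" only works once the folder ~/MI exists, because write.csv() does not create directories.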

Answer:

Try

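library(purrr)
library(stringr)
# split by State and col1; group names look like "MI.a", "OH.c"
iwalk(split(df, df[c("State", "col1")], drop = TRUE), function(x, nm) {
  path <- str_c("~/", str_replace(nm, fixed("."), "/"), ".csv")      # "~/MI/a.csv"
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)  # write.csv won't create folders
  write.csv(x, path, na = "", row.names = FALSE)
})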
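The trick in this answer is that splitting on two columns names each group "<State>.<col1>", and replacing the first "." with "/" turns that name into a relative path. The sketch below shows the intermediate names and paths; it swaps base R's sub(fixed = TRUE) in for stringr::str_replace() so it runs without extra packages.

```r
# Same sample data as in the question
df <- data.frame(
  State = c("MI", "MI", "OH", "OH"),
  col1  = c("a", "b", "c", "d"),
  col2  = c("e", "f", "g", "h"),
  stringsAsFactors = FALSE
)
# Splitting on two columns names each group "<State>.<col1>"
groups <- split(df, df[c("State", "col1")], drop = TRUE)
print(names(groups))
# Turn the first "." (the separator) into "/" to get "<State>/<col1>"
paths <- paste0("~/", sub(".", "/", names(groups), fixed = TRUE), ".csv")
print(paths)
```

drop = TRUE matters here: without it, split() also returns the empty combinations (such as OH.a), which would be written as header-only files. The first-"." replacement also assumes the State values themselves contain no ".".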

Answer:

You can nest the by calls. I also added a check for the directory so that it would be created if it didn't exist.

You can read it as: split the data frame by state, then split each state's rows by col1. For each of those groups, create the folder (working directory)/State if it doesn't exist yet, then write the group's rows to a file inside that state folder, named after its value in col1 (without row names).

A couple of things to note:

As written, each file gets the whole subset, including the State and col1 columns, which will each hold only one unique value in that file.
If a subset is empty, you'll see that in the console output; no empty files are created.
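If you'd rather not repeat State and col1 in every file, one option (an assumption about the desired output, not part of the answer) is to drop the grouping columns before writing. The sketch below does this for a single group, the kind of subset the inner function receives.

```r
df <- data.frame(
  State = c("MI", "MI", "OH", "OH"),
  col1  = c("a", "b", "c", "d"),
  col2  = c("e", "f", "g", "h"),
  stringsAsFactors = FALSE
)
# One group, like the subset `j` the inner function works on
grp <- df[df$State == "MI" & df$col1 == "a", ]
# Keep every column except the two grouping columns
trimmed <- grp[setdiff(names(grp), c("State", "col1"))]
print(trimmed)
```

In the nested by() code this would mean passing j[setdiff(names(j), c("State", "col1"))] to write.csv() instead of j.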

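by(df, df$State,
   function(i) by(i, i$col1, function(j) {
     state_dir <- file.path(getwd(), j$State[1])       # <working dir>/<State>
     if (!dir.exists(state_dir)) dir.create(state_dir)  # create the state folder once
     # one file per col1 value, e.g. <working dir>/MI/a.csv
     write.csv(j, file.path(state_dir, paste0(j$col1[1], ".csv")),
               na = "", row.names = FALSE)
   })
)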
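To try the two-level grouping without writing into your working directory, here is a base-R sketch of the same idea using nested split() loops instead of by(). Writing under tempdir() is an assumption for testing only; swap in getwd() or "~" for real use.

```r
df <- data.frame(
  State = c("MI", "MI", "OH", "OH"),
  col1  = c("a", "b", "c", "d"),
  col2  = c("e", "f", "g", "h"),
  stringsAsFactors = FALSE
)
out_root <- file.path(tempdir(), "by_state")  # assumed output root for testing
unlink(out_root, recursive = TRUE)            # start from a clean folder

for (state_df in split(df, df$State)) {
  # one folder per state, created if missing
  state_dir <- file.path(out_root, state_df$State[1])
  dir.create(state_dir, recursive = TRUE, showWarnings = FALSE)
  # one file per col1 value inside that state's folder
  for (grp in split(state_df, state_df$col1)) {
    write.csv(grp, file.path(state_dir, paste0(grp$col1[1], ".csv")),
              na = "", row.names = FALSE)
  }
}
print(list.files(out_root, recursive = TRUE))
```

The for loops make the side effect explicit, whereas by() also collects and prints a (mostly NULL) result for every group.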