I'm working on a function to break down a large dataset into grouped files.
State | col1 | col2 |
---|---|---|
MI | a | e |
MI | b | f |
OH | c | g |
OH | d | h |
The code below currently works and writes out the files MI.csv and OH.csv:
by(df, df$State, FUN=function(i)
write.csv(i, paste0(i$State[1], ".csv"), na = "", row.names = FALSE))
How can I extend this function, or run it again on MI.csv, to write each group of values in col1 to its own file? I.e., a.csv goes to ~/MI/a.csv and b.csv goes to ~/MI/b.csv.
I've tried different variations of the block below:
by(df, df$State, FUN=function(i)
write.csv(i, paste0(i$State[1], "~/*.csv"), na = "", row.names = FALSE))
CodePudding user response:
Try
library(purrr)
library(stringr)
imap(split(df, df[-3], drop = TRUE),
~ write.csv(.x, str_c("~/", str_replace(.y, fixed("."), "/"),
".csv"), na = "", row.names = FALSE))
CodePudding user response:
You can nest the `by` calls. I also added a check for the directory so that it is created if it doesn't exist.
You could read this as: by the data frame, for each state, by each entry in `col1`; if the directory (current working directory)/state doesn't exist, create it. Then write the remaining data to a file within the appropriate state folder, named for the unique value in `col1` (and don't include row names).
A couple of things to note:
- This will send the entire data frame to the file, so the state column and `col1` will only have 1 unique value (as it's written right now).
- If the data frame is empty, you'll be notified in the console. No empty files are created.
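by(df, df$State,
   function(i) by(i, i$col1, function(j) {
     # Create the state folder under the working directory if it is missing
     state_dir <- file.path(getwd(), j$State[1])
     if (!dir.exists(state_dir)) dir.create(state_dir)
     # Write this State/col1 subset to <state>/<col1 value>.csv
     write.csv(j, file.path(state_dir, paste0(j$col1[1], ".csv")),
               row.names = FALSE)
   })
)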
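Regarding the first note above: if the constant State and col1 columns aren't wanted inside each file, one possible variation drops them before writing. This is only a sketch. The sample `df` mirrors the question's table, `out_root` is a hypothetical output folder (a temporary directory instead of the working directory), and `na = ""` is carried over from the question's original call.

```r
# Sample data matching the table in the question
df <- data.frame(State = c("MI", "MI", "OH", "OH"),
                 col1  = c("a", "b", "c", "d"),
                 col2  = c("e", "f", "g", "h"))

# Hypothetical output root; use getwd() to match the answer above
out_root <- file.path(tempdir(), "nested_by")

invisible(by(df, df$State, function(i) {
  # One folder per state, created on demand
  state_dir <- file.path(out_root, as.character(i$State[1]))
  dir.create(state_dir, recursive = TRUE, showWarnings = FALSE)

  by(i, i$col1, function(j) {
    # Keep only the columns that are not already encoded in the path
    keep <- setdiff(names(j), c("State", "col1"))
    write.csv(j[, keep, drop = FALSE],
              file.path(state_dir, paste0(as.character(j$col1[1]), ".csv")),
              na = "", row.names = FALSE)
  })
}))
```

Wrapping the outer call in `invisible()` just suppresses the printed `by` summary; the files are written either way.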