Given the following sample data (R data frame
), I would like to create folders for each group, containing folders for each sub group, (finally) containing a CSV file with corresponding values val1
, using R.
Grp <- c("A", "A", "A", "B", "B", "B")
Subgrp <- c("k", "l", "m", "n", "n", "n")
val1 <- c(1.1, 3.2, 4.5, 5.6, 6.7, 7.7)
df <- data.frame(Grp, Subgrp, val1)
This is what I have tried so far, creating a CSV file for each sub-group.
by(df, df$Subgrp, FUN=function(i) write.csv(i,paste0("C:/Temp/",i$Subgrp[1], ".csv")))
I need to iterate this for large data frame. What is the most efficient way of achieving this? I am open for suggestions based on looping and dpyr.
CodePudding user response:
You're close.
My interpretation:
- "create folders for each group, containing folders for each sub group": using directories such as
.../A/k
and.../B/n
- "a CSV file with corresponding values
val1
": remove theGrp
andSubgrp
columns from the frame when saving to CSV. (If not this, then replacesubset(...)
with justi
below.)
basepath <- "c:/Temp"
# pre-create directories
for (p in unique(file.path(basepath, df$Grp, df$Subgrp))) dir.create(p, recursive = TRUE)
by(df, df$Subgrp, FUN=function(i) {
write.csv(subset(i, select = -c(Grp, Subgrp)),
file.path(basepath, i$Grp[1], i$Subgrp[1], "value.csv"),
row.names = FALSE)
})
# df$Subgrp: k
# NULL
# ------------------------------------------------------------
# df$Subgrp: l
# NULL
# ------------------------------------------------------------
# df$Subgrp: m
# NULL
# ------------------------------------------------------------
# df$Subgrp: n
# NULL
lf <- list.files(".", pattern = "csv$", recursive = TRUE, full.names = TRUE)
lf
# [1] "./A/k/value.csv" "./A/l/value.csv" "./A/m/value.csv" "./B/n/value.csv"
read.csv(lf[1])
# val1
# 1 1.1
Since the return value from by
is inconsequential, you can either wrap it in invisible(.)
or capture and ignore the output, ign <- by(..)
.
CodePudding user response:
I would rather use lapply
so you can conveniently use the names of the subgroups.
lapply(df$Subgrp, \(x) write.csv(subset(df, Subgrp == x), file=sprintf('C:/Temp/%s.csv', x)))
CodePudding user response:
Using tidyverse:
df %>%
unite(file, Grp, Subgrp, sep='/')%>%
group_by(file = paste0(file, '.csv'))%>%
summarise(write.csv(cur_data(),
if(dir.exists(dirname(file[1]))) file[1]
else {dir.create(dirname(file[1]),recursive = T);file[1]},row.names = FALSE))