I have many data frames. I would like to split each of them based on the values in a column (a factor), and then store each piece of the split in a separate data frame with a specific name.
For the sake of a minimal reproducible example, consider some generated data:
for (i in 1:10) {
assign(paste("df_",i,sep = ""), data.frame(x = rep(1,12), y = c(rep("a",4),rep("b",4),rep("c",4))))
}
Here we have 10 data frames, df_1, df_2, ... up to df_10. (The real data is similar to the generated data, but in the real data column z is different for each data frame.)
Now, I want to split the data frames by 'y' (column 2).
For one data frame, I can do the following:
splitdf <- split(df_1,df_1$y)
namessplit <- c("a","b","c")
for (i in 1:length(splitdf)) {
assign(paste("df_1_",namessplit[[i]],sep = ""),splitdf[[i]])
}
While this works for 1 df, how can I do it for all the dfs?
Big thanks in advance!
CodePudding user response:
It is not recommended to create multiple objects in the global env, but if we want to know how to create the objects from a nested list: loop over the outer list sequence and then over the inner list sequence, paste the corresponding names together, and assign the extracted inner list element to that name.
lst1 <- lapply(mget(ls(pattern = "^df_\\d+$")), \(x) split(x, x$y))
for(i in seq_along(lst1)) {
for(j in seq_along(lst1[[i]])) {
assign(paste0(names(lst1)[i], "_", names(lst1[[i]][j])), lst1[[i]][[j]])
}
}
Checking for the objects created in the global env:
> ls(pattern = "^df_\\d+_[a-z]+$")
[1] "df_1_a" "df_1_b" "df_1_c" "df_10_a" "df_10_b" "df_10_c" "df_2_a" "df_2_b" "df_2_c" "df_3_a" "df_3_b" "df_3_c" "df_4_a"
[14] "df_4_b" "df_4_c" "df_5_a" "df_5_b" "df_5_c" "df_6_a" "df_6_b" "df_6_c" "df_7_a" "df_7_b" "df_7_c" "df_8_a" "df_8_b"
[27] "df_8_c" "df_9_a" "df_9_b" "df_9_c"
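Since creating many global objects is discouraged above, here is a minimal sketch of the alternative: keep everything in lists and look pieces up by name. The names `dfs`, `split_list`, and `flat` are made up for illustration, and the data is regenerated inside a list so the example is self-contained. It assumes the same structure as the generated data in the question.

```r
# Build the example data as a named list instead of ten global objects
# (dfs is a hypothetical name; the contents mirror the question's df_1 ... df_10).
dfs <- setNames(
  lapply(1:10, function(i) {
    data.frame(x = rep(1, 12), y = rep(c("a", "b", "c"), each = 4))
  }),
  paste0("df_", 1:10)
)

# Split every data frame by column y; the result is a nested list.
split_list <- lapply(dfs, function(d) split(d, d$y))

# Look up one piece by name instead of relying on a global object df_1_a.
piece <- split_list[["df_1"]][["a"]]

# Flatten one level to get a single list; names come out as "df_1.a", etc.
flat <- unlist(split_list, recursive = FALSE)

# Swap the first dot for an underscore to get names such as "df_1_a".
names(flat) <- sub(".", "_", names(flat), fixed = TRUE)
```

If separate objects really are needed, something like `list2env(flat, envir = globalenv())` should create them all in one call, without the nested `for` loop.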