I have a (large) dataset with three variables. For each combination of sub1 and sub2, I would like to save all unique IVs in a separate vector or dataset, ignoring id, and name it after the two variables, as in "sub1.and.sub2.IV". As my dataset is quite large, I would like to avoid using which and extract all combinations automatically.
  id    sub1  sub2  IV
  <chr> <chr> <chr> <chr>
1 3     a     a     p
2 3     a     a     f
3 6     a     b     z
4 6     a     b     e
5 7     a     c     b
6 7     a     c     b
In the end, I would have three vectors or datasets:
> a.and.a.IV
[1] "p" "f"
> a.and.b.IV
[1] "z" "e"
> a.and.c.IV
[1] "b"
Minimal reproducible example:
df <- structure(list(id = c("3", "3", "6", "6", "7", "7"), sub1 = c("a",
"a", "a", "a", "a", "a"), sub2 = c("a", "a", "b", "b", "c", "c"
), IV = c("p", "f", "z", "e", "b", "b")), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
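Maybe use split, which returns a named list with one element per combination (duplicates within a group are kept):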
> split(df$IV, df[c("sub1","sub2")])
$a.a
[1] "p" "f"
$a.b
[1] "z" "e"
$a.c
[1] "b" "b"
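The split output above still contains the duplicated "b" and uses names such as a.c rather than the requested a.and.c.IV. A minimal sketch of one way to close that gap, assuming the example data from the question (rebuilt here as a plain data.frame) and a hypothetical result name iv_by_group:

```r
# Same example data as in the question, as a plain data.frame
df <- data.frame(
  id   = c("3", "3", "6", "6", "7", "7"),
  sub1 = c("a", "a", "a", "a", "a", "a"),
  sub2 = c("a", "a", "b", "b", "c", "c"),
  IV   = c("p", "f", "z", "e", "b", "b"),
  stringsAsFactors = FALSE
)

# Split IV by every observed sub1/sub2 combination.
# sep controls how the two keys are joined in the list names;
# drop = TRUE omits combinations that never occur in the data.
iv_by_group <- split(df$IV, df[c("sub1", "sub2")], sep = ".and.", drop = TRUE)

# Keep only the unique values within each group
iv_by_group <- lapply(iv_by_group, unique)

# Append the ".IV" suffix requested in the question
names(iv_by_group) <- paste0(names(iv_by_group), ".IV")

iv_by_group
```

Keeping the results in one named list (for example iv_by_group[["a.and.b.IV"]]) is often easier to work with than many separate variables, especially when the number of combinations is large.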
CodePudding user response:
One possibility could be:
a.and.a.IV <- unique(df[which(df$sub1 == "a" & df$sub2 == "a"), ]$IV)
a.and.b.IV <- unique(df[which(df$sub1 == "a" & df$sub2 == "b"), ]$IV)
a.and.c.IV <- unique(df[which(df$sub1 == "a" & df$sub2 == "c"), ]$IV)
> a.and.a.IV
[1] "p" "f"
> a.and.b.IV
[1] "z" "e"
> a.and.c.IV
[1] "b"
CodePudding user response:
I used @ThomasIsCoding's comment to search for more solutions. I found three solutions that split the data frame into a list, plus a for loop that turns each list element into its own data.frame. The for loop is the same for every solution:
Solution 1: Using a custom-made function by @romainfrancois to split the data and name the resulting data.frames after the corresponding combinations of sub1 and sub2.
library(dplyr, warn.conflicts = FALSE)
named_group_split <- function(.tbl, ...) {
  grouped <- group_by(.tbl, ...)
  # build one name per group from its key values, e.g. "a / b"
  names <- rlang::eval_bare(rlang::expr(paste(!!!group_keys(grouped), sep = " / ")))
  grouped %>%
    group_split() %>%
    rlang::set_names(names)
}
df_split1 <- df %>%
  named_group_split(sub1, sub2) %>%
  unique()  # drops duplicated list elements, not duplicated rows within a group
# create one data.frame per list element, named after that element
for (i in seq_along(df_split1)) {
  assign(names(df_split1)[i], as.data.frame(df_split1[[i]]))
}
Solution 2: Using dplyr::group_split to split the dataset into a list of tibbles that keep all the original variables and their names. Unfortunately, group_split returns an unnamed list, so this solution cannot name the data.frames. Solution found here.
df_split2 <- df %>%
  group_split(sub1, sub2)
# names(df_split2) is NULL at this point, so assign() has no name to use;
# the list must be named before this loop can run
for (i in seq_along(df_split2)) {
  assign(names(df_split2)[i], as.data.frame(df_split2[[i]]))
}
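Because group_split returns an unnamed list, one possible workaround is to take the names from group_keys, which lists the groups in the same order. The following is only a sketch under that assumption, using the example data from the question; the ".and." separator and the name iv_split2 are illustrative choices, not part of the original solution.

```r
library(dplyr, warn.conflicts = FALSE)

# Example data from the question
df <- tibble(
  id   = c("3", "3", "6", "6", "7", "7"),
  sub1 = c("a", "a", "a", "a", "a", "a"),
  sub2 = c("a", "a", "b", "b", "c", "c"),
  IV   = c("p", "f", "z", "e", "b", "b")
)

grouped <- group_by(df, sub1, sub2)

# One row per sub1/sub2 combination, in the same order as group_split()
keys <- group_keys(grouped)

# Split into a list of tibbles and name each element after its keys
df_split2 <- group_split(grouped)
names(df_split2) <- paste0(keys$sub1, ".and.", keys$sub2)

# Optionally reduce each tibble to its unique IV values
iv_split2 <- lapply(df_split2, function(d) unique(d$IV))
iv_split2
```

With the names in place, the for loop shown above has something to pass to assign.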
Solution 3: Using base::split to split the dataset into a list that contains only the IVs for each combination, followed by the same for loop.
df_split3 <- split(df$IV, df[c("sub1","sub2")])
for (i in seq_along(df_split3)) {
  assign(names(df_split3)[i], as.data.frame(df_split3[[i]]))
}
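As an alternative to the assign loop, base R's list2env can create one variable per element of a named list in a single call. The sketch below assumes the example data from the question and writes into a fresh environment (the hypothetical name target) so the demonstration does not touch the global workspace; it also stores the unique IVs as character vectors rather than data.frames.

```r
# Example data from the question, as a plain data.frame
df <- data.frame(
  sub1 = c("a", "a", "a", "a", "a", "a"),
  sub2 = c("a", "a", "b", "b", "c", "c"),
  IV   = c("p", "f", "z", "e", "b", "b"),
  stringsAsFactors = FALSE
)

# Named list of unique IVs per sub1/sub2 combination (names: a.a, a.b, a.c)
df_split3 <- lapply(split(df$IV, df[c("sub1", "sub2")], drop = TRUE), unique)

# Create one variable per list element in the target environment
target <- new.env()
list2env(df_split3, envir = target)

# List the created variables and read one of them back
ls(target)
get("a.c", envir = target)
```

Passing envir = globalenv() instead of target would create the variables in the workspace, which is what the for loop does.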