If I have a data frame and want to add a group ID, I would do:
library(dplyr)
df <- data.frame(id = rep(c(1, 8, 4), each = 3), score = runif(9))
df %>% group_by(id) %>% mutate(ID = cur_group_id())
following @Ronak Shah's answer to "How to create a consecutive group number".
Now I have a list of such data frames and want to assign consecutive group numbers, but the numbering should not restart in each list element. In other words, the ID column in the first list element runs from 1 to 10, in the second from 11 to 15, and so on (so I can't simply run the same code with lapply).
I guess I could do something like:
# here df is the list of data frames
names(df) <- c("a", "b")
df <- mapply(cbind, df, "list" = names(df), SIMPLIFY = FALSE)
df <- do.call(rbind, df)
df <- df %>% group_by(id) %>% mutate(ID = cur_group_id()) %>% ungroup()
split(df, df$list)
but maybe someone has a more direct, clever way?
CodePudding user response:
A dplyr way could be to use bind_rows and group_split (experimental):
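library(dplyr)
df_list |>
  bind_rows(.id = "origin") |> # stack the data frames and record each row's list element
  mutate(ID = consecutive_id(id)) |> # If dplyr v.<1.1.0, use ID = cumsum(!duplicated(id))
  group_split(origin, .keep = FALSE) # split back into a list and drop the helper column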
Output:
[[1]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     1 0.187      1
2     1 0.232      1
3     1 0.317      1
4     8 0.303      2
5     8 0.159      2
6     8 0.0400     2
7     4 0.219      3
8     4 0.811      3
9     4 0.526      3

[[2]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     3 0.915      4
2     3 0.831      4
3     3 0.0458     4
4     5 0.456      5
5     5 0.265      5
6     5 0.305      5
7     2 0.507      6
8     2 0.181      6
9     2 0.760      6
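The fallback named in the code comment above behaves like consecutive_id() only while each id forms a single contiguous run, as it does in this data. The following is a minimal sketch of where the two differ, assuming dplyr >= 1.1.0 is installed; the vector x is hypothetical and not part of the question's data.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides consecutive_id()

# Hypothetical ids: the value 1 comes back after a run of 8s
x <- c(1, 1, 8, 8, 1)

# consecutive_id() starts a new number every time the value changes
run_id <- consecutive_id(x)

# cumsum(!duplicated()) only counts up when a value is seen for the first time
fallback <- cumsum(!duplicated(x))

print(run_id)    # 1 1 2 2 3
print(fallback)  # 1 1 2 2 2
```

So if the same id could show up again later (for instance in another list element), the two expressions are not interchangeable, and neither one returns the number the id received the first time.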
Data:
set.seed(1234)
df1 <- tibble(id = rep(c(1,8,4), each = 3), score = runif(9))
df2 <- tibble(id = rep(c(3,5,2), each = 3), score = runif(9))
df_list <- list(df1, df2)
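For comparison, here is a rough base R sketch that never binds the list into one data frame. It numbers the ids inside each element and shifts them by the number of groups in all earlier elements. The helper names n_groups and offsets are made up for this illustration, and the sketch assumes that no id occurs in more than one list element (otherwise that id would get two different numbers).

```r
# Hypothetical data with the same shape as df_list, built with base R only
set.seed(1234)
df_list <- list(
  data.frame(id = rep(c(1, 8, 4), each = 3), score = runif(9)),
  data.frame(id = rep(c(3, 5, 2), each = 3), score = runif(9))
)

# Number of distinct ids in each list element
n_groups <- vapply(df_list, function(d) length(unique(d$id)), integer(1))

# Offset per element: total number of groups in all earlier elements
offsets <- cumsum(c(0L, n_groups[-length(n_groups)]))

# Number ids by first appearance within an element, then add the offset
df_list <- Map(function(d, off) {
  d$ID <- match(d$id, unique(d$id)) + off
  d
}, df_list, offsets)

print(df_list[[2]]$ID)  # 4 4 4 5 5 5 6 6 6
```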
Or use cur_group_id() for the group number. This approach, however, numbers the groups by sorted id value, which is a different order than the one you expect in your question:
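library(dplyr)
df_list |>
  bind_rows(.id = "origin") |>
  group_by(id) |> # group_by() sorts the group keys, so IDs follow the sorted id values
  mutate(ID = cur_group_id()) |>
  ungroup() |> # group_split() needs an ungrouped data frame to split by origin
  group_split(origin, .keep = FALSE)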
Output:
[[1]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     1 0.187      1
2     1 0.232      1
3     1 0.317      1
4     8 0.303      6
5     8 0.159      6
6     8 0.0400     6
7     4 0.219      4
8     4 0.811      4
9     4 0.526      4

[[2]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     3 0.915      3
2     3 0.831      3
3     3 0.0458     3
4     5 0.456      5
5     5 0.265      5
6     5 0.305      5
7     2 0.507      2
8     2 0.181      2
9     2 0.760      2
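If the numbers should follow the order of first appearance and an id that occurs in several list elements should keep one number, match(id, unique(id)) might be an option. This is only a sketch on hypothetical data (id 8 is placed in both elements on purpose), not something taken from the question.

```r
library(dplyr)

# Hypothetical data: id 8 occurs in both list elements
df_list <- list(
  tibble(id = rep(c(1, 8), each = 2), score = c(0.1, 0.2, 0.3, 0.4)),
  tibble(id = rep(c(3, 8), each = 2), score = c(0.5, 0.6, 0.7, 0.8))
)

result <- df_list |>
  bind_rows(.id = "origin") |>
  # position of each id among the distinct ids, in order of first appearance
  mutate(ID = match(id, unique(id))) |>
  group_split(origin, .keep = FALSE)

print(result[[1]]$ID)  # 1 1 2 2
print(result[[2]]$ID)  # 3 3 2 2
```

Here id 8 keeps the number 2 in the second element, because it already received that number in the first one.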