consecutive grouped ID list R

Time:02-01

If I have a df and want to add a grouped ID, I would do:

library(dplyr)
df <- data.frame(id = rep(c(1, 8, 4), each = 3), score = runif(9))
df %>% group_by(id) %>% mutate(ID = cur_group_id())

following @Ronak Shah's answer to "How to create a consecutive group number".

Now I have a list of such dfs and want to assign consecutive group numbers that do not restart in each list element. In other words, the ID column in the first list element runs from 1 to 10, in the second from 11 to 15, and so on (so I can't simply run the same code with lapply).

I guess I could do something like:

# df_list is the list of data frames
names(df_list) <- c("a", "b")
df_list <- mapply(cbind, df_list, "list" = names(df_list), SIMPLIFY = FALSE)
df <- do.call(rbind, df_list)
df <- df %>% group_by(id) %>% mutate(ID = cur_group_id())
split(df, df$list)

but maybe someone has a more direct, clever way?

CodePudding user response:

A dplyr way could be to use bind_rows and group_split (experimental):

library(dplyr)

df_list |>
  bind_rows(.id = "origin") |>
  mutate(ID = consecutive_id(id)) |> # If dplyr v.<1.1.0, use ID = cumsum(!duplicated(id))
  group_split(origin, .keep = FALSE)

Output:

[[1]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     1 0.187      1
2     1 0.232      1
3     1 0.317      1
4     8 0.303      2
5     8 0.159      2
6     8 0.0400     2
7     4 0.219      3
8     4 0.811      3
9     4 0.526      3

[[2]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     3 0.915      4
2     3 0.831      4
3     3 0.0458     4
4     5 0.456      5
5     5 0.265      5
6     5 0.305      5
7     2 0.507      6
8     2 0.181      6
9     2 0.760      6

Data:

set.seed(1234)

df1 <- tibble(id = rep(c(1,8,4), each = 3), score = runif(9))
df2 <- tibble(id = rep(c(3,5,2), each = 3), score = runif(9))

df_list <- list(df1, df2)
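
A note on the two ID expressions used above: they agree on this data because each id forms one contiguous run and never reappears. The sketch below uses a small made-up vector (`x` is hypothetical, not from the question) to show how they diverge once a value comes back later. It also adds `match(x, unique(x))` as a third option that numbers distinct values by first appearance.

```r
library(dplyr) # consecutive_id() needs dplyr >= 1.1.0

x <- c(1, 8, 1) # hypothetical ids: the value 1 reappears after 8

run_id   <- consecutive_id(x)      # new ID each time the value changes
cum_id   <- cumsum(!duplicated(x)) # number of distinct values seen so far
first_id <- match(x, unique(x))    # one ID per distinct value, by first appearance

print(run_id)   # 1 2 3
print(cum_id)   # 1 2 2
print(first_id) # 1 2 1
```

If the same id can occur in more than one list element and should keep a single ID, the `match()` form is probably the one to use.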

Or use cur_group_id() for the group number. This approach, however, gives a different order than the one you expect in your question:

library(dplyr)

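df_list |>
  bind_rows(.id = "origin") |>
  # group_by() sorts the group keys, which gives the IDs shown below;
  # .by = "id" (dplyr >= 1.1.0) would number groups by first appearance instead
  group_by(id) |>
  mutate(ID = cur_group_id()) |>
  ungroup() |>
  group_split(origin, .keep = FALSE)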

Output:

[[1]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     1 0.187      1
2     1 0.232      1
3     1 0.317      1
4     8 0.303      6
5     8 0.159      6
6     8 0.0400     6
7     4 0.219      4
8     4 0.811      4
9     4 0.526      4

[[2]]
# A tibble: 9 × 3
     id  score    ID
  <dbl>  <dbl> <int>
1     3 0.915      3
2     3 0.831      3
3     3 0.0458     3
4     5 0.456      5
5     5 0.265      5
6     5 0.305      5
7     2 0.507      2
8     2 0.181      2
9     2 0.760      2
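
If binding and re-splitting the list feels heavy, one alternative is to build a lookup of all ids across the list and match against it inside lapply(). This is a minimal base-R sketch, not part of the answer above. `all_ids` and `df_list_id` are names chosen here for illustration, and it assumes every list element has an `id` column. IDs follow the order of first appearance across the whole list, and an id that occurs in several elements gets the same ID everywhere.

```r
# Same structure as the data above; scores are left out because only `id` matters here
df_list <- list(
  data.frame(id = rep(c(1, 8, 4), each = 3)),
  data.frame(id = rep(c(3, 5, 2), each = 3))
)

# All distinct ids, in order of first appearance across the whole list
all_ids <- unique(unlist(lapply(df_list, function(d) d$id)))

# The position in the lookup becomes the ID, so numbering continues across elements
df_list_id <- lapply(df_list, function(d) {
  d$ID <- match(d$id, all_ids)
  d
})

unique(df_list_id[[1]]$ID) # 1 2 3
unique(df_list_id[[2]]$ID) # 4 5 6
```

The list structure is never flattened, so element names and any columns that differ between the data frames stay untouched.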