Create a new id by matching two column values

Time:09-15

I have the following data and I want to create a new id called newid from the columns id and class. The same id value can repeat with a different class value, so the new id has to be built by matching on both id and class.

data <- data.frame(id=c(1,1,1,1,1, 2,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,4,4,4),
                        class=c('x','x','x','x','x', 'y','y','y','y','y', 'z','z','z','z', 'w','w','w','w', 'v','v','v','n','n','n'))

Expected output

   id class newid
1   1     x     1
2   1     x     1
3   1     x     1
4   1     x     1
5   1     x     1
6   2     y     2
7   2     y     2
8   2     y     2
9   2     y     2
10  2     y     2
11  3     z     3
12  3     z     3
13  3     z     3
14  3     z     3
15  1     w     4
16  1     w     4
17  1     w     4
18  1     w     4
19  2     v     5
20  2     v     5
21  2     v     5
22  4     n     6
23  4     n     6
24  4     n     6

CodePudding user response:

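You could paste id and class into a single key and use match() against the key's unique values, which numbers each combination in order of first appearance: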

library(dplyr)

data %>%
  mutate(grp = paste(id, class),
         newid = match(grp, unique(grp))) %>%
  select(-grp)

   id class newid
1   1     x     1
2   1     x     1
3   1     x     1
4   1     x     1
5   1     x     1
6   2     y     2
7   2     y     2
8   2     y     2
9   2     y     2
10  2     y     2
11  3     z     3
12  3     z     3
13  3     z     3
14  3     z     3
15  1     w     4
16  1     w     4
17  1     w     4
18  1     w     4
19  2     v     5
20  2     v     5
21  2     v     5
22  4     n     6
23  4     n     6
24  4     n     6
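
To see why this yields ids in order of first appearance, here is a minimal sketch on a small hand-made key vector (the values in grp are illustrative, not taken from the full data): unique() keeps values in the order they first occur, and match() returns each element's position in that de-duplicated vector.

```r
# Illustrative keys, as paste(id, class) would produce them
grp <- c("1 x", "1 x", "2 y", "1 w", "2 y")

# unique() preserves first-appearance order
first_seen <- unique(grp)

# match() gives the position of each key within first_seen,
# so repeated keys share an id and new keys get the next integer
newid <- match(grp, first_seen)

print(first_seen)
print(newid)
```

A key that reappears later in the data (like "2 y" above) gets its original id again rather than a new one.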

CodePudding user response:

One option is to use cur_group_id() but see the note at the end.

library(dplyr)

data %>%
    group_by(id, class) %>%
    mutate(newid = cur_group_id()) %>%
    ungroup()
## A tibble: 24 × 3
#     id class newid
#   <dbl> <chr> <int>
# 1     1 x         2
# 2     1 x         2
# 3     1 x         2
# 4     1 x         2
# 5     1 x         2
# 6     2 y         4
# 7     2 y         4
# 8     2 y         4
# 9     2 y         4
#10     2 y         4
## … with 14 more rows
## ℹ Use `print(n = ...)` to see more rows

Note: This creates a unique newid per (id, class) combination, but the numbering differs from your expected output. Groups are numbered in sorted order (numeric for id, then lexicographic for class) rather than by first appearance: (1, w) comes before (1, x), which comes before (2, v), and so on.

So as long as you only need the ids to be distinct and don't care about their actual values, cur_group_id() works: it always assigns a unique id to each combination of the grouping variables.
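
If the first-appearance numbering from the question is needed, one possible sketch is to renumber the sorted group ids afterwards with match(). The intermediate column name sorted_id is a hypothetical helper introduced here for illustration; it is not part of the original answer.

```r
library(dplyr)

data <- data.frame(
  id    = c(1,1,1,1,1, 2,2,2,2,2, 3,3,3,3, 1,1,1,1, 2,2,2, 4,4,4),
  class = c('x','x','x','x','x', 'y','y','y','y','y', 'z','z','z','z',
            'w','w','w','w', 'v','v','v', 'n','n','n')
)

result <- data %>%
  group_by(id, class) %>%
  mutate(sorted_id = cur_group_id()) %>%  # ids follow the sorted group order
  ungroup() %>%
  # renumber by the order in which each sorted_id first appears in the rows
  mutate(newid = match(sorted_id, unique(sorted_id))) %>%
  select(-sorted_id)

# one row per (id, class, newid) combination
print(distinct(result))
```

This keeps the grouping approach but ends with the same numbering as the match() answer, at the cost of one extra step.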
