I have data that is uniquely identified with an id, and grouped by a sub_id and group_id. Each instance with a unique object_id generates an id. Similar instances concerning different objects are grouped by a sub_id. Multiple interactions for one instance with one or more objects are grouped by the group_id column. Essentially, if someone interacts twice for two objects, there will be 4 unique ids, two unique sub_ids and a single group_id. The only way to distinguish the difference in the duplicate interactions is through the object_id column:
library(tidyverse)
tibble(id = c(1:11),
sub_id = c(1,2,1,2,3,4,3,4,5,6,6),
group_id = c(rep(1,4),rep(3,4),5,6,6),
object_id = c(1,1,2,2,2,2,3,3,2,2,3))
# A tibble: 11 × 4
id sub_id group_id object_id
<int> <dbl> <dbl> <dbl>
1 1 1 1 1
2 2 2 1 1
3 3 1 1 2
4 4 2 1 2
5 5 3 3 2
6 6 4 3 2
7 7 3 3 3
8 8 4 3 3
9 9 5 5 2
10 10 6 6 2
11 11 6 6 3
I would like some sort of sub_group_id that will split the group_id interactions in sequences using the object_id column. The expected output would look like this:
# A tibble: 11 × 5
id sub_id group_id subgroup_id object_id
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1 1 1 1
2 2 2 1 1 1
3 3 1 1 2 2
4 4 2 1 2 2
5 5 3 3 3 2
6 6 4 3 3 2
7 7 3 3 4 3
8 8 4 3 4 3
9 9 5 5 5 2
10 10 6 6 6 2
11 11 6 6 7 3
That column essentially needs to increase infinitly to distinguish between different subgroups. It ressembles the object_id column, but the same object can be present in different sub_ids, group_ids, and subgroup_ids, whereas each subgroup_id only appears in one group_id. Any ideas on how to achieve that output?
CodePudding user response:
group_by
group_id
and object_id
, and then use cur_group_id
to get the sequence of id for each subgroup:
dat %>%
group_by(group_id, object_id) %>%
mutate(subgroup_id = cur_group_id())
id sub_id group_id object_id subgroup_id
<int> <dbl> <dbl> <dbl> <int>
1 1 1 1 1 1
2 2 2 1 1 1
3 3 1 1 2 2
4 4 2 1 2 2
5 5 3 3 2 3
6 6 4 3 2 3
7 7 3 3 3 4
8 8 4 3 3 4
9 9 5 5 2 5
10 10 6 6 2 6
11 11 6 6 3 7
CodePudding user response:
We could use
library(dplyr)
library(data.table)
df1 %>%
mutate(subgroup_id = rleid(object_id, group_id))