I have this data
df <- data.frame(
  id     = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  groupA = c("A", "A", "B", "B", "B", "B", "A", "A"),
  groupB = c("red", "red", "red", "blue", "red", "blue", "blue", "red"))
  id groupA groupB
1  1      A    red
2  1      A    red
3  1      B    red
4  2      B   blue
5  2      B    red
6  2      B   blue
7  3      A   blue
8  3      A    red
Within each id, I would like to number the distinct groupA/groupB combinations in order of first appearance, to get this:
  id groupA groupB nr.group
1  1      A    red        1
2  1      A    red        1
3  1      B    red        2
4  2      B   blue        1
5  2      B    red        2
6  2      B   blue        1
7  3      A   blue        1
8  3      A    red        2
My attempt:
library(dplyr)
df %>%
  group_by(id, groupA, groupB) %>%
  mutate(nr.group = 1:n())
But this counts the rows within each group instead of numbering the groups. I would like to see both a dplyr and a base R solution so I can compare them.
CodePudding user response:
You could try
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(grp = paste(groupA, groupB),
         nr.group = match(grp, unique(grp))) %>%
  ungroup() %>%
  select(-grp)
or
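df %>%
  distinct() %>%                        # one row per id/groupA/groupB combination
  group_by(id) %>%
  mutate(nr.group = row_number()) %>%   # number the combinations within each id
  left_join(df, ., by = c("id", "groupA", "groupB"))  # attach the numbers to the original rows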
Output
# A tibble: 8 × 4
     id groupA groupB nr.group
  <int> <chr>  <chr>     <int>
1     1 A      red           1
2     1 A      red           1
3     1 B      red           2
4     2 B      blue          1
5     2 B      red           2
6     2 B      blue          1
7     3 A      blue          1
8     3 A      red           2
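Since the question also asks for base R, here is a minimal sketch of the same match(x, unique(x)) idea without dplyr. It assumes the df from the question; key is a helper variable introduced only for this illustration. ave() is fed the row indices rather than the key itself so that the result stays integer instead of being coerced to character.

```r
df <- data.frame(
  id     = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  groupA = c("A", "A", "B", "B", "B", "B", "A", "A"),
  groupB = c("red", "red", "red", "blue", "red", "blue", "blue", "red"))

# Combined key for each row (hypothetical helper, not part of the question)
key <- paste(df$groupA, df$groupB)

# For each id, number the keys in order of first appearance
df$nr.group <- ave(seq_along(key), df$id,
                   FUN = function(i) match(key[i], unique(key[i])))

df$nr.group
# 1 1 2 1 2 1 1 2
```

The logic is the same as in the first dplyr version: ave() plays the role of group_by(id) plus mutate().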
CodePudding user response:
We could convert the combined key to a factor whose levels follow the order of first appearance, and coerce it to integer:
library(dplyr)
library(stringr)
df %>%
  group_by(id) %>%
  # the separator keeps pairs such as ("ab", "c") and ("a", "bc") distinct
  mutate(grp = str_c(groupA, groupB, sep = "_"),
         nr.group = as.integer(factor(grp, levels = unique(grp)))) %>%
  ungroup() %>%
  select(-grp)
Output
# A tibble: 8 × 4
     id groupA groupB nr.group
  <int> <chr>  <chr>     <int>
1     1 A      red           1
2     1 A      red           1
3     1 B      red           2
4     2 B      blue          1
5     2 B      red           2
6     2 B      blue          1
7     3 A      blue          1
8     3 A      red           2
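One caveat when pasting columns into a single key: without a separator, different pairs can collapse into the same string. The sketch below uses made-up values (a and b are hypothetical, not taken from the question's data) and base paste0()/paste(), which concatenate non-missing strings the same way str_c() does.

```r
# Hypothetical values: two different (a, b) pairs
a <- c("ab", "a")
b <- c("c", "bc")

key_nosep <- paste0(a, b)             # "abc" "abc": the pairs collide
key_sep   <- paste(a, b, sep = "_")   # "ab_c" "a_bc": the pairs stay distinct

match(key_nosep, unique(key_nosep))
# 1 1
match(key_sep, unique(key_sep))
# 1 2
```

With the data in the question this makes no difference, but a separator that cannot occur in either column is the safer default.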