Suppose I have a data frame, df:
df = data.frame(name = rep(c("A", "B", "C"), each = 4))
I want a new data frame with one additional column named Group, in which each element is the numeric index of the corresponding level of name, as shown in df2 below.
I know case_when could do it, but my real data frame is more complicated: the name column has many levels, and I would rather not list them case by case.
Is there an easier way to do this?
Thanks.
df2
name Group
1 A 1
2 A 1
3 A 1
4 A 1
5 B 2
6 B 2
7 B 2
8 B 2
9 C 3
10 C 3
11 C 3
12 C 3
CodePudding user response:
There are a few ways to do it in the tidyverse:
library(tidyverse)
df %>% group_by(name) %>% mutate(Group = cur_group_id()) %>% ungroup()
or
df %>% mutate(Group = as.numeric(as.factor(name)))
Output
name Group
1 A 1
2 A 1
3 A 1
4 A 1
5 B 2
6 B 2
7 B 2
8 B 2
9 C 3
10 C 3
11 C 3
12 C 3
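One thing to be aware of: both versions number the groups in sorted order of name, not in the order the names first appear. For the example data the two orders coincide, but on unsorted data they differ. Here is a minimal base R sketch, assuming a made-up vector x (not from the question), that shows the difference and one possible way to get first-appearance numbering by setting the factor levels explicitly:

```r
# Hypothetical unsorted input, for illustration only (not from the question)
x <- c("B", "A", "B", "C", "A")

# Default factor levels are sorted, so A = 1, B = 2, C = 3
by_sorted_level <- as.integer(factor(x))

# Levels set in order of first appearance, so B = 1, A = 2, C = 3
by_first_seen <- as.integer(factor(x, levels = unique(x)))

print(by_sorted_level)  # 2 1 2 3 1
print(by_first_seen)    # 1 2 1 3 2
```

Which numbering is right depends on what Group is meant to represent, so it is worth checking on a small unsorted sample before applying it to the real data.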
CodePudding user response:
A couple of other simple solutions:
library(dplyr)
df %>%
mutate(Group = match(name, unique(name)))
#> name Group
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 2
#> 6 B 2
#> 7 B 2
#> 8 B 2
#> 9 C 3
#> 10 C 3
#> 11 C 3
#> 12 C 3
df %>%
mutate(Group = cumsum(name != lag(name, default = "")))
#> name Group
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 2
#> 6 B 2
#> 7 B 2
#> 8 B 2
#> 9 C 3
#> 10 C 3
#> 11 C 3
#> 12 C 3
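Note that these two snippets agree only when all rows with the same name sit next to each other. match() gives one id per distinct name, numbered by first appearance, while the cumsum() version starts a new id every time the value changes from the previous row. Below is a rough sketch in base R, assuming a made-up vector x (not from the question), where prev is a base R stand-in for lag(name, default = ""):

```r
# Hypothetical input where "A" shows up again after "B" (illustration only)
x <- c("A", "A", "B", "A")

# One id per distinct value, numbered by first appearance
first_seen_id <- match(x, unique(x))

# Base R stand-in for dplyr's lag(x, default = ""): shift down by one row
prev <- c("", head(x, -1))

# New id whenever the value differs from the previous row
run_id <- cumsum(x != prev)

print(first_seen_id)  # 1 1 2 1
print(run_id)         # 1 1 2 3
```

So if the real data is not sorted by name, the match() version is probably the one that matches what the question asks for.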
CodePudding user response:
With data.table:
df = data.frame(name = rep(c("A", "B", "C"), each = 4))
library(data.table)
setDT(df)[, grp := .GRP, by = name][]
#> name grp
#> 1: A 1
#> 2: A 1
#> 3: A 1
#> 4: A 1
#> 5: B 2
#> 6: B 2
#> 7: B 2
#> 8: B 2
#> 9: C 3
#> 10: C 3
#> 11: C 3
#> 12: C 3
Created on 2022-02-10 by the reprex package (v2.0.1)