R : How to extract the factor levels as numeric from a column and assign it to a new column using ty

Time:02-11

Suppose I have a data frame, df:

df = data.frame(name = rep(c("A", "B", "C"), each = 4))

I want a new data frame with one additional column named Group, in which each element is the integer code of the corresponding level of name, as shown in df2 below.

I know case_when could do it, but my real data frame is more complicated: the name column has many levels, and I would rather not list them case by case.

Is there an easier and smarter way to do it?

Thanks.

df2
   name Group
1     A     1
2     A     1
3     A     1
4     A     1
5     B     2
6     B     2
7     B     2
8     B     2
9     C     3
10    C     3
11    C     3
12    C     3

CodePudding user response:

There are a few ways to do this in the tidyverse:

library(tidyverse)

df %>% group_by(name) %>% mutate(Group = cur_group_id())

or

df %>% mutate(Group = as.numeric(as.factor(name)))

Output

   name Group
1     A     1
2     A     1
3     A     1
4     A     1
5     B     2
6     B     2
7     B     2
8     B     2
9     C     3
10    C     3
11    C     3
12    C     3
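One detail worth checking before relying on the factor approach: factor() sorts its levels by default, so the numbering follows sorted order rather than the order in which values appear. The sketch below uses base R only and a hypothetical input (the column names Group_sorted and Group_seen are made up for illustration) to show both numberings side by side.

```r
# Hypothetical input whose values are NOT in alphabetical order
df <- data.frame(name = c("C", "C", "A", "A", "B", "B"))

# Default factor(): levels are sorted, so A = 1, B = 2, C = 3
df$Group_sorted <- as.integer(factor(df$name))

# Explicit levels: number groups in order of first appearance,
# so C = 1, A = 2, B = 3
df$Group_seen <- as.integer(factor(df$name, levels = unique(df$name)))

print(df)
```

If name is already a factor, as.integer(df$name) returns the level codes directly. In the question's example the data are already sorted, so both numberings coincide.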

CodePudding user response:

A couple of other simple solutions:

library(dplyr)

df %>%
  mutate(Group = match(name, unique(name)))
#>    name Group
#> 1     A     1
#> 2     A     1
#> 3     A     1
#> 4     A     1
#> 5     B     2
#> 6     B     2
#> 7     B     2
#> 8     B     2
#> 9     C     3
#> 10    C     3
#> 11    C     3
#> 12    C     3

df %>%
  mutate(Group = cumsum(name != lag(name, default = "")))
#>    name Group
#> 1     A     1
#> 2     A     1
#> 3     A     1
#> 4     A     1
#> 5     B     2
#> 6     B     2
#> 7     B     2
#> 8     B     2
#> 9     C     3
#> 10    C     3
#> 11    C     3
#> 12    C     3
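These two solutions agree on the example data only because each name occupies one contiguous block. As a rough illustration, assuming a hypothetical input where a value reappears later, the base R sketch below reproduces both ideas without dplyr (the variable names id_by_value and id_by_run are made up here).

```r
# Hypothetical input where "A" appears in two separate runs
name <- c("A", "A", "B", "A")

# match() against unique(): one ID per distinct value
id_by_value <- match(name, unique(name))

# Base R equivalent of cumsum(name != lag(name)):
# start a new ID every time the value changes from the previous row
id_by_run <- cumsum(c(TRUE, name[-1] != name[-length(name)]))

print(data.frame(name, id_by_value, id_by_run))
```

Here id_by_value is 1 1 2 1, while id_by_run is 1 1 2 3. If the goal is one number per level of name, the match() form is the safer choice unless the data are known to be sorted by name.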

CodePudding user response:

With data.table:

df = data.frame(name = rep(c("A", "B", "C"), each = 4))

library(data.table)
setDT(df)[, grp := .GRP, by = name][]
#>     name grp
#>  1:    A   1
#>  2:    A   1
#>  3:    A   1
#>  4:    A   1
#>  5:    B   2
#>  6:    B   2
#>  7:    B   2
#>  8:    B   2
#>  9:    C   3
#> 10:    C   3
#> 11:    C   3
#> 12:    C   3

Created on 2022-02-10 by the reprex package (v2.0.1)
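Two points about the data.table version may be worth keeping in mind: setDT() converts df in place, and .GRP numbers groups in order of first appearance rather than in sorted order. The following is a minimal sketch with a hypothetical unsorted input; it uses as.data.table() to work on a copy and names the column Group to match the question.

```r
library(data.table)

# Hypothetical unsorted input
df <- data.frame(name = c("B", "B", "A", "B"))

# as.data.table() returns a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# .GRP is the group counter; groups are numbered in order of first
# appearance, so B = 1 and A = 2 here
dt[, Group := .GRP, by = name]

print(dt)
```

If sorted numbering is needed instead, converting name to a factor and taking as.integer() of it is one alternative.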
