Create numerical discrete values if values in a column equal in R-CodePudding

I have a column of IDs in a dataframe that sometimes has duplicates, take for example,

ID
209
315
109
315
451
209

What I want to do is take this column and create another column that indicates what ID the row belongs to. i.e. I want it to look like,

ID	ID Category
209	1
315	2
109	3
315	2
451	4
209	1

Essentially, I want to loop through the IDs and if it equals to a previous one, I indicate that it is from the same ID, and if it is a new ID, I create a new indicator for it.

Does anyone know is there a quick function in R that I could do this with? Or have any other suggestions?

CodePudding user response：

Convert to factor with levels ordered with unique (order of appearance in the data set) and then to numeric:

data$IDCategory <- as.numeric(factor(data$ID, levels = unique(data$ID)))

#> data
#   ID IDCategory
#1 209          1
#2 315          2
#3 109          3
#4 315          2
#5 451          4
#6 209          1

CodePudding user response：

library(tidyverse)

data <- tibble(ID= c(209,315,109,315,451,209))

data %>%
  left_join(
    data %>%
      distinct(ID) %>%
      mutate(`ID Category` = row_number())
  )
#> Joining, by = "ID"
#> # A tibble: 6 × 2
#>      ID `ID Category`
#>   <dbl>         <int>
#> 1   209             1
#> 2   315             2
#> 3   109             3
#> 4   315             2
#> 5   451             4
#> 6   209             1

^{Created on 2022-03-10 by the reprex package (v2.0.0)}

CodePudding user response：

df <- df %>%
  dplyr::mutate(`ID Category` = as.numeric(interaction(ID, drop=TRUE)))

Answer with data.table

library(data.table)
df <- as.data.table(df)

df <- df[
  j = `ID Category` := as.numeric(interaction(ID, drop=TRUE))
]

The pro of this solution is that you can create an unique ID for a group of variables. Here you only need ID, but if you want to have an unique ID let say for the couple [ID—Location] you could.

data <- tibble(ID= c(209,209,209,315,315,315), Location = c("A","B","C","A","A","B"))
data <- data %>%
  dplyr::mutate(`ID Category` = as.numeric(interaction(ID, Location, drop=TRUE)))

CodePudding user response：

another way:

merge(data,
  data.frame(ID = unique(data$ID),
             ID.Category = seq_along(unique(data$ID))
  ), sort = F)
  
# ID ID.Category
# 1 209           1
# 2 209           1
# 3 315           2
# 4 315           2
# 5 109           3
# 6 451           4

data:

tibble(ID = c(209,315,109,315,451,209)) -> data