Home > Blockchain >  Create numerical discrete values if values in a column equal in R
Create numerical discrete values if values in a column equal in R

Time:03-10

I have a column of IDs in a dataframe that sometimes has duplicates, take for example,

ID
209
315
109
315
451
209

What I want to do is take this column and create another column that indicates what ID the row belongs to. i.e. I want it to look like,

ID ID Category
209 1
315 2
109 3
315 2
451 4
209 1

Essentially, I want to loop through the IDs and if it equals to a previous one, I indicate that it is from the same ID, and if it is a new ID, I create a new indicator for it.

Does anyone know is there a quick function in R that I could do this with? Or have any other suggestions?

CodePudding user response:

Convert to factor with levels ordered with unique (order of appearance in the data set) and then to numeric:

data$IDCategory <- as.numeric(factor(data$ID, levels = unique(data$ID)))

#> data
#   ID IDCategory
#1 209          1
#2 315          2
#3 109          3
#4 315          2
#5 451          4
#6 209          1

CodePudding user response:

library(tidyverse)

data <- tibble(ID= c(209,315,109,315,451,209))

data %>%
  left_join(
    data %>%
      distinct(ID) %>%
      mutate(`ID Category` = row_number())
  )
#> Joining, by = "ID"
#> # A tibble: 6 × 2
#>      ID `ID Category`
#>   <dbl>         <int>
#> 1   209             1
#> 2   315             2
#> 3   109             3
#> 4   315             2
#> 5   451             4
#> 6   209             1

Created on 2022-03-10 by the reprex package (v2.0.0)

CodePudding user response:

df <- df %>%
  dplyr::mutate(`ID Category` = as.numeric(interaction(ID, drop=TRUE)))

Answer with data.table

library(data.table)
df <- as.data.table(df)

df <- df[
  j = `ID Category` := as.numeric(interaction(ID, drop=TRUE))
]

The pro of this solution is that you can create an unique ID for a group of variables. Here you only need ID, but if you want to have an unique ID let say for the couple [IDLocation] you could.

data <- tibble(ID= c(209,209,209,315,315,315), Location = c("A","B","C","A","A","B"))
data <- data %>%
  dplyr::mutate(`ID Category` = as.numeric(interaction(ID, Location, drop=TRUE)))

CodePudding user response:

another way:

merge(data,
  data.frame(ID = unique(data$ID),
             ID.Category = seq_along(unique(data$ID))
  ), sort = F)
  
# ID ID.Category
# 1 209           1
# 2 209           1
# 3 315           2
# 4 315           2
# 5 109           3
# 6 451           4

data:

tibble(ID = c(209,315,109,315,451,209)) -> data
  • Related