I have a column of IDs in a dataframe that sometimes has duplicates, take for example,
ID |
---|
209 |
315 |
109 |
315 |
451 |
209 |
What I want to do is take this column and create another column that indicates what ID the row belongs to. i.e. I want it to look like,
ID | ID Category |
---|---|
209 | 1 |
315 | 2 |
109 | 3 |
315 | 2 |
451 | 4 |
209 | 1 |
Essentially, I want to loop through the IDs and if it equals to a previous one, I indicate that it is from the same ID, and if it is a new ID, I create a new indicator for it.
Does anyone know is there a quick function in R that I could do this with? Or have any other suggestions?
CodePudding user response:
Convert to factor with levels ordered with unique
(order of appearance in the data set) and then to numeric:
data$IDCategory <- as.numeric(factor(data$ID, levels = unique(data$ID)))
#> data
# ID IDCategory
#1 209 1
#2 315 2
#3 109 3
#4 315 2
#5 451 4
#6 209 1
CodePudding user response:
library(tidyverse)
data <- tibble(ID= c(209,315,109,315,451,209))
data %>%
left_join(
data %>%
distinct(ID) %>%
mutate(`ID Category` = row_number())
)
#> Joining, by = "ID"
#> # A tibble: 6 × 2
#> ID `ID Category`
#> <dbl> <int>
#> 1 209 1
#> 2 315 2
#> 3 109 3
#> 4 315 2
#> 5 451 4
#> 6 209 1
Created on 2022-03-10 by the reprex package (v2.0.0)
CodePudding user response:
df <- df %>%
dplyr::mutate(`ID Category` = as.numeric(interaction(ID, drop=TRUE)))
Answer with data.table
library(data.table)
df <- as.data.table(df)
df <- df[
j = `ID Category` := as.numeric(interaction(ID, drop=TRUE))
]
The pro of this solution is that you can create an unique ID for a group of variables. Here you only need ID
, but if you want to have an unique ID let say for the couple [ID
—Location
] you could.
data <- tibble(ID= c(209,209,209,315,315,315), Location = c("A","B","C","A","A","B"))
data <- data %>%
dplyr::mutate(`ID Category` = as.numeric(interaction(ID, Location, drop=TRUE)))
CodePudding user response:
another way:
merge(data,
data.frame(ID = unique(data$ID),
ID.Category = seq_along(unique(data$ID))
), sort = F)
# ID ID.Category
# 1 209 1
# 2 209 1
# 3 315 2
# 4 315 2
# 5 109 3
# 6 451 4
data:
tibble(ID = c(209,315,109,315,451,209)) -> data