Here is my data:
# Create the data frame.
mydataframe <- data.frame(
  emp_id = c(100, 101, 100, 200, 150, 200, 600, 100, 150, 600),
  value  = c(5, 3, 2, 1, 6, 7, 8, 3, 2, 1)
)
# Print the data frame.
print(mydataframe)
I want to write a function that replaces the ids that occur multiple times in the emp_id column with a unique label per distinct id, numbered in order of first appearance; for example, 100 becomes P1 and 200 becomes P3.
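library(dplyr)

# Attempt 1: integer codes of a factor whose levels follow first appearance
mydataframe %>%
  mutate(emp_id = as.integer(factor(emp_id, levels = unique(emp_id))))

# Attempt 2: position of each id within the unique ids
mydataframe %>%
  mutate(emp_id = match(emp_id, unique(emp_id)))

# Attempt 3: group by the id and use the group index
mydataframe %>%
  group_by(emp_id = factor(emp_id, levels = unique(emp_id))) %>%
  mutate(emp_id = cur_group_id())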
I tried all of these and they work fine, but I still want to see P1, P2, ... instead of 1, 2, 3.
Note: I call P1, P2, ... "generated names"; there may be a better term for this, but I am keeping it simple for clarity.
The expected result is:
   emp_id ID value
1       1 P1     5
2       2 P2     3
3       1 P1     2
4       3 P3     1
5       4 P4     6
6       3 P3     7
7       5 P5     8
8       1 P1     3
9       4 P4     2
10      5 P5     1
Thank you
CodePudding user response:
Solution using dplyr::dense_rank():
library(dplyr)

mydataframe %>%
  mutate(
    emp_id = dense_rank(emp_id),
    ID = paste0("P", emp_id)
  )
   emp_id value ID
1       1     5 P1
2       2     3 P2
3       1     2 P1
4       4     1 P4
5       3     6 P3
6       4     7 P4
7       5     8 P5
8       1     3 P1
9       3     2 P3
10      5     1 P5
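Note that this creates the new ids based on the numerical order of the old ids, not on the row in which each id first appears. That is why 150 becomes P3 and 200 becomes P4 here, whereas the expected output in the question has 200 as P3 and 150 as P4.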
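If the labels should instead follow the order in which each id first appears, one possible variation is to swap dense_rank() for the match(emp_id, unique(emp_id)) idiom already tried in the question. The following is only a sketch, assuming dplyr 1.0.0 or later is installed (for relocate()); the data is copied from the question and `result` is just an illustrative name.

```r
library(dplyr)

# Data copied from the question.
mydataframe <- data.frame(
  emp_id = c(100, 101, 100, 200, 150, 200, 600, 100, 150, 600),
  value  = c(5, 3, 2, 1, 6, 7, 8, 3, 2, 1)
)

result <- mydataframe %>%
  mutate(
    # match() gives each id's position within unique(emp_id),
    # so ids are numbered in order of first appearance.
    emp_id = match(emp_id, unique(emp_id)),
    # mutate() evaluates sequentially, so emp_id here is the new integer code.
    ID = paste0("P", emp_id)
  ) %>%
  # Put ID right after emp_id, as in the question's expected layout.
  relocate(ID, .after = emp_id)

print(result)
```

With this variation 200 should be labelled P3 and 150 P4, which is the mapping shown in the question's expected result.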
CodePudding user response:
With tidyverse, try an "on the fly" left join to an index of ids:
library(tidyverse)

mydataframe %>%
  left_join(x = .,
            y = select(., emp_id) %>%
              unique() %>%
              mutate(id = paste0("P", row_number())),
            by = "emp_id") %>%
  relocate(id, .after = emp_id)
   emp_id id value
1     100 P1     5
2     101 P2     3
3     100 P1     2
4     200 P3     1
5     150 P4     6
6     200 P3     7
7     600 P5     8
8     100 P1     3
9     150 P4     2
10    600 P5     1
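Since the question asks for a function, the same idea can also be wrapped in a small base R helper that needs no packages. This is a sketch only: `add_generated_id`, its `id_col` and `prefix` arguments, and the `labelled` variable are hypothetical names chosen for illustration, not something taken from the answers above.

```r
# Hypothetical helper: adds an ID column that labels each distinct id
# in order of first appearance, leaving the original id column untouched.
add_generated_id <- function(df, id_col = "emp_id", prefix = "P") {
  ids <- df[[id_col]]
  # Position of each id within the unique ids, prefixed with e.g. "P".
  df$ID <- paste0(prefix, match(ids, unique(ids)))
  df
}

# Data copied from the question.
mydataframe <- data.frame(
  emp_id = c(100, 101, 100, 200, 150, 200, 600, 100, 150, 600),
  value  = c(5, 3, 2, 1, 6, 7, 8, 3, 2, 1)
)

labelled <- add_generated_id(mydataframe)
print(labelled)
```

Keeping the original emp_id next to the generated label makes it easy to check the mapping; the new column is appended at the end, so reorder the columns afterwards if a different layout is wanted.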