I would like to assign each unique combination of variables a value and list those values in a new column called ID, as shown below. For example, I would like patients who are TA cancer, N0 lymph, and 1 immunotherapy ID'd as 1, patients who are TA, N0, and 2 ID'd as 2, and so on. Below is a table of what the data looks like before, and what I would like it to look like after. The data was loaded from a .csv file.
So to summarize:
Patients TA, N0, 1 ID = 1
Patients TA, N0, 2 ID = 2
Patients TA, Nx, 0 ID = 3
Patients TA, Nx, 1 ID = 4
Patients TA, N0, 0 ID = 5
Patients TA, Nx, 2 ID = 6
Before:
| Cancer | Lymph | Immunotherapy |
| ------ | ----- | ------------- |
| TA | N0 | 1 |
| TA | N0 | 2 |
| TA | N0 | 1 |
| TA | Nx | 0 |
| TA | Nx | 1 |
| TA | N0 | 0 |
| TA | Nx | 1 |
| TA | Nx | 2 |
After:
| Cancer | Lymph | Immunotherapy | ID |
| ------ | ----- | ------------- | -- |
| TA | N0 | 1 | 1 |
| TA | N0 | 2 | 2 |
| TA | N0 | 1 | 1 |
| TA | Nx | 0 | 3 |
| TA | Nx | 1 | 4 |
| TA | N0 | 0 | 5 |
| TA | Nx | 1 | 4 |
| TA | Nx | 2 | 6 |
I attempted to use dplyr's group_by() and mutate() with no luck. Any help would be much appreciated. Thanks!
CodePudding user response:
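In base R:
# Paste the columns of each row together into one key string
d <- do.call(paste, df)
# Factor levels follow order of first appearance, so the integer codes are the IDs
cbind(df, id = as.numeric(factor(d, unique(d))))
  Cancer Lymph Immunotherapy id
1     TA    N0             1  1
2     TA    N0             2  2
3     TA    N0             1  1
4     TA    Nx             0  3
5     TA    Nx             1  4
6     TA    N0             0  5
7     TA    Nx             1  4
8     TA    Nx             2  6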
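For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. Two things in it are assumptions rather than part of the answer: `df` is rebuilt by hand from the question's table (the original was read from a .csv), and `match()` is swapped in for the `factor()` call, which should give the same first-appearance numbering. The `"|"` separator passed to `paste()` is an arbitrary choice, added so values from adjacent columns cannot run together.

```r
# Rebuild the example data by hand (the question loaded it from a .csv).
df <- data.frame(
  Cancer        = rep("TA", 8),
  Lymph         = c("N0", "N0", "N0", "Nx", "Nx", "N0", "Nx", "Nx"),
  Immunotherapy = c(1, 2, 1, 0, 1, 0, 1, 2)
)

# One key string per row; "|" is an arbitrary separator (an assumption).
key <- do.call(paste, c(df, sep = "|"))

# match() gives each key's position among the unique keys,
# i.e. an ID numbered in order of first appearance.
df$ID <- match(key, unique(key))

df$ID
# 1 2 1 3 4 5 4 6
```

If any column could itself contain the separator character, pick a different one, since two distinct combinations could otherwise collapse to the same key.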
CodePudding user response:
library(dplyr)

# cur_group_id() numbers the groups in sorted key order,
# not in order of first appearance
df %>%
  group_by(Cancer, Lymph, Immunotherapy) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()
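Alternatively, to number the combinations in order of first appearance:
df %>%
  left_join(
    df %>%
      distinct(Cancer, Lymph, Immunotherapy) %>%
      mutate(ID = row_number()),
    by = c("Cancer", "Lymph", "Immunotherapy")
  )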
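To see what numbering each of the two dplyr approaches gives on the question's data, here is a quick side-by-side sketch. The data frame is rebuilt by hand from the question's table, and the names `by_sorted`, `lookup`, and `by_first` are made up for illustration.

```r
library(dplyr)

# Example data rebuilt by hand from the question's table.
df <- data.frame(
  Cancer        = rep("TA", 8),
  Lymph         = c("N0", "N0", "N0", "Nx", "Nx", "N0", "Nx", "Nx"),
  Immunotherapy = c(1, 2, 1, 0, 1, 0, 1, 2)
)

# Approach 1: group index from group_by() + cur_group_id().
by_sorted <- df %>%
  group_by(Cancer, Lymph, Immunotherapy) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

# Approach 2: lookup table of distinct combinations, joined back on.
lookup <- df %>%
  distinct(Cancer, Lymph, Immunotherapy) %>%
  mutate(ID = row_number())

by_first <- left_join(df, lookup, by = c("Cancer", "Lymph", "Immunotherapy"))

by_sorted$ID
# 2 3 2 4 5 1 5 6
by_first$ID
# 1 2 1 3 4 5 4 6
```

Both columns identify the same six groups; only the second reproduces the exact ID values in the question's "After" table.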