How to assign a numeric value (in new column) based on groupings of other columns R

Time:11-30

I would like to assign each unique combination of variables a value and list those values in a new column called ID, as shown below. For example, I would like patients who are TA cancer, N0 lymph, and 1 immunotherapy to be ID'd as 1, patients who are TA, N0, and 2 as ID 2, and so on. Below are two tables: what the data looks like before, and what I would like it to look like after. The data was loaded from a .csv file.

So to summarize: 
Patients TA, N0, 1 ID = 1
Patients TA, N0, 2 ID = 2 
Patients TA, Nx, 0 ID = 3
Patients TA, Nx, 1 ID = 4
Patients TA, N0, 0 ID = 5
Patients TA, Nx, 2 ID = 6 

Before:

| Cancer   | Lymph    |Immunotherapy
| -------- | -------- |---------    
| TA       |  N0      |1           
| TA       |  N0      |2
| TA       |  N0      |1            
| TA       |  Nx      |0            
| TA       |  Nx      |1            
| TA       |  N0      |0 
| TA       |  Nx      |1            
| TA       |  Nx      |2       

After:


| Cancer   | Lymph    |Immunotherapy|ID
| -------- | -------- |---------    |-------
| TA       |  N0      |1            | 1
| TA       |  N0      |2            | 2
| TA       |  N0      |1            | 1
| TA       |  Nx      |0            | 3
| TA       |  Nx      |1            | 4
| TA       |  N0      |0            | 5
| TA       |  Nx      |1            | 4
| TA       |  Nx      |2            | 6

I attempted to use group_by() and mutate() from dplyr, with no luck. Any help would be much appreciated. Thanks!

CodePudding user response:

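In base R:

# Paste each row's columns together into a single key string.
d <- do.call(paste, df)
# Levels follow unique(d), so IDs are numbered in order of first appearance.
cbind(df, id = as.numeric(factor(d, unique(d))))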

  Cancer Lymph Immunotherapy id
1     TA    N0             1  1
2     TA    N0             2  2
3     TA    N0             1  1
4     TA    Nx             0  3
5     TA    Nx             1  4
6     TA    N0             0  5
7     TA    Nx             1  4
8     TA    Nx             2  6
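
To make the base R answer reproducible, here is a minimal sketch that rebuilds the example data from the question's "Before" table and numbers the combinations with `match()` instead of `factor()`. The explicit `sep` argument and the variable name `key` are my own choices, not part of the answer above; the separator is only there to lower the risk of two different combinations pasting to the same string.

```r
# Rebuild the example data from the question's "Before" table.
df <- data.frame(
  Cancer        = rep("TA", 8),
  Lymph         = c("N0", "N0", "N0", "Nx", "Nx", "N0", "Nx", "Nx"),
  Immunotherapy = c(1, 2, 1, 0, 1, 0, 1, 2)
)

# One key string per row. The separator is an assumption: any character
# that cannot occur in the data would do.
key <- do.call(paste, c(df, sep = "\r"))

# match() gives each key the position of its first occurrence among the
# unique keys, so IDs follow order of first appearance.
df$ID <- match(key, unique(key))

print(df$ID)
# [1] 1 2 1 3 4 5 4 6
```

This should give the same numbering as the `factor(d, unique(d))` version, since both rank the keys by where they first appear.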

CodePudding user response:

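library(dplyr)

# cur_group_id() numbers groups in sorted key order, not in order of first
# appearance, so each combination gets a unique ID, but the numbering will
# not match the table in the question (e.g. TA/N0/0 gets ID 1, not 5).
df %>%
  group_by(Cancer, Lymph, Immunotherapy) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()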

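Alternatively, to number the combinations in order of first appearance, as in the question:

df %>%
  left_join(
    df %>%
      distinct(Cancer, Lymph, Immunotherapy) %>%
      mutate(ID = row_number()),
    by = c("Cancer", "Lymph", "Immunotherapy")
  )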
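
The two dplyr snippets do not number the groups the same way, which is easy to miss. The sketch below runs both on the question's example data so the difference is visible; the names `by_sorted` and `by_first_seen` are made up for illustration, and the `by` argument is spelled out only to avoid the join message.

```r
library(dplyr)

# Example data from the question's "Before" table.
df <- data.frame(
  Cancer        = rep("TA", 8),
  Lymph         = c("N0", "N0", "N0", "Nx", "Nx", "N0", "Nx", "Nx"),
  Immunotherapy = c(1, 2, 1, 0, 1, 0, 1, 2)
)

# Approach 1: cur_group_id() numbers groups in sorted key order.
by_sorted <- df %>%
  group_by(Cancer, Lymph, Immunotherapy) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

# Approach 2: distinct() keeps rows in order of first appearance, so
# row_number() on the lookup table numbers combinations in that order.
by_first_seen <- df %>%
  left_join(
    df %>%
      distinct(Cancer, Lymph, Immunotherapy) %>%
      mutate(ID = row_number()),
    by = c("Cancer", "Lymph", "Immunotherapy")
  )

print(by_sorted$ID)
# [1] 2 3 2 4 5 1 5 6
print(by_first_seen$ID)
# [1] 1 2 1 3 4 5 4 6
```

Both results give every combination its own ID, so either works if only uniqueness matters. Only the second reproduces the numbering in the question's "After" table.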