Home > Back-end >  assigning unique value to new column based on multiple columns/rows
assigning unique value to new column based on multiple columns/rows

Time:09-20

I have a large dataset with unique values for individuals that have some repeats. Each of these repeated rows are important data that I need to keep. However, I want to assign a unique ID to each individual.

Below is example raw data:

pit let ret otherdata
19000 1 2 xx
19030 3 4 sy
19530 5 6 gh
19030 3 4 sh
19530 5 6 lh

What I want is:

pit let ret ID otherdata
19000 1 2 1 xx
19030 3 4 2 sy
19530 5 6 3 gh
19030 3 4 2 sh
19530 5 6 3 lh

I'm not sure how to get the unique number to be used for each set of the three columns and retain all the rows.

Thanks

CodePudding user response:

We could use

library(dplyr)
df1 %>% 
  group_by(pit, let, ret) %>% 
  mutate(ID = cur_group_id(), .before = 'otherdata') %>% 
  ungroup

-output

# A tibble: 5 × 5
    pit   let   ret    ID otherdata
  <int> <int> <int> <int> <chr>    
1 19000     1     2     1 xx       
2 19030     3     4     2 sy       
3 19530     5     6     3 gh       
4 19030     3     4     2 sh       
5 19530     5     6     3 lh       

CodePudding user response:

Would it be possible to create the Unique IDs in a spreadsheet form or in SQL database using CONCAT or JOIN function before exporting for R analysis?

I am learning all of these applications at the moment so if anyone has anything to add or direct me on why that would not be the most efficient way please let me know. Thanks.

  • Related