I have a large dataset with unique values for individuals that have some repeats. Each of these repeated rows are important data that I need to keep. However, I want to assign a unique ID to each individual.
Below is example raw data:
pit | let | ret | otherdata |
---|---|---|---|
19000 | 1 | 2 | xx |
19030 | 3 | 4 | sy |
19530 | 5 | 6 | gh |
19030 | 3 | 4 | sh |
19530 | 5 | 6 | lh |
What I want is:
pit | let | ret | ID | otherdata |
---|---|---|---|---|
19000 | 1 | 2 | 1 | xx |
19030 | 3 | 4 | 2 | sy |
19530 | 5 | 6 | 3 | gh |
19030 | 3 | 4 | 2 | sh |
19530 | 5 | 6 | 3 | lh |
I'm not sure how to get the unique number to be used for each set of the three columns and retain all the rows.
Thanks
CodePudding user response:
We could use
library(dplyr)
df1 %>%
group_by(pit, let, ret) %>%
mutate(ID = cur_group_id(), .before = 'otherdata') %>%
ungroup
-output
# A tibble: 5 × 5
pit let ret ID otherdata
<int> <int> <int> <int> <chr>
1 19000 1 2 1 xx
2 19030 3 4 2 sy
3 19530 5 6 3 gh
4 19030 3 4 2 sh
5 19530 5 6 3 lh
CodePudding user response:
Would it be possible to create the Unique IDs in a spreadsheet form or in SQL database using CONCAT or JOIN function before exporting for R analysis?
I am learning all of these applications at the moment so if anyone has anything to add or direct me on why that would not be the most efficient way please let me know. Thanks.