Home > database >  How to assign new values to the id field in R
How to assign new values to the id field in R

Time:10-01

I have the following dataframe:

     id   children  age
1   AB22     2      31
2   AB25     1      27
3   AB25     1      25
4   AB25     3      33
5   AB50     2      40
6   AB35     1      37
7   AB35     1      39
8   AB40     1      41

I have to check if there are duplicate records in the id field. The idea is to assign a new id to the duplicate records (without repeating them), so that their numerical part has to be a value greater than the maximum numerical value of the id field. In the example, the maximum value is 50 (row 5). Therefore, the dataframe could be as follows (or any number greater than the maximum):

     id   children  age
1   AB22     2      31
2   AB25     1      27
3   AB51     1      25
4   AB52     3      33
5   AB50     2      40
6   AB35     1      37
7   AB53     1      39
8   AB40     1      41

How I could achieve it?

CodePudding user response:

Using dplyr and stringr :

data %>% 
  mutate(dupli = cumsum(as.numeric(duplicated(id)))) %>%
  mutate(id = ifelse(duplicated(id),
                     paste("AB",max(as.numeric(str_remove_all(id,"^AB")))   dupli,sep=""),
                     id)) %>% 
  select(-dupli)

Output:

    id children age
1 AB22        2  31
2 AB25        1  27
3 AB51        1  25
4 AB52        3  33
5 AB50        2  40
6 AB35        1  37
7 AB53        1  39
8 AB40        1  41

Data:

structure(list(id = c("AB22", "AB25", "AB25", "AB25", "AB50", 
"AB35", "AB35", "AB40"), children = c(2L, 1L, 1L, 3L, 2L, 1L, 
1L, 1L), age = c(31L, 27L, 25L, 33L, 40L, 37L, 39L, 41L)), row.names = c(NA, 
-8L), class = "data.frame")

CodePudding user response:

Try

df$id1=gsub("[^a-zA-Z]", "", df$id)
df$id2=as.numeric(gsub("\\D", "", df$id))

tmp=duplicated(df$id)

df[tmp,"id"]=paste0(df$id1[tmp],seq(max(df$id2) 1,max(df$id2) sum(tmp)))

    id children age id1 id2
1 AB22        2  31  AB  22
2 AB25        1  27  AB  25
3 AB51        1  25  AB  25
4 AB52        3  33  AB  25
5 AB50        2  40  AB  50
6 AB35        1  37  AB  35
7 AB53        1  39  AB  35
8 AB40        1  41  AB  40
  • Related