Convert letters with duplicates to numbers

Time:07-18

I have this type of data:

df <- data.frame(
  Partcpt = c("B","A","B","C"),
  aoi = c("ACA","CB","AA","AABC" )
)

I want to replace the individual letters in aoi with consecutive numbers unless the letters are duplicates, in which case the earlier replacement number should be repeated. Is there a regex solution to this? I'm open to other solutions as well.

The desired output is this:

  Partcpt  aoi
1       B  121
2       A   12
3       B   11
4       C 1123

CodePudding user response:

Here is a tidyverse solution:

The line that does the trick is mutate(ID = match(paste(aoi), unique(paste(aoi)))): after grouping by the row id, each letter is replaced by the position of its first occurrence within that row, so repeated letters reuse the earlier number:

library(dplyr)
library(tidyr)

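df %>% 
  mutate(id = row_number()) %>%                 # remember which row each letter came from
  separate_rows(aoi, sep = "(?<!^)(?!$)") %>%   # one letter per row; thanks to Chris Ruehlemann
  #separate_rows(aoi, sep= "") %>% #alternative
  #filter(aoi != "") %>%  #alternative
  group_by(id) %>% 
  mutate(ID = match(paste(aoi), unique(paste(aoi)))) %>%  # index of first occurrence within the row
  mutate(ID = paste0(ID, collapse = "")) %>%    # glue the indices back into one string
  slice(1) %>%                                  # keep a single row per original row
  ungroup() %>% 
  select(Partcpt, aoi=ID)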

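Or, with many thanks to @Henrik, a base R one-liner that returns the converted strings as a character vector:

sapply(strsplit(df$aoi, split = ""), \(x) paste(match(x, unique(x)), collapse = ""))

Output of the tidyverse pipeline above:

  Partcpt aoi  
  <chr>   <chr>
1 B       121  
2 A       12   
3 B       11   
4 C       1123 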
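Both variants above rely on match(x, unique(x)). As a minimal sketch of why that works, here it is applied to a single string: unique(x) lists the letters in order of first appearance, and match() returns each letter's position in that list. The helper name to_first_seen_index is made up for illustration and is not part of either answer.

```r
# Hypothetical helper, for illustration only: number the letters of one
# string by their order of first appearance.
to_first_seen_index <- function(s) {
  x <- strsplit(s, split = "")[[1]]   # "AABC" -> "A" "A" "B" "C"
  u <- unique(x)                      # letters in order of first appearance
  paste(match(x, u), collapse = "")   # position of each letter in u, glued together
}

to_first_seen_index("AABC")  # "1123"
to_first_seen_index("ACA")   # "121"
```

Because the positions are pasted together without a separator, the result is only unambiguous while a string has at most nine distinct letters.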

CodePudding user response:

A base R option

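df$aoi <- sapply(df$aoi, \(x) {
  x <- as.integer(charToRaw(x))              # byte codes of the letters
  paste(match(x, unique(x)), collapse = "")  # index of first occurrence, glued together
}, USE.NAMES = FALSE)                        # drop the names sapply would otherwise add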

-output

> df$aoi
[1] "121"  "12"   "11"   "1123"
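As a rough illustration of the charToRaw() step, assuming aoi holds only single-byte (ASCII) letters as in the example data: the string is turned into its byte codes, and match() then works on those integers exactly as it would on the letters. A multi-byte UTF-8 character would be split into several bytes, so this sketch does not cover that case.

```r
# Byte codes for one example string (ASCII assumed)
codes <- as.integer(charToRaw("ACA"))
codes                                   # 65 67 65

# Same first-occurrence numbering as with the letters themselves
idx <- match(codes, unique(codes))
idx                                     # 1 2 1
paste(idx, collapse = "")               # "121"
```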