I have this type of data:
df <- data.frame(
Partcpt = c("B","A","B","C"),
aoi = c("ACA","CB","AA","AABC" )
)
I want to replace the individual letters in aoi with consecutive numbers, unless a letter is a duplicate, in which case the number assigned to its earlier occurrence should be repeated. Is there a regex solution to this? I'm open to other solutions as well.
The desired output is this:
Partcpt aoi
1 B 121
2 A 12
3 B 11
4 C 1123
CodePudding user response:
Here is a tidyverse solution:
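The line that does the trick is mutate(ID = match(paste(aoi), unique(paste(aoi)))).
After splitting each string into one letter per row and grouping by id, it gives every letter the position of its first appearance among the unique letters of that row: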
library(dplyr)
library(tidyr)
df %>%
mutate(id = row_number()) %>%
separate_rows(aoi, sep = "(?<!^)(?!$)") %>% #thanks to Chris Ruehlemann
#separate_rows(aoi, sep= "") %>% #alternative
#filter(aoi != "") %>% #alternative
group_by(id) %>%
mutate(ID = match(paste(aoi), unique(paste(aoi)))) %>%
mutate(ID = paste0(ID, collapse = "")) %>%
slice(1) %>%
ungroup() %>%
select(Partcpt, aoi=ID)
Or, with many thanks to @Henrik, a base R one-liner that returns the recoded strings as a character vector:
sapply(strsplit(df$aoi, split = ""), \(x) paste(match(x, unique(x)), collapse = ""))
Output of the dplyr pipeline:
Partcpt aoi
<chr> <chr>
1 B 121
2 A 12
3 B 11
4 C 1123
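To see why match(x, unique(x)) produces the desired numbering, here is a small step-by-step sketch on the single string "AABC". The variable names (x, u, idx, res) are only illustrative and not part of the answer above.

```r
# Walk through the match()/unique() idiom on one string
x <- strsplit("AABC", split = "")[[1]]  # split into letters: "A" "A" "B" "C"
u <- unique(x)                          # distinct letters in order of first appearance: "A" "B" "C"
idx <- match(x, u)                      # position of each letter in u: 1 1 2 3
res <- paste(idx, collapse = "")        # glue the positions back together: "1123"
res
```

Because unique() keeps the order of first appearance, a repeated letter always gets the number of its earlier occurrence.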
CodePudding user response:
A base R option:
df$aoi <- sapply(df$aoi, \(x) {
x <- as.integer(charToRaw(x))
paste(match(x, unique(x)), collapse = "")})
Output:
> df$aoi
[1] "121" "12" "11" "1123"
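If the recoding is needed in several places, the same idea could be wrapped in a small helper. This is only a sketch: the function name recode_first_seen is hypothetical, and it assumes the input is a character vector without NA values. It uses vapply() so the result is always an unnamed character vector.

```r
# Hypothetical helper: recode each string by order of first appearance of its letters
recode_first_seen <- function(strings) {
  vapply(
    strsplit(strings, split = ""),                        # one vector of letters per string
    function(ch) paste(match(ch, unique(ch)), collapse = ""),
    character(1)                                          # each result is a single string
  )
}

df <- data.frame(
  Partcpt = c("B", "A", "B", "C"),
  aoi = c("ACA", "CB", "AA", "AABC")
)
df$aoi <- recode_first_seen(df$aoi)
df$aoi
```

Note that with more than nine distinct letters in one string the pasted result becomes ambiguous (for example, "10" could be read as "1" followed by "0"), so a separator in collapse may be preferable in that case.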