(This question is based on a previous question, "Convert letters with duplicates to numbers".)
I have a series of events and non-events in column aoi, with events expressed as capital letters and non-events expressed as "*":
df <- data.frame(
Partcpt = c("B","A","B","C","A","B"),
aoi = c("B*B*B","*A*C*A*C","*B*B","A*C","*A*","*")
)
I need to convert the letters to consecutive numbers in order of first appearance within each string; when a letter recurs, the number already assigned to it should be repeated. The following code does this:
df$aoi_0 <- sapply(strsplit(df$aoi, split = ""), function(x) paste(match(x[x!="*"], unique(x[x!="*"])), collapse = ""))
df
Partcpt aoi aoi_0
1 B B*B*B 111
2 A *A*C*A*C 1212
3 B *B*B 11
4 C A*C 12
5 A *A* 1
6 B *
But now the information on the non-events is lost. How can I reinstate that information in the strings themselves, by re-inserting the "*" character where appropriate, like so:
df
Partcpt aoi aoi_0
1 B B*B*B 1*1*1
2 A *A*C*A*C *1*2*1*2
3 B *B*B *1*1
4 C A*C 1*2
5 A *A* *1*
6 B * *
CodePudding user response:
You can modify the anonymous function with an ifelse() to return * if the input is *, but otherwise follow the logic of your previous code, i.e. match the input to the vector of unique values.
df$aoi_1 <- sapply(
  strsplit(df$aoi, split = ""),
  \(x) paste0(
    ifelse(
      x == "*",
      "*",
      match(x, unique(x[x != "*"]))
    ),
    collapse = ""
  )
)
df
# Partcpt aoi aoi_0 aoi_1
# 1 B B*B*B 111 1*1*1
# 2 A *A*C*A*C 1212 *1*2*1*2
# 3 B *B*B 11 *1*1
# 4 C A*C 12 1*2
# 5 A *A* 1 *1*
# 6 B * *
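To make the logic above easier to follow, here is a minimal sketch that runs the same steps on a single string and keeps the intermediate values. The variable names lookup, idx and out are only illustrative. Note that the \(x) shorthand in the code above needs R 4.1 or later; function(x) is equivalent on older versions.

```r
# Walk through the ifelse() approach on one string, step by step.
x <- strsplit("*A*C*A*C", split = "")[[1]]
# x is c("*", "A", "*", "C", "*", "A", "*", "C")

# Distinct events in order of first appearance: "A" "C"
lookup <- unique(x[x != "*"])

# Position of each character in lookup; "*" is not in lookup, so it gives NA
idx <- match(x, lookup)

# Keep "*" where the input is "*", otherwise use the index.
# ifelse() returns a character vector here, which is then pasted together.
out <- paste0(ifelse(x == "*", "*", idx), collapse = "")
out
#> [1] "*1*2*1*2"
```

The NA values produced by match() never reach the output, because ifelse() only takes the index at positions where x is not "*".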
CodePudding user response:
Another possible solution, based on the following ideas:
- Match every character, including *, against unique(x[x!="*"]). This yields no match for *.
- Set nomatch = 0, so that each * is returned as 0.
- Use gsub to replace 0 with *.
df$aoi_0 <- sapply(strsplit(df$aoi, split = ""),
function(x) gsub("0", "*", paste(match(x, unique(x[x!="*"]), nomatch = 0),
collapse = "")))
df
#> Partcpt aoi aoi_0
#> 1 B B*B*B 1*1*1
#> 2 A *A*C*A*C *1*2*1*2
#> 3 B *B*B *1*1
#> 4 C A*C 1*2
#> 5 A *A* *1*
#> 6 B * *
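One caveat with replacing "0" after pasting: if a string ever contains ten or more distinct letters, the index 10 also contains a "0" and would be turned into "1*". The sample data never gets that far, so this is only a precaution. The sketch below uses a made-up input and a hypothetical helper name, recode_aoi, to show the effect and one way around it: replace the zeros before pasting.

```r
# Hypothetical input with ten distinct events, so the tenth one is numbered 10.
x <- strsplit("ABCDEFGHIJ*", split = "")[[1]]
idx <- match(x, unique(x[x != "*"]), nomatch = 0)

# gsub() after pasting also hits the "0" inside "10".
pasted_first <- gsub("0", "*", paste(idx, collapse = ""))
pasted_first
#> [1] "1234567891**"

# Variant: swap the zeros for "*" while the indices are still separate elements.
# recode_aoi is an illustrative name, not part of the original answer.
recode_aoi <- function(s) {
  vapply(strsplit(s, split = ""), function(x) {
    idx <- match(x, unique(x[x != "*"]), nomatch = 0)
    out <- as.character(idx)
    out[idx == 0] <- "*"          # only true non-matches become "*"
    paste(out, collapse = "")
  }, character(1))
}

res <- recode_aoi(c("B*B*B", "*A*C*A*C", "*", "ABCDEFGHIJ*"))
# res holds "1*1*1", "*1*2*1*2", "*" and "12345678910*"
```

With more than nine distinct letters the pasted digits are ambiguous to read back in any case (for example "110" could be 1, 10 or 1, 1, 0), so a separator between numbers may be worth considering if that situation can occur.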