Reinsert special character * into strings at predefined positions

Time:07-18

(This question is based on a previous question, "Convert letters with duplicates to numbers".)

I have a series of events and non-events in column aoi, with events expressed as capital letters and non-events expressed as "*":

df <- data.frame(
  Partcpt = c("B","A","B","C","A","B"),
  aoi = c("B*B*B","*A*C*A*C","*B*B","A*C","*A*","*")
)

I need to convert the letters to consecutive numbers unless they are duplicates, in which case the number already assigned to that letter should be repeated. The following code accomplishes this conversion:

df$aoi_0 <- sapply(strsplit(df$aoi, split = ""), function(x) paste(match(x[x!="*"], unique(x[x!="*"])), collapse = ""))

df
  Partcpt      aoi aoi_0
1       B    B*B*B   111
2       A *A*C*A*C  1212
3       B     *B*B    11
4       C      A*C    12
5       A      *A*     1
6       B        *      

But now the information on the non-events is lost. How can I reinstate that information in the strings themselves, by re-inserting the "*" character where appropriate, like so:

df
  Partcpt      aoi      aoi_0
1       B    B*B*B      1*1*1
2       A *A*C*A*C   *1*2*1*2
3       B     *B*B       *1*1
4       C      A*C        1*2
5       A      *A*        *1*
6       B        *          *

CodePudding user response:

You can modify the anonymous function with an ifelse() to return * if the input is * but otherwise to follow the logic of your previous code, i.e. match the input to the vector of unique values.

df$aoi_1  <- sapply(
  strsplit(df$aoi, split = ""), 
  \(x) paste0(
    ifelse(
      x=="*", 
      "*", 
      match(x, unique(x[x!="*"]))
    ), collapse = ""
  )
)

df
#   Partcpt      aoi aoi_0    aoi_1
# 1       B    B*B*B   111    1*1*1
# 2       A *A*C*A*C  1212 *1*2*1*2
# 3       B     *B*B    11     *1*1
# 4       C      A*C    12      1*2
# 5       A      *A*     1      *1*
# 6       B        *              *
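To see why this works, it may help to walk through a single string. The following is a minimal sketch using row 2 of the example data; the intermediate names lookup, idx and out are only for illustration and do not appear in the answer above.

```r
# Split one aoi string into single characters
x <- strsplit("*A*C*A*C", split = "")[[1]]

# Lookup vector of distinct letters, with "*" excluded: "A" "C"
lookup <- unique(x[x != "*"])

# Position of each character in the lookup; "*" is not found, so it gives NA
idx <- match(x, lookup)

# Keep "*" where the input is "*", otherwise use the matched position
out <- paste0(ifelse(x == "*", "*", idx), collapse = "")
out
# "*1*2*1*2"
```

Because ifelse() is vectorised, the test x == "*" decides element by element whether to keep the star or take the number, so the NA values produced by match() for the stars never reach the output.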

CodePudding user response:

Another possible solution, which is based on the following ideas:

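  1. Match every character, including *, against unique(x[x!="*"]).

  2. This yields no match for *, because * is excluded from the lookup vector.

  3. Set nomatch = 0 so that match() returns 0 instead of NA for *.

  4. Use gsub to replace 0 with *.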

df$aoi_0 <- sapply(strsplit(df$aoi, split = ""), 
  function(x) gsub("0", "*", paste(match(x, unique(x[x!="*"]), nomatch = 0),
    collapse = "")))

df

#>   Partcpt      aoi    aoi_0
#> 1       B    B*B*B    1*1*1
#> 2       A *A*C*A*C *1*2*1*2
#> 3       B     *B*B     *1*1
#> 4       C      A*C      1*2
#> 5       A      *A*      *1*
#> 6       B        *        *
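One caveat, as an observation rather than part of the answer above: gsub("0", "*", ...) runs on the already pasted string, so if a string ever contained ten or more distinct letters, the 0 in 10 would be replaced as well. A possible variant, sketched here under that assumption, swaps the no-match entries for "*" before pasting. The helper name recode_aoi is hypothetical.

```r
# Hypothetical helper (not from the answers above): replace no-match
# entries with "*" at the vector level, before collapsing to one string
recode_aoi <- function(s) {
  x <- strsplit(s, split = "")[[1]]
  # 0 marks characters that are not in the lookup, i.e. the "*" entries
  idx <- match(x, unique(x[x != "*"]), nomatch = 0)
  out <- as.character(idx)
  out[idx == 0] <- "*"
  paste(out, collapse = "")
}

df <- data.frame(
  Partcpt = c("B","A","B","C","A","B"),
  aoi = c("B*B*B","*A*C*A*C","*B*B","A*C","*A*","*")
)
df$aoi_2 <- sapply(df$aoi, recode_aoi, USE.NAMES = FALSE)
df
```

For the example data this should give the same column as the gsub() version. Note that with ten or more distinct letters the pasted digits become ambiguous in any of these approaches (10 cannot be told apart from 1 followed by 0), so a separator in collapse might be needed in that case.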