R, algorithm to replace 0 with the surrounding numbers

Time:10-13

I am trying to write an efficient algorithm in R that replaces each run of 0 values with the surrounding number when the numbers on both sides of the run are the same. Here is a reproducible example of my data:

library(dplyr)  # provides %>% and mutate()

ID <- c("FR01", "FR02", "FR03", "FR04")
String <- c("0000001000100100100100220002000200020011", "0222000000001000010101110020020002002022", "0000000000001000010101110020020002002022", "2002220002200202010002222222222222222222")
df <- data.frame(ID, String)
# Expected results:
result <- df %>% mutate(String = c("1111111111111111111100222222222222220011", "2222000000001111111111110022222222222222", "1111111111111111111111110022222222222222", "2222222222222222010002222222222222222222"))


Id String
FR01 0000001000100100100100220002000200020011
FR02 0222000000001000010101110020020002002022
FR03 0000000000001000010101110020020002002022
FR04 2002220002200202010002222222222222222222

Conditions for replacing a 0, applied in both directions:

  • if the adjacent number is also 0, look at the next number in that direction
  • if both adjacent numbers are the same, replace the 0 with that number
  • if the adjacent numbers are different, keep the 0; at the start and the end of the string only one adjacent number exists, so that number alone decides the replacement
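
One way to read these three rules is that they act on whole runs of zeros rather than on single characters. The following is only a minimal sketch of that reading, using base R's run-length encoding; the helper name `fill_zeros_rle` is hypothetical and not part of any package.

```r
# Hypothetical helper: fill each run of "0" according to the rules above.
fill_zeros_rle <- function(x) {
  # Encode the string as runs, e.g. "0012" -> values "0","1","2"; lengths 2,1,1
  r <- rle(strsplit(x, "", fixed = TRUE)[[1L]])
  v <- r$values
  n <- length(v)
  if (n < 2L) return(x)            # a single run has no neighbour to copy

  left  <- c(NA, v[-n])            # value of the run to the left of each run
  right <- c(v[-1L], NA)           # value of the run to the right of each run
  left[1L] <- right[1L]            # first run: only the right neighbour exists
  right[n] <- left[n]              # last run: only the left neighbour exists

  fill <- v == "0" & left == right # zero runs whose neighbours agree
  v[fill] <- left[fill]
  r$values <- v
  paste(inverse.rle(r), collapse = "")
}

strings <- c(
  "0000001000100100100100220002000200020011",
  "0222000000001000010101110020020002002022",
  "0000000000001000010101110020020002002022",
  "2002220002200202010002222222222222222222"
)
filled <- vapply(strings, fill_zeros_rle, character(1L), USE.NAMES = FALSE)
```

Because consecutive runs in a run-length encoding always differ, the neighbours of a zero run are never themselves zero, which covers the first rule without any explicit scanning.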

Results needed

Id String
FR01 1111111111111111111100222222222222220011
FR02 2222000000001111111111110022222222222222
FR03 1111111111111111111111110022222222222222
FR04 2222222222222222010002222222222222222222

Does anyone know how to efficiently build an algorithm to transform these strings?

Thank you for your help.

User response:

Here is something quick:

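foo = \(x) {
  y  = unlist(strsplit(x, ""))        # split into single characters
  ny = length(y)
  z  = gregexpr("0+", x)[[1L]]        # start position of every run of zeros
  # no zeros, or nothing but zeros: there is nothing to copy from
  if (z[1L] == -1L || attr(z, "match.length")[1L] == ny) return(x)
  for (i in seq_along(z)) {
    ml = attr(z, "match.length")[i]   # length of the i-th run
    # leading run (starts at position 1): copy the digit to its right
    if      (z[i] == 1L)                   y[1L:ml]                 = y[ml + 1L]
    # trailing run (reaches the last position): copy the digit to its left
    else if (z[i] + ml > ny)               y[(ny - ml + 1L):ny]     = y[ny - ml]
    # inner run: fill only when both neighbours are the same digit
    else if (y[z[i] - 1L] == y[z[i] + ml]) y[z[i]:(z[i] + ml - 1L)] = y[z[i] + ml]
  }
  paste(y, collapse = "")
}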

Example

df = data.frame(
  ID     = c("FR01", "FR02", "FR03"),
  String = c(
    "0000001000100100100100220002000200020010", 
    "0222000000001000010101110020020002002022", 
    "0000000000001000010101110020020002002022"
  )
)

df$result = sapply(df$String, foo)

#     ID                                   String                                   result
# 1 FR01 0000001000100100100100220002000200020010 1111111111111111111100222222222222220011
# 2 FR02 0222000000001000010101110020020002002022 2222000000001111111111110022222222222222
# 3 FR03 0000000000001000010101110020020002002022 1111111111111111111111110022222222222222
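
Since the question asks about efficiency, a vectorised variant may also be worth trying. The sketch below is an alternative technique, not the loop shown above: it uses PCRE lookarounds with `gregexpr()` and `regmatches<-` from base R. The helper name `fill_zeros_regex` and its `digits` argument are hypothetical, and it assumes the only non-zero characters are the ones listed in `digits`.

```r
# Hypothetical helper: fill zero runs for a whole character vector at once.
fill_zeros_regex <- function(x, digits = c("1", "2")) {
  for (d in digits) {
    # A zero run is filled with d when it lies between two d's, or when it
    # touches the start or end of the string and its only neighbour is d.
    pattern <- sprintf("(?<=%1$s)0+(?=%1$s)|^0+(?=%1$s)|(?<=%1$s)0+$", d)
    m <- gregexpr(pattern, x, perl = TRUE)
    # Replace every matched run with the same number of d characters
    regmatches(x, m) <- lapply(regmatches(x, m),
                               function(run) strrep(d, nchar(run)))
  }
  x
}

strings <- c(
  "0000001000100100100100220002000200020011",
  "0222000000001000010101110020020002002022",
  "0000000000001000010101110020020002002022",
  "2002220002200202010002222222222222222222"
)
filled <- fill_zeros_regex(strings)
```

The order in which the digits are processed should not matter, because filling one zero run never changes the non-zero neighbours of another run. Whether this is actually faster than the loop depends on the data, so it is worth benchmarking on real input.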