I am trying to write an efficient algorithm in R that replaces runs of 0 values with the surrounding digit when the digits on both sides are the same. Here is a reproduction of my data:
ID <- c("FR01", "FR02", "FR03", "FR04")
String <- c("0000001000100100100100220002000200020011", "0222000000001000010101110020020002002022", "0000000000001000010101110020020002002022", "2002220002200202010002222222222222222222")
df <- data.frame(ID, String)
# Expected results (dplyr provides %>% and mutate)
library(dplyr)
result <- df %>% mutate(String = c("1111111111111111111100222222222222220011", "2222000000001111111111110022222222222222", "1111111111111111111111110022222222222222", "2222222222222222010002222222222222222222"))
Id | String |
---|---|
FR01 | 0000001000100100100100220002000200020011 |
FR02 | 0222000000001000010101110020020002002022 |
FR03 | 0000000000001000010101110020020002002022 |
FR04 | 2002220002200202010002222222222222222222 |
Conditions for replacement, applied in both directions:
- if the adjacent digit is 0, check the next digit
- if both adjacent non-zero digits are the same, replace the 0 with that digit
- if the adjacent digits are different, keep the 0, except at the start and the end of the string, where only one adjacent digit is needed
Results needed:
Id | String |
---|---|
FR01 | 1111111111111111111100222222222222220011 |
FR02 | 2222000000001111111111110022222222222222 |
FR03 | 1111111111111111111111110022222222222222 |
FR04 | 2222222222222222010002222222222222222222 |
Does anyone know how to build an efficient algorithm to transform these digit strings?
Thank you for your help.
CodePudding user response:
Here is something quick:
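foo = \(x) {
  y = unlist(strsplit(x, ""))        # split into single characters
  ny = length(y)
  z = gregexpr("0+", x)[[1L]]        # start position of each run of zeros
  # nothing to do if there are no zeros, or nothing but zeros
  if (z[1L] == -1L || attr(z, "match.length")[1L] == ny) return(x)
  for (i in seq_along(z)) {
    ml = attr(z, "match.length")[i]  # length of the i-th run
    # run at the start of the string: copy the digit that follows it
    # (test the position, not the loop index: the first run may be interior)
    if (z[i] == 1L && ml < ny) y[1L:ml] = y[ml + 1L]
    # run at the end of the string: copy the digit that precedes it
    else if (z[i] + ml > ny) y[(ny - ml + 1L):ny] = y[ny - ml]
    # interior run: fill only when both neighbours are the same digit
    else if (y[z[i] - 1L] == y[z[i] + ml]) y[z[i]:(z[i] + ml)] = y[z[i] + ml]
  }
  paste(y, collapse = "")
}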
Example:
df = data.frame(
  ID = c("FR01", "FR02", "FR03"),
  String = c(
    "0000001000100100100100220002000200020010",
    "0222000000001000010101110020020002002022",
    "0000000000001000010101110020020002002022"
  )
)
df$result = sapply(df$String, foo)
# ID String result
# 1 FR01 0000001000100100100100220002000200020010 1111111111111111111100222222222222220011
# 2 FR02 0222000000001000010101110020020002002022 2222000000001111111111110022222222222222
# 3 FR03 0000000000001000010101110020020002002022 1111111111111111111111110022222222222222
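As an alternative, the same rules can probably be expressed with run-length encoding instead of position arithmetic. The following is only a sketch in base R: `fill_zero_runs` is a hypothetical name, it assumes every character is a single digit, and it assumes a string made up only of zeros should be returned unchanged (the question does not say what should happen in that case).

```r
# Sketch: fill runs of "0" using rle(); fill_zero_runs is a hypothetical helper name.
fill_zero_runs <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1L]]
  r <- rle(chars)                    # consecutive runs of identical characters
  n <- length(r$values)
  if (n <= 1L) return(x)             # assumption: a single run is left unchanged

  left  <- c(NA, r$values[-n])       # value of the run before each run
  right <- c(r$values[-1L], NA)      # value of the run after each run
  left[1L] <- right[1L]              # at the start only the right neighbour exists
  right[n] <- left[n]                # at the end only the left neighbour exists

  # neighbouring runs always differ, so the neighbours of a zero run are non-zero
  fill <- r$values == "0" & left == right
  r$values[fill] <- left[fill]
  paste(inverse.rle(r), collapse = "")
}

# Short illustrative inputs (not taken from the question's data)
x <- c("0010", "1002", "2002")
vapply(x, fill_zero_runs, character(1L), USE.NAMES = FALSE)
# "1111" "1002" "2222"
```

Because adjacent runs never share a value, each zero run is decided in one comparison, and the loop over runs is replaced by vectorised operations on `r$values`. Applied to the question's FR04 string, this sketch should produce the expected `2222222222222222010002222222222222222222`.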