Reset match token only works every two characters

Time:06-29

I would like to add a full stop between each letter of a word using the reset match token (\K). I don't want to use a positive lookbehind.

gsub("\\pL\\K", ".", "adskhfks", perl = TRUE)

# [1] "a.ds.kh.fk.s"

For some reason, the dot is only inserted after every second letter (using RStudio 4.2.0 on macOS 12.31.). The same pattern works as expected on regex101.

Is this normal and is there a fix?

Answer:

This is a bug. Instead of a consuming pattern with the \K operator, you will have to use a lookbehind-based pattern:

gsub("(?<=\\p{L})", ".", "adskhfks", perl = TRUE)

The (?<=\p{L}) positive lookbehind matches any position that is immediately preceded by a Unicode letter.

Or, a capturing group with a backreference:

gsub("(\\p{L})", "\\1.", "adskhfks", perl = TRUE)

Here, (\p{L}) captures any Unicode letter into Group 1 and the \1. replacement puts back the Group 1 value and appends a dot to it.

See the R demo:

gsub("\\pL\\K", ".", "adskhfks", perl = TRUE)
# [1] "a.ds.kh.fk.s"
gsub("(?<=\\p{L})", ".", "adskhfks", perl = TRUE)
# [1] "a.d.s.k.h.f.k.s."
gsub("(\\pL)", "\\1.", "adskhfks", perl = TRUE)
# [1] "a.d.s.k.h.f.k.s."
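
Both working patterns above also append a dot after the last letter. If the dot should appear only between letters, one possible variation is to require a following letter with a lookahead. This is a minimal sketch, not part of the original answer; the helper name dot_between_letters is hypothetical and used only for illustration.

```r
# Hypothetical helper (not from the original answer): put a dot only
# between two adjacent letters, so no trailing dot is produced.
dot_between_letters <- function(x) {
  # Capture a letter only when another letter follows it (lookahead),
  # then put the captured letter back and append a dot.
  gsub("(\\p{L})(?=\\p{L})", "\\1.", x, perl = TRUE)
}

dot_between_letters("adskhfks")
# [1] "a.d.s.k.h.f.k.s"

# Regex-free cross-check: split into single characters and rejoin.
# Note that this separates every character, not only letters.
paste(strsplit("adskhfks", "")[[1]], collapse = ".")
# [1] "a.d.s.k.h.f.k.s"
```

Because the pattern consumes one letter per match, it does not rely on \K and should not run into the skipping behaviour described in the question.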