I would like to add a full stop between each letter of a word using the reset match token (\K). I don't want to use a positive lookbehind.
gsub("\\pL\\K", ".", "adskhfks", perl = TRUE)
# [1] "a.ds.kh.fk.s"
For some reason, that only inserts a dot after every second letter (using R 4.2.0 in RStudio on macOS 12.3.1). It works as expected on regex101.
Is this normal and is there a fix?
Answer:
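This is a bug. Because \K drops the letter from the reported match, every match is zero-length, and after a zero-length match gsub() copies the next character unchanged and resumes the search after it, so that letter is never tested. Instead of your consuming pattern with the \K operator, you can use a lookbehind-based pattern: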
gsub("(?<=\\p{L})", ".", "adskhfks", perl = TRUE)
The (?<=\p{L}) positive lookbehind matches a location that is immediately preceded by any Unicode letter.
Or, a capturing group with a backreference:
gsub("(\\p{L})", "\\1.", "adskhfks", perl = TRUE)
Here, (\p{L}) captures any Unicode letter into Group 1, and the \1. replacement puts back the Group 1 value and appends a dot to it.
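Like the lookbehind pattern, this replacement also puts a dot after the final letter ("a.d.s.k.h.f.k.s."). If the dots should go strictly between letters, a lookahead (not a lookbehind) can stop the capture-group pattern from matching the last letter of a run. A minimal sketch; the second input string is only an illustrative value:

```r
# Capture a letter only if another letter follows it, then re-insert it with a dot.
# (?=\\p{L}) is a lookahead, so no lookbehind is involved, and each match consumes
# one letter, so gsub() never hits the zero-length-match case that breaks \K.
x <- c("adskhfks", "ab cd")
res <- gsub("(\\p{L})(?=\\p{L})", "\\1.", x, perl = TRUE)
print(res)
# [1] "a.d.s.k.h.f.k.s" "a.b c.d"
```

Letters separated by a space or other non-letter are left without a dot, since the lookahead requires the next character to be a letter.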
See the R demo:
gsub("\\pL\\K", ".", "adskhfks", perl = TRUE)
# [1] "a.ds.kh.fk.s"
gsub("(?<=\\p{L})", ".", "adskhfks", perl = TRUE)
# [1] "a.d.s.k.h.f.k.s."
gsub("(\\pL)", "\\1.", "adskhfks", perl = TRUE)
# [1] "a.d.s.k.h.f.k.s."
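The skipping in the \K version can be reproduced with a simplified model of the scan loop, written in R. It assumes that after a zero-length match gsub() copies the next character unchanged and restarts the search after it; since \K makes every match zero-length, each copied letter is never tested. simulate_gsub is a hypothetical helper, not part of R: it only handles literal replacements, and because it restarts the search on a substring, a lookbehind could not see earlier text, so it only models patterns like \pL\K.

```r
# Hypothetical helper: a simplified model (an assumption, not R's C code) of how
# gsub() resumes scanning. After an empty match it copies one character unchanged
# and restarts the search after it. Only literal replacements are supported.
simulate_gsub <- function(x, pattern, replacement) {
  n <- nchar(x)
  out <- ""
  pos <- 1L                                   # where the next search starts
  while (pos <= n) {
    m <- regexpr(pattern, substr(x, pos, n), perl = TRUE)
    if (m == -1L) break                       # no further match
    start <- pos + as.integer(m) - 1L         # reported start (after \K)
    len <- attr(m, "match.length")
    # copy the text before the match, then the replacement
    out <- paste0(out, substr(x, pos, start - 1L), replacement)
    pos <- start + len
    if (len == 0L && pos <= n) {              # empty match: step over one char
      out <- paste0(out, substr(x, pos, pos))
      pos <- pos + 1L
    }
  }
  paste0(out, substr(x, pos, n))              # copy whatever is left
}

simulate_gsub("adskhfks", "\\pL\\K", ".")
# [1] "a.ds.kh.fk.s"
```

The model produces the same string as the buggy call in the question: "d", "k", "f" and "s" are each copied by the step-over branch and never reach regexpr() as the start of a search.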