I am trying to remove the backslashes (\) in my dataset; however, a simple string.replace() method will also remove the backslashes that belong to escaped Unicode sequences, and I don't want that. I tried using re.sub("\\[^u]", " ", "\Not wanted backslashes\ unicode: \u2019\u2026"), but that also replaces the first character of the word.
Is there any way to only replace the backslash?
Thanks in advance.
CodePudding user response:
Easy. Use negative lookahead:
\\(?!u)
This pattern will match any backslash NOT followed by a u. But you can do even better, with a negative lookahead for a Unicode escape pattern:
\\(?!u[0-9A-Fa-f]{4})
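This pattern will match any backslash NOT followed by a u and four hexadecimal digits. Because a lookahead only checks what comes next without consuming it, the character after the backslash is left untouched.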
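Here is a minimal sketch of how the second pattern could be used from Python. It assumes the dataset holds literal backslashes (so \u2019 is six characters of text, not a decoded character); the names STRAY_BACKSLASH and strip_stray_backslashes are made up for illustration.

```python
import re

# A backslash that is NOT followed by "u" plus four hex digits.
# The lookahead (?!...) only inspects the following text; it does not consume it.
STRAY_BACKSLASH = re.compile(r"\\(?!u[0-9A-Fa-f]{4})")


def strip_stray_backslashes(text: str, replacement: str = "") -> str:
    """Replace every backslash that does not start a u-plus-four-hex escape."""
    # A function is passed as the replacement so that `replacement` is inserted
    # verbatim, without re.sub interpreting any backslashes inside it.
    return STRAY_BACKSLASH.sub(lambda _match: replacement, text)


# Raw string: the backslashes are literal characters, as they would be in the data.
sample = r"\Not wanted backslashes\ unicode: \u2019\u2026"
print(strip_stray_backslashes(sample))
# prints: Not wanted backslashes unicode: \u2019\u2026
```

Pass " " as the second argument if you want a space in place of each removed backslash, as in the original re.sub attempt. Note that with the simpler \\(?!u) pattern, a stray backslash in front of an ordinary word starting with u would be kept, which is why the stricter version is probably the safer choice.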
To learn more: Positive & Negative Lookahead with Examples - Regex Tutorial