How to replace all backslashes except those starting a Unicode escape sequence?

Time:12-05

I am trying to remove the backslashes in my dataset; however, a simple string.replace() call also removes the backslashes that start Unicode escape sequences, and I don't want that. I tried using re.sub(r"\\[^u]", " ", r"\Not wanted backslashes\ unicode: \u2019\u2026"), but that also replaces the character that follows each backslash (the first character of the word).

Is there any way to only replace the backslash?

Thanks in advance

CodePudding user response:

Easy. Use negative lookahead:

\\(?!u)

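This pattern will match any backslash NOT followed by a u. However, it also leaves alone a backslash in front of an ordinary word that happens to start with u. You can do even better with a negative lookahead for the full Unicode escape pattern: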

\\(?!u[0-9A-Fa-f]{4})

This pattern will match any backslash NOT followed by a u and four hexadecimal digits.
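
As a minimal sketch of how this could look in Python, assuming the data is held as plain strings that contain literal backslashes: the names STRAY_BACKSLASH and strip_stray_backslashes and the sample text are made up for illustration. The replacement defaults to an empty string here; pass " " to get the behaviour attempted in the question.

```python
import re

# Backslash NOT followed by "u" plus four hexadecimal digits.
# (Hypothetical name; the pattern itself is the one given above.)
STRAY_BACKSLASH = re.compile(r"\\(?!u[0-9A-Fa-f]{4})")


def strip_stray_backslashes(text: str, replacement: str = "") -> str:
    """Replace every backslash that does not start a Unicode escape."""
    # A function is used as the replacement so that any backslashes in
    # `replacement` are inserted literally, not parsed as a template.
    return STRAY_BACKSLASH.sub(lambda _match: replacement, text)


# Raw string, so the backslashes reach the regex engine untouched.
sample = r"\Not wanted backslashes\ unicode: \u2019\u2026 \user"

cleaned = strip_stray_backslashes(sample)
print(cleaned)

# For comparison, the simpler pattern keeps the backslash in "\user".
simple = re.sub(r"\\(?!u)", "", sample)
print(simple)
```

The lookahead only inspects the characters after the backslash without consuming them, which is why the letter after the backslash survives, unlike with `[^u]`.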

To learn more: Positive & Negative Lookahead with Examples - Regex Tutorial
