How to use Python re.sub()?

import re

re.sub('[a-zA-Z0-9/*\n\u]', '', string='\n\u3000\u3000xyz')

Error:

  File "<input>", line 2
    re.sub('[a-zA-Z0-9/*\n\u]', '', string='\n\u3000\u3000xyz')
                              ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 14-15: truncated \uXXXX escape

I want to delete '\u' from the string '\n\u3000\u3000xyz', but it didn't work.

CodePudding user response：

As @Akax stated, "\u]" is invalid in a normal Python string literal, because \u starts a Unicode escape and must be followed by four hexadecimal digits. What you can do is tell Python the pattern is a raw string by adding the prefix r in the re.sub call, as follows.

import re

re.sub(r'[a-zA-Z0-9/*\n\\u]', '', string='\n\u3000\u3000xyz')

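Note: when using a raw string for the pattern, \u should be changed to \\u, so that the regex engine sees an escaped backslash followed by u. Inside the character class, this matches either a literal backslash or the letter u.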
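
A minimal sketch of what the corrected call does, assuming Python 3. The names pattern, text, and result are illustrative only. Note that in the normal (non-raw) input string, each \u3000 is a single character (U+3000, an ideographic space), so there is no literal backslash or u left for the class to match; only the newline and the letters are removed.

```python
import re

# Raw pattern: the regex engine sees \n (newline), \\ (a literal backslash) and u.
pattern = r'[a-zA-Z0-9/*\n\\u]'

# Normal string literal: Python turns each \u3000 into ONE character (U+3000).
text = '\n\u3000\u3000xyz'

result = re.sub(pattern, '', text)

print(len(text))     # 6 characters: newline, two U+3000, x, y, z
print(repr(result))  # only the two U+3000 characters remain
```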

CodePudding user response：

Since \u starts an escape sequence in Python, you will have to make both the matching pattern and the input string raw strings by putting r before each string literal.

import re
re.sub(r'\\u','',r'\n\u3000\u3000xyz')

Output -

\\n30003000xyz
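
To see why the output shows a doubled backslash, it may help to compare the raw and normal literals directly. A small sketch, assuming Python 3; the variable names are illustrative and not part of the original answer.

```python
import re

raw_text = r'\n\u3000\u3000xyz'    # backslashes are kept literally
normal_text = '\n\u3000\u3000xyz'  # escapes are processed by Python

print(len(raw_text))     # 17: every backslash, letter and digit counts
print(len(normal_text))  # 6: newline, two U+3000 characters, x, y, z

stripped = re.sub(r'\\u', '', raw_text)
print(stripped)        # prints one backslash: \n30003000xyz
print(repr(stripped))  # repr doubles the backslash: '\\n30003000xyz'
```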

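But as you can see, the result still contains a literal backslash followed by n (displayed as \\n), while the expected output is \n30003000xyz. Hence you'll have to convert it back to a normal string by decoding the remaining escape sequences.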

import re
import codecs
codecs.decode(re.sub(r'\\u','',r'\n\u3000\u3000xyz'),'unicode_escape')

Result -

\n30003000xyz
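
The two steps above could be wrapped in a small helper. This is only a sketch: strip_u_escapes is a hypothetical name, not part of re or codecs, and it assumes ASCII-only input that contains literal backslash escapes (for example, text that came from a raw string). The unicode_escape codec may garble non-ASCII characters, so treat this as an illustration rather than a general solution.

```python
import codecs
import re

def strip_u_escapes(text):
    """Remove literal backslash-u pairs, then decode the remaining escapes.

    Hypothetical helper for illustration; assumes ASCII-only input.
    """
    without_u = re.sub(r'\\u', '', text)
    return codecs.decode(without_u, 'unicode_escape')

result = strip_u_escapes(r'\n\u3000\u3000xyz')
print(repr(result))  # '\n30003000xyz' (a real newline, then the digits and letters)
```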