In Excel there is a function called CLEAN(), which removes all nonprintable characters from text. Reference: https://support.microsoft.com/en-us/office/clean-function-26f3d7c5-475f-4a9c-90e5-4b8ba987ba41#:~:text=Removes all nonprintable characters from,files and cannot be printed.
I am wondering if there is a direct function/method in Python to achieve the same.
Also, how can I mimic the CLEAN() function in Python using just a regular expression?
Any pointers would be very helpful.
CodePudding user response:
The CLEAN function in Excel removes only "the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31)", according to the documentation you link to. To mimic it, keep only the characters of a given string whose ord values are 32 or greater:
def clean(s):
    # Keep characters with code points 32 and above; drop control characters 0-31.
    return ''.join(c for c in s if ord(c) >= 32)
Or you can use a regular expression substitution to remove characters with hex values between \x00 and \x1f:
import re
def clean(s):
    # Strip every character in the range \x00-\x1f (code points 0-31).
    return re.sub(r'[\x00-\x1f]', '', s)
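As a further sketch of a more direct built-in route, str.translate can delete the same 32 control characters without a generator or a regex. The names clean_translate, clean_regex, and the sample string below are illustrative assumptions, not part of Excel or of the approaches above:

```python
import re

# Map each control code point (0-31) to None so str.translate() deletes it.
_CONTROL_CHARS = dict.fromkeys(range(32))

def clean_translate(s):
    """Remove ASCII control characters 0-31 (hypothetical helper name)."""
    return s.translate(_CONTROL_CHARS)

def clean_regex(s):
    """Regex variant over the same range, included only for comparison."""
    return re.sub(r'[\x00-\x1f]', '', s)

sample = 'a\tb\nc\x07d'  # contains a tab, a newline, and a bell character
print(clean_translate(sample))                         # abcd
print(clean_translate(sample) == clean_regex(sample))  # True
```

Both variants leave characters at code point 32 and above untouched, including DEL (127) and non-ASCII text.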