Python equivalent to the clean() function in Excel

Time:12-29

In Excel there is a function called clean(), which removes all nonprintable characters from text. Reference https://support.microsoft.com/en-us/office/clean-function-26f3d7c5-475f-4a9c-90e5-4b8ba987ba41#:~:text=Removes all nonprintable characters from,files and cannot be printed.

I am wondering if there is a direct function or method in Python that achieves the same thing.

Also, how can I mimic the clean() function in Python using only a regular expression?

Any pointers would be very helpful.

CodePudding user response:

The CLEAN function in Excel removes only "the first 32 nonprinting characters in the 7-bit ASCII code (values 0 through 31)", according to the documentation you link to, so to mimic it, you can filter characters of a given string whose ord values are less than 32:

def clean(s):
    # Keep only characters with code point 32 or above (drops 0 through 31)
    return ''.join(c for c in s if ord(c) >= 32)

Or you can use a regular expression substitution to remove characters with hex values between \x00 and \x1f:

import re

def clean(s):
    return re.sub(r'[\x00-\x1f]', '', s)
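
A third option, not mentioned above, is the built-in str.translate method with a table that maps code points 0 through 31 to None. This is only a minimal sketch: the name clean_translate and the sample string are made up for illustration, and it assumes the same "values 0 through 31" rule quoted from the documentation.

```python
# Translation table: every code point from 0 to 31 maps to None,
# which tells str.translate to delete that character.
_CONTROL_CHARS = dict.fromkeys(range(32))


def clean_translate(s):
    """Remove ASCII control characters (code points 0-31) from s."""
    return s.translate(_CONTROL_CHARS)


# Hypothetical sample containing a bell, a newline and a tab.
sample = "Monthly\x07 report\n\ttotal"
print(repr(clean_translate(sample)))  # 'Monthly reporttotal'
```

Note that this also strips tabs and newlines, since they fall inside the 0 through 31 range, while the space character (32) is kept.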