I have a character that looks like
An odd box-like character appears before the word 'info' in one column of my dataframe, and I want to remove it. So far I have tried removing non-ASCII characters, but that does not seem to work. Please help.
The code I have tried is:
df['column_name']=df['column_name'].apply(lambda x : re.sub(r'[^\x00-\x7F]', '', x))
and
df['column_name']=df['column_name'].replace(r'[^\x00-\x7F]', '')
but neither works.
CodePudding user response:
Vectorize your function before applying it:
import re
import numpy as np
def removeNonAscii(s):
    return re.sub(r'[^\x00-\x7f]', "", s)
df['column_name'] = df['column_name'].apply(np.vectorize(removeNonAscii))
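If the box character survives the pattern above, one possibility (an assumption, since the actual character is not shown in the question) is that it is an ASCII control character, which falls inside \x00-\x7f and is therefore left untouched. A minimal sketch for inspecting the code points and stripping non-printable characters as well; the sample value and the helper name describe_chars are hypothetical:

```python
import re

def describe_chars(s):
    """Return (character, hex code point) pairs so odd characters can be identified."""
    return [(ch, hex(ord(ch))) for ch in s]

# Hypothetical sample: a control character (0x1f) in front of 'info'.
sample = "\x1finfo"
pairs = describe_chars(sample)

# The non-ASCII pattern leaves the control character alone, because 0x1f is ASCII.
only_ascii = re.sub(r"[^\x00-\x7f]", "", sample)

# Keeping only printable ASCII (space through tilde) removes it.
printable = re.sub(r"[^\x20-\x7e]", "", sample)

print(pairs[0])          # the first character and its code point
print(repr(only_ascii))  # control character still present
print(repr(printable))   # control character removed
```

Checking the code point first shows which character class the pattern actually needs to cover.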
CodePudding user response:
You can specify regex=True, and inplace=True if you want to modify the column in place. Repeat the character class one or more times (with +) so that a run of consecutive non-ASCII characters is replaced by a single empty string.
import pandas as pd
df = pd.DataFrame(["aÀnÑ,!?'\\"], columns=["column_name"])
df['column_name'].replace(r'[^\x00-\x7F]+', '', inplace=True, regex=True)
print(df)
Output
column_name
0 an,!?'\
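The same replacement can also be written by assigning the result back to the column instead of using inplace=True on a column selection. This is a sketch of an alternative, not the only way to do it; it uses the string accessor's str.replace with the same sample data:

```python
import pandas as pd

df = pd.DataFrame(["aÀnÑ,!?'\\"], columns=["column_name"])

# Replace each run of non-ASCII characters with an empty string and
# assign the cleaned Series back to the column.
df["column_name"] = df["column_name"].str.replace(
    r"[^\x00-\x7F]+", "", regex=True
)

print(df)
```

Assigning back keeps the update explicit and does not rely on in-place modification of a selected column.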