Hi everyone!
I'm trying to save a pandas DataFrame to PostgreSQL, but I'm getting many encoding errors caused by some "Latin-1" characters. So I tried to replace those characters using the following:
df = df.replace(r'\u2019 |\u2013', ' ', regex=True)
Although it works, I would like a better approach, since I don't know how many of those characters are in the DataFrame. I noticed that they all begin with \u2, so I tried the code shown below:
df = df.replace(r'\\u[0-9]', ' ', regex=True)
The latter doesn't work. Can you give me some tips on how to solve this problem?
Regards,
Marcio
CodePudding user response:
Use UTF-8 encoding when reading from the file:
df = pd.read_csv('something.csv', encoding='utf8')
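If the errors only show up when writing to PostgreSQL rather than when reading, it may also help to force the database connection itself to UTF-8. A minimal sketch using SQLAlchemy with psycopg2 (the connection string, file name, and table name here are placeholders, not details from the original question):

import pandas as pd
from sqlalchemy import create_engine

# Read the source file as UTF-8 so characters like \u2019 survive intact.
df = pd.read_csv('something.csv', encoding='utf8')

# Placeholder connection string; force the client encoding to UTF-8 as well.
engine = create_engine(
    'postgresql+psycopg2://user:password@localhost:5432/mydb',
    client_encoding='utf8',
)

# Write the DataFrame without having to strip any characters.
df.to_sql('my_table', engine, if_exists='replace', index=False)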
CodePudding user response:
Could you rely on a whitelist of (negated) good characters?
df = df.replace('[^a-zA-Z0-9 ]', ' ', regex=True)
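Alternatively, if you'd rather target only the problematic characters instead of whitelisting everything else, you could match the Unicode range you noticed (code points starting with \u2). A rough sketch, assuming the offending characters all fall in the \u2000-\u2FFF block:

import pandas as pd

# Small example frame containing a curly apostrophe (\u2019) and an en dash (\u2013).
df = pd.DataFrame({'text': ['it\u2019s fine', 'a \u2013 b']})

# Replace any character in the \u2000-\u2FFF block (curly quotes, dashes, etc.)
# with a plain space; all other characters are left untouched.
df = df.replace(r'[\u2000-\u2fff]', ' ', regex=True)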