I have a .csv file containing the following data:
Publication,First Name,Last Name,Constituency,Caucus,Province,Date,Time,Page,Text
Hansard - 59,Fayçal,El-Khoury Pauzé,Laval-Les Îles,Lib.,QC,2022-04-27,14:23:08,,"Mr. Fayçal El-Khoury"
I want words with special characters, such as Fayçal, to be read correctly.
I tried:
import pandas as pd
file_name = "C:/Users/Admin/Downloads/Results.csv"
df = pd.read_csv(file_name, sep=',', encoding='utf-8', encoding_errors='ignore')
df
Unfortunately, I am still getting the strange characters.
CodePudding user response:
Try reading the file with encoding set to 'latin_1':
df = pd.read_csv(file_name, sep=',', encoding='latin_1', encoding_errors='ignore')
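To see why the encoding argument matters, here is a small sketch in plain Python (no pandas needed). It assumes the name is stored as UTF-8 bytes, which may or may not be true of your file, and decodes those same bytes with both codecs:

```python
# The name as it should appear, and its UTF-8 byte representation.
name = "Fayçal"
raw = name.encode("utf-8")  # 'ç' becomes the two bytes 0xC3 0xA7

# Decoding with the codec that was used for writing gives the text back.
as_utf8 = raw.decode("utf-8")

# latin_1 maps every single byte to one character, so the two bytes of 'ç'
# turn into two separate characters instead of one.
as_latin1 = raw.decode("latin_1")

print(as_utf8, len(as_utf8))
print(as_latin1, len(as_latin1))
```

If the second form is what you see on screen, the bytes were decoded with the wrong codec somewhere along the way, either when reading the file or earlier, when the file was produced.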
UPDATE:
If that does not solve the problem, you could apply the re-encoding column by column.
This function should do the trick:
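def encode_serie(serie):
    # Turn each string back into latin_1 bytes, then decode those bytes as UTF-8.
    return serie.str.encode('latin_1', errors='ignore').str.decode('utf-8', errors='ignore')

# The .str methods need string columns; note that astype(str) turns missing values into 'nan'.
df = df.astype(str)
df = df.apply(encode_serie)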
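One caveat with errors='ignore': a value that is already correct can silently lose its accented characters in the round trip. A minimal sketch of a more cautious per-value variant follows; `repair_mojibake` is a hypothetical helper name, not part of pandas, and it assumes the damage comes from UTF-8 bytes having been decoded as latin_1:

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-latin_1 mix-up, leaving other strings untouched."""
    try:
        return text.encode("latin_1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip is not valid, so the string was probably fine already.
        return text

broken = "Fay\u00c3\u00a7al"  # the garbled form of the name
fixed = repair_mojibake(broken)
unchanged = repair_mojibake("Fayçal")

# For comparison: the same round trip with errors='ignore' on a correct string.
lossy = "Fayçal".encode("latin_1").decode("utf-8", errors="ignore")

print(fixed, unchanged, lossy)
```

On a dataframe, something like `df[col].map(repair_mojibake)` on each string column should apply it value by value, though I have not tried it on your file.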
If this does not work either, write the dataframe out with to_csv with encoding set to 'utf-8', then read the CSV again and rerun the function above.