How to convert the special character when reading into a dataframe?

Time:04-29

I have a .csv file including the following data:

Publication,First Name,Last Name,Constituency,Caucus,Province,Date,Time,Page,Text
Hansard - 59,Fayçal,El-Khoury Pauzé,Laval-Les Îles,Lib.,QC,2022-04-27,14:23:08,,"Mr. Fayçal El-Khoury"

I want words with special characters, such as Fayçal, to be read correctly instead of showing up garbled. I tried:

import pandas as pd 

file_name = "C:/Users/Admin/Downloads/Results.csv"
df =pd.read_csv(file_name, sep=',', encoding='utf-8', encoding_errors='ignore')
df

Unfortunately, I am still getting the strange characters.

CodePudding user response:

Try reading the file with the encoding set to 'latin_1':

df = pd.read_csv(file_name, sep=',', encoding='latin_1', encoding_errors='ignore')
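
To see why the choice of codec matters, here is a minimal standard-library sketch (not part of the original answer). It takes the name from the sample row, encodes it as UTF-8, and decodes the same bytes with two different codecs. Which direction helps in practice depends on how Results.csv was actually written, which the question does not show.

```python
# The same bytes decoded two ways: once with the matching codec,
# once with a mismatched one.
original = "Fayçal"
raw = original.encode("utf-8")       # bytes as they would sit in a UTF-8 file

as_utf8 = raw.decode("utf-8")        # decoded with the codec that wrote them
as_latin1 = raw.decode("latin_1")    # decoded with the wrong codec

print(as_utf8)
print(as_latin1)
print(len(as_utf8), len(as_latin1))
```

The mismatched decode is one character longer, because the two bytes that UTF-8 uses for ç are each read as a separate latin_1 character.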

UPDATE:

If that does not solve the problem, you could try fixing the encoding column by column.

This function should do the trick:

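def encode_serie(serie):
    # Turn the text back into latin_1 bytes, then decode those bytes as UTF-8.
    # errors='ignore' silently drops any character that cannot be converted.
    return serie.str.encode('latin_1', errors='ignore').str.decode('utf-8', errors='ignore')

# Note: astype(str) also turns missing values (NaN) into the string 'nan'.
df = df.astype(str)
df = df.apply(encode_serie)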
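
Because errors='ignore' drops characters that do not survive the round trip, values that were already correct can lose their accents. As a rough alternative sketch, the hypothetical helper below (fix_mojibake is an illustrative name, not from the original answer or from pandas) attempts the same latin_1 to UTF-8 round trip on a single string and returns the input unchanged when the round trip fails.

```python
def fix_mojibake(text):
    """Hypothetical helper: repair UTF-8 text that was mis-decoded as latin_1.

    Returns the input unchanged if it does not look like such mojibake.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin_1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin_1, or the bytes are
        # not valid UTF-8: assume the value was fine and leave it alone.
        return text


garbled = "Fay\u00c3\u00a7al"          # "Fayçal" after a wrong latin_1 decode
print(fix_mojibake(garbled))
print(fix_mojibake("Laval-Les Îles"))  # already correct, stays as it is
```

Assuming the dataframe's text columns hold plain strings, something like df[col].map(fix_mojibake) on each of those columns would apply it without converting the whole frame to str first.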

If this does not work either, write the dataframe out with to_csv using encoding='utf-8', read the CSV back in, and rerun the function above.
