Encoding csv error with Pandas - have to encode one csv file but not the other -both have same encod

Time:09-25

I am using pandas to read CSV data into my Python script. Both CSV files have the same encoding (Windows-1252). However, with one of the files I get an error when reading it with pandas unless I specify the encoding parameter in pd.read_csv().

Does anyone know why I need to specify the encoding for one CSV file and not the other? Both files contain similar data (strings and numbers).

Thank you

CodePudding user response:

That just means that one of the files has a byte outside the range 0x00 to 0x7F. Within that range, Windows-1252 and UTF-8 are both identical to ASCII; it's only the highest 128 values where the encoding makes a difference. All it takes is one n-with-tilde or one smart quote mark.
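To illustrate the point, here is a minimal sketch using only Python's built-in codecs. The sample rows and the helper name `decodes_as_utf8` are made up for the demonstration and are not taken from the asker's files.

```python
# Hypothetical sample rows; the real files will differ.
ascii_row = b"name,price\nwidget,10\n"

# Encode a row containing an n-with-tilde as Windows-1252 (ñ becomes byte 0xF1).
accented_row = "name,price\nJalapeño,10\n".encode("cp1252")


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Pure ASCII content reads the same under both encodings.
print(ascii_row.decode("utf-8") == ascii_row.decode("cp1252"))

# A single high byte is enough to make UTF-8 decoding fail.
print(decodes_as_utf8(ascii_row), decodes_as_utf8(accented_row))
```

A file that happens to contain only ASCII bytes can therefore be read without an explicit encoding, while its sibling with a single accented character cannot.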

CodePudding user response:

Pandas (at least version 1.3.3) uses UTF-8 encoding by default, even on Windows (see the source code). UTF-8 has some forbidden bytes (see the red cells in the codepage layout), and the remaining bytes above 0x7F are only valid as part of a well-formed multi-byte sequence. In Windows-1252, however, the forbidden bytes are ordinary single-byte characters. Therefore, I suppose one of your files contains bytes that are not valid UTF-8. Perhaps there is a data entry error that mistakenly put a ø instead of 0.
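If it helps to track down the culprit, something like the following could be used to list the lines that are not valid UTF-8. This is only a sketch: `find_non_utf8_lines` is a hypothetical helper, not part of pandas, and it reports just the first offending byte on each line.

```python
def find_non_utf8_lines(path):
    """Return (line_number, offset_in_line, byte_value) for each line
    of the file that cannot be decoded as UTF-8."""
    problems = []
    # Read raw bytes so that no decoding happens before we check it ourselves.
    with open(path, "rb") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the index of the first undecodable byte in the line.
                problems.append((line_number, exc.start, raw_line[exc.start]))
    return problems
```

An empty result means the file is valid UTF-8 and should load with the default settings. Otherwise, either correct the data at the reported positions or pass the encoding explicitly, for example `pd.read_csv(path, encoding="cp1252")`.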
