Encoding error reading CSV with pandas: one CSV file needs an explicit encoding but the other does not, though both have the same encoding

I am using pandas to read CSV data into my Python script. Both CSV files have the same encoding (Windows-1252). However, with one of the files I get an error when reading it with pandas unless I specify the encoding parameter in pd.read_csv().

Does anyone know why I need to specify the encoding for one CSV file and not the other? Both files contain similar data (strings and numbers).

Thank you

CodePudding user response：

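That just means that one of the files contains a byte outside the range 0x00 to 0x7F. Windows-1252 and UTF-8 agree on those first 128 values (plain ASCII), so a file that uses only them reads the same either way. It's only the highest 128 values (0x80 to 0xFF) where the encoding makes a difference. All it takes is one n-with-tilde (ñ) or one smart quote mark.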
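To make this concrete, here is a minimal sketch using only the Python standard library. The two byte strings are made-up stand-ins for the contents of the two CSV files, and `decodes_as_utf8` is a hypothetical helper name, not something from pandas.

```python
# Made-up stand-ins for the raw bytes of two Windows-1252 CSV files.
ascii_only = b"name,value\nJohn,10\n"         # every byte is <= 0x7F
with_high_byte = b"name,value\nPe\xf1a,10\n"  # 0xF1 is "ñ" in Windows-1252


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(decodes_as_utf8(ascii_only))      # True: pure ASCII is valid UTF-8
print(decodes_as_utf8(with_high_byte))  # False: 0xF1 is not followed by continuation bytes
print(with_high_byte.decode("cp1252"))  # decodes fine once the encoding is named
```

The first file would load with or without an explicit encoding; the second only decodes once `cp1252` is specified, which matches the behaviour described in the question.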

CodePudding user response：

pandas (at least version 1.3.3) uses UTF-8 encoding by default, even on Windows (see the source code). UTF-8 has some forbidden bytes (see the red cells in the code page layout), and more generally it rejects any byte from 0x80 to 0xFF that is not part of a well-formed multi-byte sequence. However, these bytes are allowed in Windows-1252. Therefore, I suppose one of your files contains some bytes that are not valid UTF-8. Perhaps there is a data entry error that mistakenly put a ø instead of a 0; in Windows-1252, ø is the single byte 0xF8, which can never appear in UTF-8.
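One way to check this is to look for the offending bytes and then read the data with the encoding named explicitly. The sketch below is an illustration under assumptions: the in-memory bytes stand in for a real file, the ø typo is invented to match the guess above, and `find_non_ascii` is a hypothetical helper.

```python
import io

import pandas as pd

# Invented file content: "ø" (0xF8 in Windows-1252) typed in place of "0".
raw = "id,amount\n1,1ø5\n".encode("cp1252")


def find_non_ascii(data: bytes):
    """Return (offset, byte value) for every byte above 0x7F (hypothetical helper)."""
    return [(i, b) for i, b in enumerate(data) if b > 0x7F]


print(find_non_ascii(raw))  # offsets and values of the bytes that UTF-8 may reject

# With the default encoding (UTF-8), the read is expected to fail.
try:
    pd.read_csv(io.BytesIO(raw))
    default_ok = True
except UnicodeDecodeError:
    default_ok = False

# Naming the real encoding lets the same bytes load.
df = pd.read_csv(io.BytesIO(raw), encoding="cp1252")
print(default_ok)
print(df)
```

For a file on disk, the equivalent would be to pass the path and `encoding="cp1252"` to `pd.read_csv()`, and to run the helper on the bytes obtained by opening the file in binary mode. If the helper returns an empty list for one file and not the other, that would explain why only one of them needs the parameter.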