Keep getting encoding errors on 3.10.4 if base encoding is not defined

Time:04-05

I'm following a YouTube tutorial. The presenter uses Python 3.8.2, and I installed 3.10.4. He types something like this and it works just fine:

r = open('file.txt', 'a')
r.write('something' + '\n')
r.write('что-то')
r.close()

When I do the same, I get a UnicodeEncodeError:

Traceback (most recent call last):
  File "C:\Users\small\Desktop\test.py", line 3, in <module>
    r.write('что-то')
  File "C:\Python310\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

so I am forced to declare the encoding when opening the file, like this:

r = open('file.txt', 'a', encoding='utf-8')
r.write('something' + '\n')
r.write('что-то')
r.close()

I'm mainly interested in two questions:

  1. Is this happening because of the difference in OS versions (I have the latest Windows 10), the Python version, or something else?
  2. Is there a way to fix this permanently? I thought about declaring the encoding at the start of the program, but then it becomes inflexible when strings come from different sources that are not in that base encoding. In that case I would be forced to add lots of checks for the encoding and conversions to UTF-8, for example. That does not look like the right solution.

CodePudding user response:

The default encoding for the open function is platform dependent:

On Unix, it is the encoding of the LC_CTYPE locale. It can be set with locale.setlocale(locale.LC_CTYPE, new_locale).

On Windows, it is the ANSI code page (e.g. cp1252).
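
To see what this means in practice, here is a minimal sketch that prints the encoding open() falls back to on the current machine and then reproduces the error on any platform by forcing cp1252. The helper name try_write is made up for this illustration, and the temporary file stands in for file.txt.

```python
import locale
import os
import tempfile

# The encoding open() falls back to when no encoding= is given
# (Python 3.10, UTF-8 mode not enabled). The value depends on the machine.
print(locale.getpreferredencoding(False))


def try_write(text, encoding):
    """Hypothetical helper: append text to a temp file, return the error name or None."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "a", encoding=encoding) as f:
            f.write(text)
        return None
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    finally:
        os.remove(path)


# cp1252 has no Cyrillic letters, so encoding fails just like in the traceback.
print(try_write("что-то", "cp1252"))  # UnicodeEncodeError
# UTF-8 can encode any Unicode text.
print(try_write("что-то", "utf-8"))   # None
```

If the first line prints cp1252 on your machine, that matches the cp1252.py frame in the traceback from the question.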

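So the difference comes from the platform and its locale settings, not from the Python version: 3.8.2 and 3.10.4 pick the default encoding the same way. It is a good habit to always specify the encoding explicitly when writing platform-independent code.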
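
As a rough sketch of that habit, the snippet from the question can be written with an explicit encoding and a with block, so the file is closed even if a write fails. The temporary directory is only an assumption to keep the example from touching real files.

```python
import os
import tempfile

# Hypothetical location: a fresh temporary directory instead of the real file.txt.
path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Explicit encoding: the result no longer depends on the OS default.
with open(path, "a", encoding="utf-8") as f:
    f.write("something\n")
    f.write("что-то")

# Read it back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    content = f.read()

print(content == "something\nчто-то")  # True
```

Strings inside a Python 3 program are already Unicode, so encoding= only controls how the text is stored as bytes in this one file. It does not require checking or converting strings that arrive from other sources as str.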

You can also make UTF-8 the default permanently by enabling the Python UTF-8 mode (the -X utf8 command-line option or the PYTHONUTF8=1 environment variable).
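
UTF-8 mode is switched on with the -X utf8 command-line option or by setting the PYTHONUTF8 environment variable to 1 before the interpreter starts. The sketch below is one way to check the effect: it launches a child interpreter with -X utf8 and asks which encoding open() picks when none is given. The child script and variable names are made up for this illustration.

```python
import subprocess
import sys

# Code run in a child interpreter: report the UTF-8 mode flag and the
# normalized name of the encoding open() chooses without encoding=.
child_code = (
    "import codecs, os, sys, tempfile\n"
    "fd, path = tempfile.mkstemp()\n"
    "os.close(fd)\n"
    "with open(path, 'a') as f:\n"
    "    name = codecs.lookup(f.encoding).name\n"
    "os.remove(path)\n"
    "print(sys.flags.utf8_mode, name)\n"
)

# Start the same Python executable with UTF-8 mode enabled.
result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # 1 utf-8
```

Setting PYTHONUTF8=1 as a user environment variable on Windows should have the same effect for every script. Code that relies on it still breaks on machines where the variable is not set, which is why the explicit encoding= argument stays the more portable choice.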
