Chinese characters not showing up properly and changing in JSON

While writing a program to help myself study, I ran into a problem: the program does not display Chinese characters properly.

The Chinese characters are loaded from a JSON file and then printed by a Python program.

The JSON entries look like this:

{
  "symbol": "我",
  "reading": "wo",
  "meaning": "I",
  "streak": 0
},

The output in the console looks like this: [image: VS Code console output]

Once the program has finished and dumps the data back into the JSON file, the entry looks like this:

{
  "symbol": "\u00e6\u02c6\u2018",
  "reading": "wo",
  "meaning": "I",
  "streak": 0
}

Changing "Language for non-Unicode programs" to Chinese (Simplified) didn't fix it.

Using chcp 936 didn't fix the issue either.

The program is a .py file that is not being hosted online. The IDE is Visual Studio Code.

The code in the Python file is:

import json


# Import the JSON file as an object
with open('cards.json') as f:
    data = json.load(f)


def main():
    for card in data['cards']:

        # Store the current meaning, reading and kanji in local variables
        currentKanji = card['symbol']
        currentReading = card['reading']
        currentMeaning = card['meaning']

        # Ask the user the meaning of the kanji
        inputMeaning = input(f'What is the meaning of {currentKanji}\n')

        # Check if the user's answer is correct
        if inputMeaning == currentMeaning:
            print("Was correct")
        else:
            print("Was incorrect")

        # Ask the user the reading of the kanji
        inputReading = input(f'What is the reading of {currentKanji}\n')
        # Check if the user's input is correct
        if inputReading == currentReading:
            print("Was Correct")
        else:
            print("Was incorrect")

        # If both answers are correct, increase the streak by one
        if (inputMeaning == currentMeaning) and (inputReading == currentReading):
            card['streak'] = card['streak'] + 1
        print(card['streak'])
        # If one of the answers is incorrect, decrease the streak by one

        if not (inputMeaning == currentMeaning) or not (inputReading == currentReading):
            card['streak'] = card['streak'] - 1


main()

# Reopen the JSON file and write the new info into it.
with open('cards.json', 'w') as f:
    json.dump(data, f, indent=2)

CodePudding user response：

json.dump takes a parameter called ensure_ascii that needs to be set to False:

with open('cards.json', 'w') as f:
  json.dump(data,f,indent=2,ensure_ascii=False)

When I do the following:

import json

data = {
  "symbol": "我",
  "reading": "wo",
  "meaning": "I",
  "streak": 0
}

print(data['symbol'])


# Reopen the JSON file and write new info into it.
with open('cards.json', 'w', encoding='utf-8') as f:
  json.dump(data, f, indent=2, ensure_ascii=False)

# Read the file back with the same encoding to check what was written.
with open('cards.json', encoding='utf-8') as f:
  print(f.readlines())

It correctly saves Chinese characters in cards.json.

As for the other issue, Chinese not showing up in the console: the characters show up just fine for me when I run the script in a Linux terminal. I don't use VS Code, so I won't be able to be much help here. Perhaps this question might be able to help you out.
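One way to narrow down where the mismatch happens is to look at the encodings Python picks by default on the machine in question. The snippet below is only a diagnostic sketch; report_encodings is a hypothetical helper name, and the values it returns depend on the OS, the Python version and the terminal.

```python
import locale
import sys


def report_encodings():
    """Return the encodings Python uses by default on this machine."""
    return {
        # Typically what open() falls back to when no encoding= is given
        "default_file_encoding": locale.getpreferredencoding(False),
        # What print() uses for the console (None if stdout has no encoding)
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

If the default file encoding is something other than UTF-8 (for example cp1252) while cards.json was saved as UTF-8, the characters are likely already wrong by the time they reach print().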

CodePudding user response：

I think there are two problems with your code that lead to mojibake on your screen and escaped nonsense in your file.

The first issue is some kind of encoding mismatch between your file and your program, or between your program and your console. I think it's the former, but I'm not sure. The best way to fix this is to specify the encoding you want to be using when you open your file at the beginning and end of the program, rather than using the default (which may not be what you expect).

Change:

with open('cards.json') as f:
    data = json.load(f)

To:

with open('cards.json', encoding="utf-8") as f:  # specify whatever actual
    data = json.load(f)                          # encoding you're using

And do a similar change when you open the file to rewrite the contents at the end.
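The escapes in the question are consistent with a UTF-8 file being read as cp1252. That the asker's default encoding is cp1252 is an assumption here (it is a common default on Windows), but the following sketch reproduces the exact output from the question under that assumption:

```python
import json

original = "我"

# UTF-8 encodes this character as three bytes: E6 88 91
utf8_bytes = original.encode("utf-8")

# Assumption: the file was read with cp1252 instead of UTF-8,
# so each of the three bytes is misread as a separate character.
misread = utf8_bytes.decode("cp1252")

# json.dumps escapes non-ASCII characters by default
escaped = json.dumps(misread)

print(utf8_bytes.hex())  # e68891
print(escaped)           # "\u00e6\u02c6\u2018"
```

The three escaped code points match the "symbol" value that ended up in cards.json, which suggests the damage happens when the file is read, not when it is written.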

The second issue is that non-ASCII characters are escaped when they make the round trip from JSON into your program and back to JSON. This is less serious than the encoding problem, because an escaped JSON string decodes to the right character when it is loaded again (provided the character was read correctly in the first place, which, due to the encoding issue above, it was not). That is, if you only fix the encoding issue, you might end up with "\u6211" in your JSON, which correctly decodes to "我" when you load the file again.
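A small in-memory check of that round trip, using json.dumps and json.loads on a single entry rather than the real file:

```python
import json

# With the default ensure_ascii=True, the character is written as an escape
escaped = json.dumps({"symbol": "我"})
print(escaped)  # {"symbol": "\u6211"}

# Loading the escaped text gives back the original character
restored = json.loads(escaped)
assert restored["symbol"] == "我"
```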

But if you want the Chinese character to be human-readable in the file, you just need to tell Python's JSON module not to escape it. Just pass False as the value of the ensure_ascii argument to json.dump:

with open('cards.json', 'w', encoding="utf-8") as f:  # encoding fix, from above
    json.dump(data, f, indent=2, ensure_ascii=False)  # new fix, to avoid escapes
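Putting both fixes together, here is a minimal sketch of a full save and load cycle. The helper names save_cards and load_cards are made up for illustration, and a temporary directory is used so that the real cards.json is not touched.

```python
import json
import os
import tempfile


def save_cards(path, data):
    """Write the cards as UTF-8 JSON without escaping non-ASCII characters."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def load_cards(path):
    """Read the cards back, decoding the file explicitly as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


data = {"cards": [{"symbol": "我", "reading": "wo", "meaning": "I", "streak": 0}]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cards.json")
    save_cards(path, data)

    # Inspect the raw bytes to see what actually landed on disk
    with open(path, "rb") as f:
        raw = f.read()

    reloaded = load_cards(path)

# The file holds the character's UTF-8 bytes rather than a \u escape
assert "我".encode("utf-8") in raw
assert b"\\u" not in raw
# The data survives the round trip unchanged
assert reloaded == data
```

Using the same explicit encoding on both the read and the write side is what keeps the result independent of the platform default.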