df.to_json writes a \u00a0 character into the JSON: how do I remove it from a pandas DataFrame?

Here is the JSON output:

[
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":"0.0",
        "short_desc":null,
        "long_desc":null,
        "list_id":"Chronic_Body_Sys",
        "option_id":"1",
        "title":"Infectious and parasitic\u00a0"
    },
    {
        "dx_code":"A00",
        "formatted_code":"A00",
        "valid_for_coding":0.0,
        "short_desc":"Cholera",
        "long_desc":"Cholera",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A001",
        "formatted_code":"A00.1",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    }
]

This is the code I used:

testdata.to_json('testfile.json',indent=4,orient='records')

As far as I can tell, this \u00a0 character is not present in the data, and I don't know how to remove it. Any suggestions? I am working on a DataFrame in a Jupyter notebook.

CodePudding user response：

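The 00a0 character is a no-break space (U+00A0). It is most likely already in your data, where it looks like an ordinary space when displayed. to_json writes it as the escape \u00a0 because force_ascii defaults to True. Passing force_ascii=False writes the character itself instead of the escape, but it does not remove it or turn it into a normal space. Either way, deserializing (loading) this JSON should work just fine, as a JSON parser decodes the escape back to the same character.

TL;DR It is the Unicode character for a space with no break, which usually ends up in text for formatting reasons. Use force_ascii=False if you only want to avoid the \u00a0 escape in the file, but reading this JSON should work just fine.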
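To illustrate the point about force_ascii, here is a minimal sketch. The one-row DataFrame is made up for the example (it only borrows a title value from the question), and the variable names are hypothetical:

```python
import json

import pandas as pd

# Hypothetical one-row frame whose title ends with a no-break space (U+00A0).
df = pd.DataFrame({"title": ["Infectious and parasitic disease\u00a0"]})

# Default behaviour (force_ascii=True): non-ASCII characters are written as escapes.
escaped = df.to_json(orient="records")

# force_ascii=False: the character itself is written, not removed.
raw = df.to_json(orient="records", force_ascii=False)

print("\\u00a0" in escaped)  # True: the six-character escape sequence is in the text
print("\u00a0" in raw)       # True: the actual no-break space is in the text

# Both forms decode to the same data.
print(json.loads(escaped) == json.loads(raw))  # True
```

In other words, the two outputs differ only in how the character is spelled in the file, not in what a reader of the JSON gets back.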

CodePudding user response：

You should be able to keep this character without issue.

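If you really want to remove it, remember that to_json returns a string when no path is given, so you can use a simple replace. With the default force_ascii=True the output contains the six-character escape sequence rather than the character itself, so the backslash must be doubled in the Python literal:

s = df.to_json().replace('\\u00a0', '')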
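An alternative, sketched here under the assumption that the no-break space only matters in the title column, is to clean the DataFrame before exporting, so the JSON never contains the character. The small testdata frame below is a stand-in for the real one:

```python
import pandas as pd

# Stand-in for the asker's DataFrame; only the columns needed for the example.
testdata = pd.DataFrame(
    {
        "dx_code": ["A00", "A000"],
        "title": ["Infectious and parasitic disease\u00a0"] * 2,
    }
)

# Replace the no-break space with a regular space, then trim the ends.
testdata["title"] = (
    testdata["title"].str.replace("\u00a0", " ", regex=False).str.strip()
)

# With no path given, to_json returns the JSON text as a string.
result = testdata.to_json(orient="records", indent=4)
print(result)
```

Cleaning the column first also fixes the value for anything else that uses the DataFrame, whereas replacing in the JSON string only affects the exported text.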