df.to_json writes the character \u00a0 into the JSON: how do I remove it from a pandas DataFrame?

Time:05-17

Here is the JSON output:

[
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":"0.0",
        "short_desc":null,
        "long_desc":null,
        "list_id":"Chronic_Body_Sys",
        "option_id":"1",
        "title":"Infectious and parasitic\u00a0"
    },
    {
        "dx_code":"A00",
        "formatted_code":"A00",
        "valid_for_coding":0.0,
        "short_desc":"Cholera",
        "long_desc":"Cholera",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A000",
        "formatted_code":"A00.0",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    },
    {
        "dx_code":"A001",
        "formatted_code":"A00.1",
        "valid_for_coding":1.0,
        "short_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "long_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
        "list_id":"Chronic_Body_System",
        "option_id":"1",
        "title":"Infectious and parasitic disease\u00a0"
    }
]

This is the code I used:

testdata.to_json('testfile.json',indent=4,orient='records')

This \u00a0 character does not appear to be in the data, and I don't know how to remove it. Any suggestions? I am working on the DataFrame in a Jupyter notebook.

CodePudding user response:

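The \u00a0 sequence is the JSON escape for U+00A0, the no-break space. to_json does not add it: it is already at the end of your title values, where it looks like an ordinary space. Because force_ascii defaults to True, to_json writes every non-ASCII character as a \uXXXX escape. Passing force_ascii=False writes the character itself instead of the escape, but it does not remove it. Either way, deserializing (loading) this JSON should work just fine, since any JSON parser decodes the escape back to the same character.

TL;DR: It is the Unicode no-break space, and it comes from your data, not from to_json. Use force_ascii=False if you only want to avoid the escape sequence in the file; to get rid of the character itself, strip it from the column before exporting. Reading this JSON should work just fine either way.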
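
A minimal sketch of the difference, assuming a hypothetical one-row frame whose title ends in U+00A0 (the frame and the variable names are made up for illustration, they are not the asker's data):

```python
import json

import pandas as pd

# Hypothetical sample: the title ends with a no-break space (U+00A0).
df = pd.DataFrame({"title": ["Infectious and parasitic disease\u00a0"]})

# Count the rows that actually contain the character in the data.
affected = int(df["title"].str.contains("\u00a0", regex=False).sum())

# force_ascii=True is the default: non-ASCII characters become \uXXXX escapes.
escaped = df.to_json(orient="records")

# force_ascii=False keeps the character itself in the output string.
raw = df.to_json(orient="records", force_ascii=False)

has_escape = "\\u00a0" in escaped   # literal backslash-u sequence
has_raw_char = "\u00a0" in raw      # the actual no-break space

# Both forms decode to the same Python objects.
same = json.loads(escaped) == json.loads(raw)
```

If affected is non-zero, the character was in the column before to_json ever ran.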

CodePudding user response:

You should be able to keep this character without issue.

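If you really want to remove it, remember that to_json returns a string when no path is given. With the default force_ascii=True that string contains the literal escape sequence \u00a0 rather than the character, so the backslash has to be doubled in the Python literal:

s = df.to_json().replace('\\u00a0', '')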