Here is the JSON output:
[
{
"dx_code":"A000",
"formatted_code":"A00.0",
"valid_for_coding":"0.0",
"short_desc":null,
"long_desc":null,
"list_id":"Chronic_Body_Sys",
"option_id":"1",
"title":"Infectious and parasitic\u00a0"
},
{
"dx_code":"A00",
"formatted_code":"A00",
"valid_for_coding":0.0,
"short_desc":"Cholera",
"long_desc":"Cholera",
"list_id":"Chronic_Body_System",
"option_id":"1",
"title":"Infectious and parasitic disease\u00a0"
},
{
"dx_code":"A000",
"formatted_code":"A00.0",
"valid_for_coding":1.0,
"short_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
"long_desc":"Cholera due to Vibrio cholerae 01, biovar cholerae",
"list_id":"Chronic_Body_System",
"option_id":"1",
"title":"Infectious and parasitic disease\u00a0"
},
{
"dx_code":"A001",
"formatted_code":"A00.1",
"valid_for_coding":1.0,
"short_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
"long_desc":"Cholera due to Vibrio cholerae 01, biovar eltor",
"list_id":"Chronic_Body_System",
"option_id":"1",
"title":"Infectious and parasitic disease\u00a0"
}
]
This is the code I used:
testdata.to_json('testfile.json',indent=4,orient='records')
The \u00a0 character is not present in my data, and I don't know how to remove it. Any suggestions? I am working on a DataFrame in a Jupyter notebook.
CodePudding user response:
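The \u00a0 escape stands for U+00A0, the no-break space. to_json does not add it: the character is already in your data, and because force_ascii defaults to True, to_json writes every non-ASCII character as a \uXXXX escape. Passing force_ascii=False writes the character itself instead of the escape, but it does not turn it into a normal space. Either way, deserializing (loading) this JSON should work just fine, as Python's JSON parsers decode the escape back into the original character.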
TL;DR
It is the Unicode no-break space. It is already in your data, most likely left there by the source's formatting, and it simply shows up escaped in the JSON. Use force_ascii=False if you don't want the escaped form, but reading this JSON should work just fine.
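To illustrate the point about loading, here is a minimal sketch using only the standard library json module. The raw string is a hypothetical one-field record modeled on the output in the question, not the asker's actual file.

```python
import json

# One record as it would appear in the exported file. The backslash is
# doubled so this Python literal holds the six-character escape \u00a0.
raw = '{"title": "Infectious and parasitic disease\\u00a0"}'

record = json.loads(raw)
title = record["title"]

# The escape decodes to a single no-break space character (U+00A0).
print(repr(title[-1]))

# str.strip() treats U+00A0 as whitespace, so a trailing one is removed.
clean_title = title.strip()
print(repr(clean_title))
```

So the escape only exists in the file's text; once loaded, it is an ordinary (if invisible) character that the usual string methods can handle.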
CodePudding user response:
You should be able to keep this character without issue.
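If you really want to remove it, remember that to_json returns a string when no path is given, so you can use a simple replace (the backslash is doubled because the JSON text contains the six-character escape sequence, not the character itself):
s = df.to_json().replace('\\u00a0', '')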
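An alternative is to clean the column before exporting, so the character never reaches the JSON at all. The sketch below assumes the no-break space only occurs in the title column; the small frame is a hypothetical stand-in for the asker's testdata.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `testdata` frame.
testdata = pd.DataFrame(
    {
        "dx_code": ["A00", "A000"],
        "title": ["Infectious and parasitic disease\u00a0"] * 2,
    }
)

# Replace the no-break space with a regular space, then trim both ends.
testdata["title"] = (
    testdata["title"].str.replace("\u00a0", " ", regex=False).str.strip()
)

# With no path argument, to_json returns the JSON text as a string.
cleaned_json = testdata.to_json(orient="records", indent=4)
print(cleaned_json)
```

Replacing with a regular space before stripping keeps words apart if the character ever appears in the middle of a value rather than at the end.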