I need to parse a badly formatted JSON string in Python.
Here is an example:
"{""key1"":""value1"",""key2"":{""subkey1"":null,""subkey2"":{""subsubkey1"":9,""subsubkey2"":null,""subsubkey3"":null},""subkey3"":""strval1""},""key3"":""strval2"",""key4"":29}"
I've tried using json.loads and pandas.read_json to no avail. I have no control over how the string is created; I just need to parse it into a pandas DataFrame.
CodePudding user response:
If your file contains this string:
"{""key1"":""value1"",""key2"":{""subkey1"":null,""subkey2"":{""subsubkey1"":9,""subsubkey2"":null,""subsubkey3"":null},""subkey3"":""strval1""},""key3"":""strval2"",""key4"":29}"
then you can do for example:
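import json
from io import StringIO
import pandas as pd
# read_csv un-escapes the doubled quotes, leaving valid JSON in column 0
df = pd.read_csv("your_file.csv", header=None)
# parse each row into a Python dict
parsed = df[0].apply(json.loads)
# use read_json, or .apply(pd.Series), to convert the JSON to a DataFrame
# (StringIO avoids the literal-string deprecation warning in recent pandas)
df_out = pd.read_json(StringIO(df.iloc[0, 0]))
print(df_out)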
Prints:
           key1                                                       key2     key3  key4
subkey1  value1                                                       None  strval2    29
subkey2  value1  {'subsubkey1': 9, 'subsubkey2': None, 'subsubkey3': None}  strval2    29
subkey3  value1                                                    strval1  strval2    29
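If a single flat row is more useful than the nested layout printed above, pandas.json_normalize is one option. This is only a minimal sketch: it assumes the string has already been un-escaped (for example by read_csv, as above), and the names raw and flat are illustrative, not part of the original answer.

```python
import json
import pandas as pd

# Assumption: the cell value after the doubled quotes have been collapsed.
raw = (
    '{"key1":"value1","key2":{"subkey1":null,"subkey2":'
    '{"subsubkey1":9,"subsubkey2":null,"subsubkey3":null},'
    '"subkey3":"strval1"},"key3":"strval2","key4":29}'
)

# json_normalize flattens nested dicts into dotted column names,
# producing one row per top-level record.
flat = pd.json_normalize(json.loads(raw))
print(flat.T)  # transpose so the long list of columns prints vertically
```

Nested keys end up as columns such as key2.subkey2.subsubkey1, which may be easier to filter on than a cell holding a dict.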
CodePudding user response:
You can do it without pandas if you need to. Just replace each doubled-up quote "" with a single double quote ":
import json
j = '''
{""key1"":""value1"",""key2"":{""subkey1"":null,""subkey2"":{""subsubkey1"":9,""subsubkey2"":null,""subsubkey3"":null},""subkey3"":""strval1""},""key3"":""strval2"",""key4"":29}
'''
print(json.loads(j.replace('""', '"')))
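If the string arrives with its outer quotes still attached, as in the question, the standard-library csv module can undo the quoting instead of a manual replace. The following is a sketch under that assumption; raw, field, and data are illustrative names.

```python
import csv
import io
import json

# Assumption: the raw text exactly as received, outer quotes included.
raw = (
    '"{""key1"":""value1"",""key2"":{""subkey1"":null,""subkey2"":'
    '{""subsubkey1"":9,""subsubkey2"":null,""subsubkey3"":null},'
    '""subkey3"":""strval1""},""key3"":""strval2"",""key4"":29}"'
)

# csv.reader strips the outer quotes and turns each "" inside the
# quoted field back into a single ", so the field is valid JSON.
field = next(csv.reader(io.StringIO(raw)))[0]
data = json.loads(field)
print(data["key2"]["subkey2"])
```

This treats the input as what it most likely is, a single CSV-quoted field, so the commas inside the JSON are not split into separate columns.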