I need to convert a JSON dictionary to a Pandas DataFrame, but the embedding is tripping me up.
Here is basically what the JSON dict looks like.
{
"report": "{'name':{'data':[{'key 1':'value 1','key 2':'value 2'},{'key 1':'value 1','key 2':'value 2'}]}}"
}
In the DataFrame, I want the keys to be the column headers and values in the rows below them.
The extra layer of embedding is throwing me off somewhat from all the usual methods of doing this.
One tricky part is 'name' will change each time I get this JSON dict, so I can't use an exact sting value for 'name'.
CodePudding user response:
Your JSON looks a bit odd. It looks more like a Python dict converted to a string, so you can use ast.literal_eval
(a built-in function) to convert it to a real dict, and then use pd.json_normalize
to get it into a dataframe form:
import ast
j = ...
parsed_json = ast.literal_eval(j['report'])
df = pd.json_normalize(parsed_json, record_path=[list(parsed_json)[0], 'data'])
Output:
>>> df
key 1 key 2
0 value 1 value 2
1 value 1 value 2
CodePudding user response:
The error suggests that you're trying to index the strings (because the value under report
is a string) using another string.
You just need ast.literal_eval
to parse the string and a DataFrame constructor. If the "name" is unknown, you can iterate over the dict.values
after you parse the string.
import ast
out = pd.DataFrame(y for x in ast.literal_eval(your_data['report']).values() for y in x['data'])
Output:
key 1 key 2
0 value 1 value 2
1 value 1 value 2