Hey guys I've been working on converting some json text I'm receiving from an API and I noticed some people using json_normalize, but in my case it doesn't solve the full issue and I was wondering if someone could help. import pandas as pd
my_json = [
{
"total": "null",
"items": [
{
"key": "time",
"label": "Time",
"value": "2022-12-13T23:59:59.939-07:00"
},
{
"key": "agentNotes",
"label": "Agent Notes",
"value": "null"
},
{
"key": "blindTransferToAgent",
"label": "Blind Transfer To Agent",
"value": "0"
}]},
{"total": "null",
"items": [
{
"key": "time",
"label": "Time",
"value": "2022-12-13T23:59:59.939-07:00"
},
{
"key": "agentNotes",
"label": "Agent Notes",
"value": "null"
},
{
"key": "blindTransferToAgent",
"label": "Blind Transfer To Agent",
"value": "0"
}
]}]
df = pd.json_normalize(my_json, ["items"])
print(df)
This gives me a result like this
key ... value 0 time ... 2022-12-13T23:59:59.939-07:00 1 agentNotes ... null 2 blindTransferToAgent ... 0
[3 rows x 3 columns]
But I'm trying to create my keys as columns and the values as the values so the end result look like this.
time agentNotes blindTransfertoAgent
2022-12-13T23:590:59.939-07:00 null 0
Any help would be appreciated.
CodePudding user response:
I did not find any shortcuts for this problem and maybe someone could enlighten us.
However the solution isn't that long, so I thought to post it anyways.
Your "JSON" isn't really a JSON if I am reading correctly from your question, it is a list that contains a dictionary with two keys, total
and items
. The value of items
is a list of dictionaries, so we can iterate through the values and take the key
-value
elements from each one:
from collections import defaultdict
import pandas as pd
dict_to_df = defaultdict(list)
dictionaries = [inner_dicts for items_dict in my_json for inner_dicts in items_dict['items']]
for dictionary in dictionaries:
dict_to_df[dictionary['key']].append(dictionary['value'])
df = pd.DataFrame.from_dict(dict_to_df, orient='index').T
print(df)
Which outputs:
time agentNotes blindTransferToAgent
0 2022-12-13T23:59:59.939-07:00 null 0
Explanations:
- Initialize an empty defaultdict (with default value of a list) which we will read to a pandas dataframe.
- Insert the values per
key
in the "JSON" we have - Read the dictionary into a pandas
DataFrame
- orienting the indices and transposing, if these values don't have the same length. (such as having one moreblindTransferToAgent
with value1
) to get around if the JSON looks like:
{
"key": "time",
"label": "Time",
"value": "2022-12-13T23:59:59.939-07:00"
},
{
"key": "agentNotes",
"label": "Agent Notes",
"value": "null"
},
{
"key": "blindTransferToAgent",
"label": "Blind Transfer To Agent",
"value": "0"
},
{
"key": "blindTransferToAgent",
"label": "Blind Transfer To Agent",
"value": "4"
}
Which will output:
time agentNotes blindTransferToAgent
0 2022-12-13T23:59:59.939-07:00 null 0
1 None None 4
CodePudding user response:
Try changing this:
df = pd.json_normalize(my_json, ["items"])
into this:
df = pd.json_normalize(my_json, ["items"]).T
The T
attribute in a Pandas DataFrame object stores a transposition of the index and columns, which is what you're looking for.
Output:
0 1 2
key time agentNotes blindTransferToAgent
label Time Agent Notes Blind Transfer To Agent
value 2022-12-13T23:59:59.939-07:00 null 0