Home > OS >  how to use json_normalize but flip the axis
how to use json_normalize but flip the axis

Time:01-19

Hey guys I've been working on converting some json text I'm receiving from an API and I noticed some people using json_normalize, but in my case it doesn't solve the full issue and I was wondering if someone could help. import pandas as pd

my_json = [
    {
        "total": "null",
        "items": [
            {
                "key": "time",
                "label": "Time",
                "value": "2022-12-13T23:59:59.939-07:00"
            },
            {
                "key": "agentNotes",
                "label": "Agent Notes",
                "value": "null"
            },
            {
                "key": "blindTransferToAgent",
                "label": "Blind Transfer To Agent",
                "value": "0"
            }]},
  {"total": "null",
        "items": [
            {
                "key": "time",
                "label": "Time",
                "value": "2022-12-13T23:59:59.939-07:00"
            },
            {
                "key": "agentNotes",
                "label": "Agent Notes",
                "value": "null"
            },
            {
                "key": "blindTransferToAgent",
                "label": "Blind Transfer To Agent",
                "value": "0"
            }
        ]}]
df = pd.json_normalize(my_json, ["items"])
print(df)

This gives me a result like this

               key  ...                          value
0                  time  ...  2022-12-13T23:59:59.939-07:00
1            agentNotes  ...                           null
2  blindTransferToAgent  ...                              0
[3 rows x 3 columns]

But I'm trying to create my keys as columns and the values as the values so the end result look like this.

time agentNotes blindTransfertoAgent

2022-12-13T23:590:59.939-07:00 null 0

Any help would be appreciated.

CodePudding user response:

I did not find any shortcuts for this problem and maybe someone could enlighten us.
However the solution isn't that long, so I thought to post it anyways.

Your "JSON" isn't really a JSON if I am reading correctly from your question, it is a list that contains a dictionary with two keys, total and items. The value of items is a list of dictionaries, so we can iterate through the values and take the key-value elements from each one:

from collections import defaultdict
import pandas as pd

dict_to_df = defaultdict(list)
dictionaries = [inner_dicts for items_dict in my_json for inner_dicts in items_dict['items']]

for dictionary in dictionaries:
    dict_to_df[dictionary['key']].append(dictionary['value'])

df = pd.DataFrame.from_dict(dict_to_df, orient='index').T
print(df)

Which outputs:

                            time agentNotes blindTransferToAgent
0  2022-12-13T23:59:59.939-07:00       null                    0

Explanations:
  • Initialize an empty defaultdict (with default value of a list) which we will read to a pandas dataframe.
  • Insert the values per key in the "JSON" we have
  • Read the dictionary into a pandas DataFrame - orienting the indices and transposing, if these values don't have the same length. (such as having one more blindTransferToAgent with value 1) to get around if the JSON looks like:
        {
            "key": "time",
            "label": "Time",
            "value": "2022-12-13T23:59:59.939-07:00"
        },
        {
            "key": "agentNotes",
            "label": "Agent Notes",
            "value": "null"
        },
        {
            "key": "blindTransferToAgent",
            "label": "Blind Transfer To Agent",
            "value": "0"
        },
        {
            "key": "blindTransferToAgent",
            "label": "Blind Transfer To Agent",
            "value": "4"
        }

Which will output:

                            time agentNotes blindTransferToAgent
0  2022-12-13T23:59:59.939-07:00       null                    0
1                           None       None                    4

CodePudding user response:

Try changing this:

df = pd.json_normalize(my_json, ["items"])

into this:

df = pd.json_normalize(my_json, ["items"]).T

The T attribute in a Pandas DataFrame object stores a transposition of the index and columns, which is what you're looking for.

Output:

                                   0            1                        2
key                             time   agentNotes     blindTransferToAgent
label                           Time  Agent Notes  Blind Transfer To Agent
value  2022-12-13T23:59:59.939-07:00         null                        0
  • Related