Home > database >  Build a dataframe from uneven nested list of dictionary key value pairs and json
Build a dataframe from uneven nested list of dictionary key value pairs and json

Time:10-05

Sorry to bother u guys again. Below is my Json output (which I have put into a variable for convenience) from Azure form recognizer extract :

Service_Table =  {
                        
                        "valueArray": [{"type": "object",
                                   "valueObject": {
                                    "Minimum Connections": {
                                        "type": "string",
                                        "valueString": "290",
                                        "content": "290"},
                                    "Plan name": {
                                        "type": "string",
                                        "valueString": "Endless Plan",
                                        "content": "Endless Plan"}}},   
                                {"type": "object",
                                "valueObject": {
                                    "Plan name": {
                                        "type": "string",
                                        "valueString": "Corporate Plan",
                                        "content": "Corporate Plan"}}}]}

As you can see it is a nested list containing key value pairs. I need to get the dataframe out of this as follows:

enter image description here

I've tried many things such as pd.DataFrame(ServiceTable), pd.json_normalize(ServiceTable) other list comprehension methods etc but not able to get what I need, because the key value pairs are uneven (inside the first "valueOject" you can see two keys while in the second one there is only one key). Any help here is really appreciated and thank you for your time.

For more clarity the key "valueObject" could be accessed by calling Service['valueArray'][i] where i is the index (there are two keys with name valueObject)

CodePudding user response:

You can use pandas.json_normalize :

df = pd.json_normalize(Service_Table['valueArray'])
df = df.loc[:,df.columns.str.endswith('content')]

df.columns = range(df.columns.size)

df.insert(0, 'S.no', df.index 1)
df = df.rename(columns= {0: 'Minimum connections', 1: 'Plan name'})

# Output :

print(df)

   S.no Minimum connections       Plan name
0     1                 290    Endless Plan
1     2                 NaN  Corporate Plan

CodePudding user response:

I would prepare a dictionary first:

content = [
    (values.get('Minimum Connections', dict()).get('content'),
     values.get('Plan name', dict()).get('content'))
    for xs in Service_Table.get('valueArray', [])
    for key, values in xs.items()
    if key == 'valueObject']

rows = {i: (i   1,)   content[i] for i in range(len(content))}

Then create a DataFrame from with from_dict:

(pd.DataFrame
 .from_dict(rows,
            orient='index',
            columns=['S.no', 'Minimum Connections', 'Plan name'])
 .astype({'Minimum Connections': 'Int32'}))

The result table:

   S.no  Minimum Connections       Plan name
0     1                  290    Endless Plan
1     2                 <NA>  Corporate Plan

Note: I would suggest to remove spaces from the column name.

  • Related