Flattening Multi-Level Nested Object to DataFrame

Time:01-29

I am trying to convert an object/dictionary to a pandas DataFrame using the following code:

import pandas as pd

sr = pd.Series(data)  # data: the source object/dictionary
df = pd.DataFrame(sr.values.tolist())
display(df)

It works well, but some of the output columns are of object/dictionary type, and I would like to break them up into multiple columns. For example, suppose the column "Items" produces the following value in a cell:

obj = {
    "item1": {
        "id": "item1",
        "relatedItems": [
            {
                "id": "1111",
                "category": "electronics"
            },
            {
                "id": "9999",
                "category": "electronics",
                "subcategory": "computers"
            },
            {
                "id": "2222",
                "category": "electronics",
                "subcategory": "computers",
                "additionalData": {
                    "createdBy": "Doron",
                    "inventory": 100
                }
            }
        ]
    },
    "item2": {
        "id": "item2",
        "relatedItems": [
            {
                "id": "4444",
                "category": "furniture",
                "subcategory": "sofas"
            },
            {
                "id": "5555",
                "category": "books",
            },
            {
                "id": "6666",
                "category": "electronics",
                "subcategory": "computers",
                "additionalData": {
                    "createdBy": "Joe",
                    "inventory": 5,
                    "condition": {
                        "name": "new",
                        "inspectedBy": "Doron"
                    }
                }
            }
        ]
    }
}

The desired output is: objResult

I tried using df.explode, but it turns the single row into multiple rows. I am looking for a way to achieve the same result split into columns while retaining a single row.

Any suggestions?

CodePudding user response:

You can use the pd.json_normalize function to flatten the nested dictionary into multiple columns, with the keys joined with a dot (.).

sr = pd.Series({
    'Items': {
        'item_name': 'name',
        'item_value': 'value'
    }
})

# Pass a dict so the 'Items' key is kept as the column prefix
df = pd.json_normalize(sr.to_dict(), sep='.')
display(df)

This gives the following DataFrame:

  Items.item_name Items.item_value
0            name            value

You can also limit how deep the flattening goes with the max_level parameter; record_path is for a key that holds a list of records, so it does not apply to 'Items' here. To flatten only the 'Items' value, pass that dictionary directly:

df = pd.json_normalize(sr['Items'], sep='.')
display(df)
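
For reference, here is a minimal sketch of how max_level and record_path behave in pd.json_normalize. The sample data and its field names (details, owner, stock, ref) are made up for illustration and are not taken from the question:

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on the question's structure
data = {
    "id": "item1",
    "details": {"owner": {"name": "Doron"}, "stock": 100},
    "relatedItems": [
        {"ref": "1111", "category": "electronics"},
        {"ref": "9999", "category": "electronics"},
    ],
}

# max_level=1 flattens one level of nested dicts;
# anything deeper (here details.owner) is left as a dict in its cell
shallow = pd.json_normalize(data, sep=".", max_level=1)

# record_path names a key holding a list of records: one row per list element.
# meta copies the listed top-level fields onto every row.
records = pd.json_normalize(data, record_path="relatedItems", meta=["id"])
```

Note that record_path produces one row per record, so it is the opposite of what the question wants; it is shown here only to make the difference from max_level concrete.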

CodePudding user response:

It seems like you're looking for pandas.json_normalize, which has a sep parameter:

obj = {
    'name': 'Doron Barel',
    'items': {
        'item_name': 'name',
        'item_value': 'value',
        'another_item_prop': [
            {
                'subitem1_name': 'just_another_name',
                'subitem1_value': 'just_another_value',
            },
            {
                'subitem2_name': 'one_more_name',
                'subitem2_value': 'one_more_value',
            }
        ]
    }
}

# Flatten nested dicts; the list under 'another_item_prop' stays in one cell
df = pd.json_normalize(obj, sep='.')

# Pull the list column out and give each list element its own row
ser = df.pop('items.another_item_prop').explode()

# Expand the dicts into prefixed columns, then collapse back to one row per name
out = (df.join(pd.DataFrame(ser.tolist(), index=ser.index)
                 .rename(columns=lambda x: ser.name + "." + x))
         .groupby("name", as_index=False).first()
      )

Output:

print(out)

          name items.item_name items.item_value items.another_item_prop.subitem1_name items.another_item_prop.subitem1_value items.another_item_prop.subitem2_name items.another_item_prop.subitem2_value
0  Doron Barel            name            value                     just_another_name                     just_another_value                         one_more_name                         one_more_value
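
For data nested as deeply as the example in the question (lists of dicts that themselves contain dicts), another option is to flatten everything yourself before building the DataFrame. The sketch below is only one possible approach: flatten is a hypothetical helper, not a pandas function, and it assumes that keying list elements by their position (0, 1, ...) is acceptable for the column names. The data is a trimmed version of the question's obj.

```python
import pandas as pd

def flatten(value, prefix="", sep="."):
    """Recursively flatten nested dicts and lists into one flat dict.

    Hypothetical helper (not part of pandas); list elements are keyed
    by their position in the list.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar leaf: the accumulated prefix becomes the column name
        return {prefix: value}
    flat = {}
    for key, child in items:
        child_prefix = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten(child, child_prefix, sep))
    return flat

# Trimmed version of the question's data
obj = {
    "item1": {
        "id": "item1",
        "relatedItems": [
            {"id": "1111", "category": "electronics"},
            {"id": "2222",
             "additionalData": {"createdBy": "Doron", "inventory": 100}},
        ],
    }
}

# One flat dict becomes one row, with one column per leaf value
row = pd.DataFrame([flatten(obj)])
```

This keeps everything in a single row, with column names such as item1.relatedItems.1.additionalData.inventory. Empty dicts and empty lists produce no columns in this sketch.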