Home > Net >  Pandas: transforming dataframe to nested dictionary
Pandas: transforming dataframe to nested dictionary

Time:12-01

I have this dataframe:

Month_Year  City_Name   Chain_Name Product_Name Product_Price
11-2021     London      Aldi       Pasta        2.33
11-2021     Bristol     Spar       Bananas      1.45
10-2021     London      Tesco      Olives       4.12
10-2021     Cardiff     Spar       Pasta        2.25

This dataframe will be displayed in nested collapsible which will expand in the following order:

Date_month:
    City_Name:
         Shop_name:
             Product_name : Price

Theefore, I want to transform the dataframe to the following structure:

{10-2021:
    {London:
        {Aldi:
            {Pasta:2.33}}},
    {Bristol:{
        {Spar:
            {Bananas:1.45}}}}

Essentially I want to groupby every element recursively, starting by purchase month. The closest thing I've found is the solution in this answer, but as dictionary doesn't take duplicate values as I might have duplicate values in every column. Other than that, I am not sure if using a dictionary is the optimal answer for this problem.

My best guess is I have to use other DS type such as some type of tree, but couldn't find any way to do that.

Any advises?

CodePudding user response:

You can group your dataframe by all columns except price, then create your dictionaries in a loop:

# if more than one price for one product in a chain, then calculate mean:
grouped_df = df.groupby(['Month_Year', 'City_Name', 'Chain_Name', 'Product_Name']).agg('mean')

result = dict()
nested_dict = dict()

for index, value in grouped_df.itertuples():
    for i, key in enumerate(index):
        if i == 0:
            if not key in result:
                result[key] = {}
            nested_dict = result[key]
        elif i == len(index) - 1:
            nested_dict[key] = value
        else:
            if not key in nested_dict:
                nested_dict[key] = {}
            nested_dict = nested_dict[key]

print(json.dumps(result, indent=4))

Changing your df to show nested dict and mean calculation to:

  Month_Year City_Name Chain_Name Product_Name  Product_Price
0    11-2021    London       Aldi        Pasta           2.33
1    11-2021    London       Aldi        Pasta           2.35
2    11-2021    London       Aldi       Olives           3.99
3    11-2021   Bristol       Spar      Bananas           1.45
4    10-2021    London      Tesco       Olives           4.12
5    10-2021   Cardiff       Spar        Pasta           2.25

You get the output:

{
    "10-2021": {
        "Cardiff": {
            "Spar": {
                "Pasta": 2.25
            }
        },
        "London": {
            "Tesco": {
                "Olives": 4.12
            }
        }
    },
    "11-2021": {
        "Bristol": {
            "Spar": {
                "Bananas": 1.45
            }
        },
        "London": {
            "Aldi": {
                "Olives": 3.99,
                "Pasta": 2.34
            }
        }
    }
}
  • Related