Home > Blockchain >  Grouping data frame by specific column value
Grouping data frame by specific column value

Time:10-24

I have a pandas data frame with a lot of columns. I want to create a new dataframe with just two columns. The first column should contain all values which appear in a specific column of the original data frame. The second column should contain all other data of the original data frame where the value of the first column matches.

For example my input data frame is structured like this:

    Name            Menu                City
0   Foo Burgers     Burgers and Fries   New York
1   Cheesy's        Cheeseburgers       New York
2   Buggy Burgers   Insect Burgers      London
3   Fry Guy         Fries               London
4   Beermania       Beer                Berlin

In code:

df = pd.DataFrame([["Foo Burgers", "Burgers and Fries", "New York"], 
                   ["Cheesy's", "Cheeseburgers", "New York"],
                   ["Buggy Burgers", "Insect Burgers", "London"],
                   ["Fry Guy", "Fries", "London"],
                   ["Beermania", "Beer", "Munich"]], columns=["Name","Menu","City"])

How can I easily convert the data frame to the following goal structure?

    City        Restaurants
0   New York    [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name":"Cheesy's", "Menu": "Cheeseburgers"}]
1   London      [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name":"Fry Guy", "Menu": "Fries"}]
2   Munich      [{'Name': 'Beermania', 'Menu': 'Beer'}]

In code:

goal_df = pd.DataFrame([["New York", [{"Name": "Foo Burgers", "Menu": "Burgers and Fries"}, {"Name":"Cheesy's", "Menu": "Cheeseburgers"}], ],
                        ["London", [{"Name": "Buggy Burgers", "Menu": "Insect Burgers"}, {"Name":"Fry Guy", "Menu": "Fries"}], ],
                        ["Munich", [{"Name": "Beermania", "Menu": "Beer"}], ]], columns=["City", "Restaurants"])

CodePudding user response:

You can do a groupby().agg with to_dict:

(df.drop('City', axis=1).groupby(df['City'])
   .apply(lambda x: x.to_dict(orient='records'))
   .reset_index(name='Restaurants')
)

Output:

       City                                        Restaurants
0    London  [{'Name': 'Buggy Burgers', 'Menu': 'Insect Bur...
1    Munich            [{'Name': 'Beermania', 'Menu': 'Beer'}]
2  New York  [{'Name': 'Foo Burgers', 'Menu': 'Burgers and ...
  • Related