How to flatten entries in a DataFrame that are similar?

Time:10-07

I have a DataFrame containing rows that are identical except for a few columns (which I derived from pd.get_dummies):


        name    stars   categories  Dietary_gluten-free_False   Dietary_gluten-free_True    Dietary_halal_False Dietary_kosher_False    Dietary_soy-free_False  Dietary_vegan_False Dietary_vegan_True  Dietary_vegetarian_False    Dietary_vegetarian_True Dietary__None   Dietary_dairy-free_False    Dietary_dairy-free_True
17660   Del Taco    3.0 Restaurants, Mexican, Tacos, Fast Food  0   0   0   0   0   0   0   0   0   0   1   0
17660   Del Taco    3.0 Restaurants, Mexican, Tacos, Fast Food  1   0   0   0   0   0   0   0   0   0   0   0
17660   Del Taco    3.0 Restaurants, Mexican, Tacos, Fast Food  0   0   0   0   0   0   1   0   0   0   0   0
17660   Del Taco    3.0 Restaurants, Mexican, Tacos, Fast Food  0   0   0   1   0   0   0   0   0   0   0   0
17660   Del Taco    3.0 Restaurants, Mexican, Tacos, Fast Food  0   0   1   0   0   0   0   0   0   0   0   0

How do I compress or flatten the entries, so that each actual business is one row, and all the dietary markers are "collected" together, like so:

        name    stars   categories  Dietary_gluten-free_False   Dietary_gluten-free_True    Dietary_halal_False Dietary_kosher_False    Dietary_soy-free_False  Dietary_vegan_False Dietary_vegan_True  Dietary_vegetarian_False    Dietary_vegetarian_True Dietary__None   Dietary_dairy-free_False    Dietary_dairy-free_True
17660   Del Taco    3.0 Restaurants, Mexican, Tacos, Fast Food  1   0   1   1   0   0   1   0   0   0   1   0

CodePudding user response:

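Something like this should work. Group by the columns that are identical across the duplicate rows and take the maximum of each dummy column, so a marker is 1 in the result if any row in the group had it set: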

df.groupby(['name', 'stars', 'categories']).max().reset_index()
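To see the effect on a small example, here is a minimal sketch. The frame is a hypothetical cut-down version of the one in the question (only four of the dummy columns are kept), and it uses as_index=False, which I'd expect to give the same result as calling reset_index() afterwards:

```python
import pandas as pd

# Hypothetical miniature of the question's frame: three duplicate rows,
# each with a different dummy column set to 1.
df = pd.DataFrame(
    {
        "name": ["Del Taco"] * 3,
        "stars": [3.0] * 3,
        "categories": ["Restaurants, Mexican, Tacos, Fast Food"] * 3,
        "Dietary_gluten-free_False": [0, 1, 0],
        "Dietary_halal_False": [0, 0, 1],
        "Dietary_vegan_True": [0, 0, 0],
        "Dietary_dairy-free_False": [1, 0, 0],
    },
    index=[17660, 17660, 17660],
)

# Columns that are identical across the duplicate rows.
key_cols = ["name", "stars", "categories"]

# One row per business; each dummy column becomes the max over the group.
flat = df.groupby(key_cols, as_index=False).max()
print(flat)
```

Note that the original index label (17660) is not carried over, because the grouping is done on columns. Also, by default groupby drops rows whose key columns contain NaN, so if stars or categories can be missing you may need to pass dropna=False.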