I have a DataFrame that contains entries that are identical except for a few columns (which I derived from pd.get_dummies):
name stars categories Dietary_gluten-free_False Dietary_gluten-free_True Dietary_halal_False Dietary_kosher_False Dietary_soy-free_False Dietary_vegan_False Dietary_vegan_True Dietary_vegetarian_False Dietary_vegetarian_True Dietary__None Dietary_dairy-free_False Dietary_dairy-free_True
17660 Del Taco 3.0 Restaurants, Mexican, Tacos, Fast Food 0 0 0 0 0 0 0 0 0 0 1 0
17660 Del Taco 3.0 Restaurants, Mexican, Tacos, Fast Food 1 0 0 0 0 0 0 0 0 0 0 0
17660 Del Taco 3.0 Restaurants, Mexican, Tacos, Fast Food 0 0 0 0 0 0 1 0 0 0 0 0
17660 Del Taco 3.0 Restaurants, Mexican, Tacos, Fast Food 0 0 0 1 0 0 0 0 0 0 0 0
17660 Del Taco 3.0 Restaurants, Mexican, Tacos, Fast Food 0 0 1 0 0 0 0 0 0 0 0 0
How do I compress or flatten the entries, so that each actual business is one row, and all the dietary markers are "collected" together, like so:
name stars categories Dietary_gluten-free_False Dietary_gluten-free_True Dietary_halal_False Dietary_kosher_False Dietary_soy-free_False Dietary_vegan_False Dietary_vegan_True Dietary_vegetarian_False Dietary_vegetarian_True Dietary__None Dietary_dairy-free_False Dietary_dairy-free_True
17660 Del Taco 3.0 Restaurants, Mexican, Tacos, Fast Food 1 0 1 1 0 0 1 0 0 0 1 0
CodePudding user response:
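Something like this should work. Grouping on the columns that identify a business and taking the max of each 0/1 dummy column gives 1 wherever any of the duplicate rows had a 1: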
df.groupby(['name', 'stars', 'categories']).max().reset_index()
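To see it in action, here is a minimal sketch on a made-up miniature of the frame from the question. The second business, the shortened categories strings and the choice of only three dummy columns are my assumptions for illustration, not the real data. It uses as_index=False, which should be equivalent to calling reset_index() afterwards.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: three duplicate rows for
# one business plus one unrelated business, and only three dummy columns.
df = pd.DataFrame(
    {
        "name": ["Del Taco", "Del Taco", "Del Taco", "Other Place"],
        "stars": [3.0, 3.0, 3.0, 4.5],
        "categories": ["Mexican, Tacos", "Mexican, Tacos", "Mexican, Tacos", "Cafes"],
        "Dietary_vegan_True": [0, 1, 0, 0],
        "Dietary_halal_False": [1, 0, 0, 0],
        "Dietary_kosher_False": [0, 0, 0, 1],
    }
)

# Columns that identify one business; every other column is a 0/1 dummy.
key_cols = ["name", "stars", "categories"]

# max() over 0/1 values acts as a logical OR across the duplicate rows.
# as_index=False keeps the key columns as ordinary columns.
flat = df.groupby(key_cols, as_index=False).max()

print(flat)
```

Two things to keep in mind: the original row index (17660 in the question) is not kept, and by default groupby drops rows where any of the key columns is NaN, so check for missing values in name, stars or categories first.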