Home > database >  Groupby column with multiple values
Groupby column with multiple values

Time:05-28

I have a dataframe that looks like this one (one column has multiple values, the other are just numbers with decimals):

food number
apple,tomato,melon 897.0
apple,meat,banana 984.9
banana,tomato 340.8

I want to get the average number of every food. In the example that'll be:

  • apple = (897.0 984.9)/2 = 940.95
  • banana = (984.9 340.8)/2 = 662.85

And so on to the point of ending up with a new dataframe with just the foods and the average number.

food average
apple 915.95
banana 662.85

I tried my luck with groupby, but the result is all messed up:

#reshape data
df = pd.DataFrame({
    'food' : list(chain.from_iterable(df.food.tolist())), 
    'number' : df.number.repeat(df.food.str.len())
})
# groupby
df.groupby('food').number.apply(lambda x: x.unique().tolist())

I must say that the original dataframe has over 100k rows. Thanks.

CodePudding user response:

Use enter image description here

If you would like more aggregations, you could use

df.explode('food').groupby('food').agg(['min', 'mean', 'max'])

enter image description here

  • Related