Home > Software design >  pandas divide groups by average of subgroup
pandas divide groups by average of subgroup

Time:03-13

I have the following dataframe

enter image description here

I am trying to divide the 'centroid velocity' per group, where each group is specified by a given ['Genotype', 'Fly'] pair, by the average 'centroid velocity' in the ['Genotype', 'Fly', 'Surface'='smooth'] subgroup.

So this is what I want to divide

fly_avg_vel = max_vel_centroid.groupby(['Genotype','Fly'])

and this is the divisor

fly_avg_vel = max_vel_centroid.groupby(['Genotype','Fly','Surface']).mean()
fly_avg_vel.loc[(slice(None),slice(None),'smooth'),:]

enter image description here

Could you please help?

CodePudding user response:

IIUC:

mean = (
    df.loc[df['Surface'] == 'smooth']
      .groupby(['Genotype', 'Fly'])['centroid_velocity']
      .mean().reindex(df[['Genotype', 'Fly']])
)

df['mean'] = df['centroid_velocity'] / mean.values
print(df)

# Output
    Genotype   Fly Surface  centroid_velocity      mean
0   SN16_TNT  Fly2  smooth          27.493325  1.018077
1    SN1_TNT  Fly3  smooth          26.220000  0.949446
2   SN16_TNT  Fly1   rough          32.987707  0.960055
3    SN1_TNT  Fly1  smooth          36.936724  1.000000
4    SN1_TNT  Fly2  smooth          32.023062  1.000000
5   SN16_TNT  Fly2   rough          34.133866  1.263976
6   SN16_TNT  Fly2   rough          27.382251  1.013964
7    SN1_TNT  Fly1   rough          30.802885  0.833937
8    SN1_TNT  Fly1   rough          40.623492  1.099813
9   SN16_TNT  Fly1  smooth          32.401770  0.943002
10  SN16_TNT  Fly1  smooth          36.318703  1.056998
11  SN16_TNT  Fly3  smooth          29.397323  0.944683
12  SN16_TNT  Fly2  smooth          26.516987  0.981923
13   SN1_TNT  Fly3  smooth          24.421761  0.884330
14  SN16_TNT  Fly3   rough          40.352241  1.296719
15   SN1_TNT  Fly3  smooth          27.658973  1.001552
16   SN1_TNT  Fly1   rough          30.818247  0.834352
17   SN1_TNT  Fly3  smooth          32.163673  1.164671
18  SN16_TNT  Fly3   rough          37.384199  1.201341
19  SN16_TNT  Fly3  smooth          32.840134  1.055317

Setup a MRE

import pandas as pd
import numpy as np

rng = np.random.default_rng(2022)
N = 20

data = {'Genotype': rng.choice(['SN1_TNT', 'SN16_TNT'], N),
        'Fly': rng.choice(['Fly1', 'Fly2', 'Fly3'], N),
        'Surface': rng.choice(['smooth', 'rough'], N), 
        'centroid_velocity': rng.normal(2, 4, N)   30}

df = pd.DataFrame(data)
  • Related