Home > Software design >  Pandas: normalize values by group
Pandas: normalize values by group

Time:09-27

I find it hard to explain with words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like this:

main_col some_metadata value
this True 10
this False 3
that True 50
that False 10
other True 20
other False 5

I want to normalize this data separately for each case of main_col. For example, if we're to choose min-max normalization and scale it to range [0; 100], I want the output to look like this:

main_col some_metadata value (normalized)
this True 100
this False 30
that True 100
that False 20
other True 100
other False 25

Where for each case of main_col, the highest value is scaled to 100 and another value is scaled in respective proportion.

CodePudding user response:

You can use groupby.transform('max') to get the max per group, then normalize in place:

df['value'] /= df.groupby('main_col')['value'].transform('max').div(100)

or:

df['value'] *= df.groupby('main_col')['value'].transform('max').rdiv(100)

output:

  main_col  some_metadata  value
0     this           True  100.0
1     this          False   30.0
2     that           True  100.0
3     that          False   20.0
4    other           True  100.0
5    other          False   25.0

CodePudding user response:

The normalization formula you are looking for is 100 * (x / x.max()):

df.groupby(['main_col'])['value'].transform(lambda x: 100 * (x / x.max()))

Result:

0    100.0
1     30.0
2    100.0
3     20.0
4    100.0
5     25.0
Name: value, dtype: float64
  • Related