I find it hard to explain in words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like this:
main_col | some_metadata | value |
---|---|---|
this | True | 10 |
this | False | 3 |
that | True | 50 |
that | False | 10 |
other | True | 20 |
other | False | 5 |
I want to normalize this data separately for each case of `main_col`. For example, if we choose min-max normalization and scale it to the range [0; 100], I want the output to look like this:
main_col | some_metadata | value (normalized) |
---|---|---|
this | True | 100 |
this | False | 30 |
that | True | 100 |
that | False | 20 |
other | True | 100 |
other | False | 25 |
Where for each case of `main_col`, the highest value is scaled to 100 and the other values are scaled in proportion.
**Answer 1:**
You can use `groupby.transform('max')` to get the max per group, then normalize in place:

```python
df['value'] /= df.groupby('main_col')['value'].transform('max').div(100)
```
or:

```python
df['value'] *= df.groupby('main_col')['value'].transform('max').rdiv(100)
```
Output:

```
  main_col some_metadata  value
0     this          True  100.0
1     this         False   30.0
2     that          True  100.0
3     that         False   20.0
4    other          True  100.0
5    other         False   25.0
```
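For completeness, a minimal self-contained sketch of this approach, with the DataFrame rebuilt from the question's table (the explicit division below is equivalent to the in-place `div`/`rdiv` one-liners above):

```python
import pandas as pd

# Rebuild the example table from the question
df = pd.DataFrame({
    'main_col': ['this', 'this', 'that', 'that', 'other', 'other'],
    'some_metadata': [True, False, True, False, True, False],
    'value': [10, 3, 50, 10, 20, 5],
})

# Broadcast each group's max back to its rows, then scale so the max becomes 100
group_max = df.groupby('main_col')['value'].transform('max')
df['value'] = df['value'] / group_max * 100

print(df)
```

`transform('max')` returns a Series aligned with the original index, which is what lets the row-wise division work without merging the aggregate back in.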
**Answer 2:**
The normalization formula you are looking for is `100 * (x / x.max())`:

```python
df.groupby(['main_col'])['value'].transform(lambda x: 100 * (x / x.max()))
```
Result:

```
0    100.0
1     30.0
2    100.0
3     20.0
4    100.0
5     25.0
Name: value, dtype: float64
```
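One caveat worth noting: both answers scale by the group maximum only, which matches the expected output in the question. A strict min-max normalization to [0; 100], which the question's wording also mentions, would map each group's minimum to 0 instead. A sketch of that variant, using the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'main_col': ['this', 'this', 'that', 'that', 'other', 'other'],
    'value': [10, 3, 50, 10, 20, 5],
})

# Strict min-max per group: group's min -> 0, group's max -> 100
minmax = df.groupby('main_col')['value'].transform(
    lambda x: 100 * (x - x.min()) / (x.max() - x.min())
)
print(minmax.tolist())  # [100.0, 0.0, 100.0, 0.0, 100.0, 0.0]
```

With only two rows per group, every minimum lands on 0 and every maximum on 100; pick whichever formula matches the scaling you actually want. (A group with a single distinct value would divide by zero here and yield NaN.)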