I find it hard to explain in words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like this:
main_col | some_metadata | value |
---|---|---|
this | True | 10 |
this | False | 3 |
that | True | 50 |
that | False | 10 |
other | True | 20 |
other | False | 5 |
I want to normalize this data separately for each case of `main_col`. For example, if we choose min-max normalization and scale it to the range [0; 100], I want the output to look like this:
main_col | some_metadata | value (normalized) |
---|---|---|
this | True | 100 |
this | False | 30 |
that | True | 100 |
that | False | 20 |
other | True | 100 |
other | False | 25 |
Where for each case of `main_col`, the highest value is scaled to 100 and the other values are scaled in proportion.
**Answer 1:**
You can use `groupby.transform('max')` to get the max per group, then normalize in place:

```python
df['value'] /= df.groupby('main_col')['value'].transform('max').div(100)
```
or:

```python
df['value'] *= df.groupby('main_col')['value'].transform('max').rdiv(100)
```
Output:

```
  main_col some_metadata  value
0     this          True  100.0
1     this         False   30.0
2     that          True  100.0
3     that         False   20.0
4    other          True  100.0
5    other         False   25.0
```
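For completeness, a minimal self-contained sketch of this approach, with the DataFrame rebuilt from the question's table (the explicit division below is equivalent to the in-place `div`/`rdiv` one-liners above):

```python
import pandas as pd

# Rebuild the example table from the question
df = pd.DataFrame({
    'main_col': ['this', 'this', 'that', 'that', 'other', 'other'],
    'some_metadata': [True, False, True, False, True, False],
    'value': [10, 3, 50, 10, 20, 5],
})

# Broadcast each group's max back to its rows, then scale so the max becomes 100
group_max = df.groupby('main_col')['value'].transform('max')
df['value'] = df['value'] / group_max * 100

print(df)
```

`transform('max')` returns a Series aligned with the original index, which is what lets the row-wise division work without merging the aggregate back in.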
**Answer 2:**
The normalization formula you are looking for is `100 * (x / x.max())`:

```python
df.groupby(['main_col'])['value'].transform(lambda x: 100 * (x / x.max()))
```
Result:

```
0    100.0
1     30.0
2    100.0
3     20.0
4    100.0
5     25.0
Name: value, dtype: float64
```
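One caveat worth noting: both answers scale by the group maximum only, which matches the expected output in the question. A strict min-max normalization to [0; 100], which the question's wording also mentions, would map each group's minimum to 0 instead. A sketch of that variant, using the same example data:

```python
import pandas as pd

df = pd.DataFrame({
    'main_col': ['this', 'this', 'that', 'that', 'other', 'other'],
    'value': [10, 3, 50, 10, 20, 5],
})

# Strict min-max per group: group's min -> 0, group's max -> 100
minmax = df.groupby('main_col')['value'].transform(
    lambda x: 100 * (x - x.min()) / (x.max() - x.min())
)
print(minmax.tolist())  # [100.0, 0.0, 100.0, 0.0, 100.0, 0.0]
```

With only two rows per group, every minimum lands on 0 and every maximum on 100; pick whichever formula matches the scaling you actually want. (A group with a single distinct value would divide by zero here and yield NaN.)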