Replace all values in DataFrame with minimum of GroupBy based on condition

Time:10-20

I have a DataFrame with several columns:

df

    Time      | Feature   | Value   | Value Scaled  | Zscore   | ... | action
  2020-01-08      FA          1          0.1            1.2               0  
  2020-01-09      FA          10          1             1.4               1 
     ...          ...         ...        ...            ...              ... 
  2020-01-08      FB          2          0.01           1.1               0     
  2020-01-09      FB          2.4        0.05           1.3               1 
  2020-01-10      FB          2.6        0.03           1.5               0 
     ...         ...          ...        ....           ...              ...

For the columns [Value, Value Scaled, Zscore], in the rows where action == 1, I want to replace the values with each column's minimum within that row's Feature group, i.e. the values produced by this GroupBy:

df.groupby('Feature')[['Value','Value Scaled','Zscore']].min()

How would I do this?

Answer:

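Use GroupBy.transform, which returns each group's minimum repeated for every row of that group, so the result has the same index as the original DataFrame: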

# sample data with an extra FB row (action == 1) so the two approaches give different results
print(df)
         Time Feature  Value  Value Scaled  Zscore  action
0  2020-01-08      FA    1.0          0.10     1.2       0
1  2020-01-09      FA   10.0          1.00     1.4       1
2  2020-01-08      FB    2.0          0.01     1.1       0
3  2020-01-09      FB    2.2          0.03     1.1       1
4  2020-01-09      FB    2.4          0.05     1.3       1
5  2020-01-10      FB    2.6          0.03     1.5       0
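
To make the difference between a plain aggregation and a transform concrete, here is a minimal sketch that rebuilds the sample frame printed above and compares the two. Storing Time as plain strings and the variable names per_group and per_row are assumptions made for illustration only.

```python
import pandas as pd

# Sample data copied from the printed frame above (Time kept as strings).
df = pd.DataFrame({
    "Time": ["2020-01-08", "2020-01-09", "2020-01-08",
             "2020-01-09", "2020-01-09", "2020-01-10"],
    "Feature": ["FA", "FA", "FB", "FB", "FB", "FB"],
    "Value": [1.0, 10.0, 2.0, 2.2, 2.4, 2.6],
    "Value Scaled": [0.10, 1.00, 0.01, 0.03, 0.05, 0.03],
    "Zscore": [1.2, 1.4, 1.1, 1.1, 1.3, 1.5],
    "action": [0, 1, 0, 1, 1, 0],
})
c = ["Value", "Value Scaled", "Zscore"]

# Aggregation: one row per group, indexed by Feature.
per_group = df.groupby("Feature")[c].min()

# Transform: one row per original row, indexed like df.
per_row = df.groupby("Feature")[c].transform("min")

print(per_group.shape)  # (2, 3)
print(per_row.shape)    # (6, 3)
```

Because per_row shares the index of df, it can be assigned back to a subset of rows without any merge.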

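If the minimum should be computed over all rows of each group, use:

# rows to overwrite
m = df['action'] == 1
# columns to overwrite
c = ['Value','Value Scaled','Zscore']
# per-group minimum over all rows; only the rows selected by m are assigned
df.loc[m, c] = df[c].groupby(df['Feature']).transform('min')
print(df)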
         Time Feature  Value  Value Scaled  Zscore  action
0  2020-01-08      FA    1.0          0.10     1.2       0
1  2020-01-09      FA    1.0          0.10     1.2       1
2  2020-01-08      FB    2.0          0.01     1.1       0
3  2020-01-09      FB    2.0          0.01     1.1       1
4  2020-01-09      FB    2.0          0.01     1.1       1
5  2020-01-10      FB    2.6          0.03     1.5       0
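
The assignment above has a full-length frame on the right-hand side but only some rows on the left. It appears to rely on pandas aligning the assigned frame by index and column labels. The sketch below checks that against an explicit row selection; the names implicit and explicit are hypothetical, and the Time column is left out for brevity.

```python
import pandas as pd

df = pd.DataFrame({
    "Feature": ["FA", "FA", "FB", "FB", "FB", "FB"],
    "Value": [1.0, 10.0, 2.0, 2.2, 2.4, 2.6],
    "Value Scaled": [0.10, 1.00, 0.01, 0.03, 0.05, 0.03],
    "Zscore": [1.2, 1.4, 1.1, 1.1, 1.3, 1.5],
    "action": [0, 1, 0, 1, 1, 0],
})
c = ["Value", "Value Scaled", "Zscore"]
m = df["action"] == 1

# Per-group minimum over all rows, one row per original row.
mins = df.groupby("Feature")[c].transform("min")

# Full-length right-hand side: only the rows selected by m are written.
implicit = df.copy()
implicit.loc[m, c] = mins

# Same rows selected explicitly on both sides.
explicit = df.copy()
explicit.loc[m, c] = mins.loc[m, c]

print(implicit.equals(explicit))
```

Both forms should give the same frame; the explicit one may be easier to read when the mask is complex.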

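Or, if the minimum should be computed only over the rows where action == 1, use:

# rows to overwrite
m = df['action'] == 1
# columns to overwrite
c = ['Value','Value Scaled','Zscore']
# per-group minimum over the selected rows only
df.loc[m, c] = df.loc[m, c].groupby(df['Feature']).transform('min')
print(df)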
         Time Feature  Value  Value Scaled  Zscore  action
0  2020-01-08      FA    1.0          0.10     1.2       0
1  2020-01-09      FA   10.0          1.00     1.4       1
2  2020-01-08      FB    2.0          0.01     1.1       0
3  2020-01-09      FB    2.2          0.03     1.1       1
4  2020-01-09      FB    2.2          0.03     1.1       1
5  2020-01-10      FB    2.6          0.03     1.5       0
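
The two variants differ only in which rows feed the minimum, so they can be folded into one helper. This is a sketch rather than part of the original answer: the function name replace_with_group_min and its only_flagged parameter are hypothetical, and it returns a copy instead of modifying df in place.

```python
import pandas as pd

def replace_with_group_min(df, group_col, cols, flag_col, only_flagged=False):
    """Return a copy of df where rows with flag_col == 1 take the per-group minimum of cols."""
    out = df.copy()
    m = out[flag_col] == 1
    # Rows that feed the minimum: all rows, or only the flagged ones.
    source = out.loc[m, cols] if only_flagged else out[cols]
    # Group keys restricted to the same rows as source.
    keys = out.loc[source.index, group_col]
    out.loc[m, cols] = source.groupby(keys).transform("min")
    return out

df = pd.DataFrame({
    "Feature": ["FA", "FA", "FB", "FB", "FB", "FB"],
    "Value": [1.0, 10.0, 2.0, 2.2, 2.4, 2.6],
    "Value Scaled": [0.10, 1.00, 0.01, 0.03, 0.05, 0.03],
    "Zscore": [1.2, 1.4, 1.1, 1.1, 1.3, 1.5],
    "action": [0, 1, 0, 1, 1, 0],
})
c = ["Value", "Value Scaled", "Zscore"]

all_rows = replace_with_group_min(df, "Feature", c, "action")
flagged = replace_with_group_min(df, "Feature", c, "action", only_flagged=True)
print(all_rows)
print(flagged)
```

With the sample data, all_rows should match the first printed result and flagged the second, while df itself stays unchanged.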