Add column with certain value when doing groupby

Time:03-02

I have a DataFrame that looks something like:

df:

date                          price     bool
---------------------------------------------
2022-01-03 22:00:00 01:00     109.65    False
2022-01-03 22:00:00 01:00      80.00    False
2022-01-03 22:00:00 01:00      65.79    True
2022-01-03 22:00:00 01:00      50.00    True
2022-01-03 23:00:00 01:00      47.00    False
2022-01-03 23:00:00 01:00      39.95    True
2022-01-03 23:00:00 01:00      39.47    False
2022-01-03 23:00:00 01:00      29.96    False
2022-01-03 23:00:00 01:00      22.47    True

If I call df.groupby("date"), I get two groups, one per date. This is fine. But I would like to add a new column to each group that holds the maximum price among the rows where bool == True within that group. The resulting data frames would then be:

df_groupby_object1:

date                          price     bool      max_price
-----------------------------------------------------------
2022-01-03 22:00:00 01:00     109.65    False      65.79
2022-01-03 22:00:00 01:00      80.00    False      65.79
2022-01-03 22:00:00 01:00      65.79    True       65.79
2022-01-03 22:00:00 01:00      50.00    True       65.79

df_groupby_object2:

date                          price     bool      max_price
-----------------------------------------------------------
2022-01-03 23:00:00 01:00      47.00    False      39.95
2022-01-03 23:00:00 01:00      39.95    True       39.95
2022-01-03 23:00:00 01:00      39.47    False      39.95
2022-01-03 23:00:00 01:00      29.96    False      39.95
2022-01-03 23:00:00 01:00      22.47    True       39.95

I could probably just iterate through the groups and create the extra column that way, but I was wondering whether this could be done directly with groupby?

CodePudding user response:

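Use Series.where to replace the prices in rows where bool is False with NaN, then use GroupBy.transform with 'max' to broadcast each date's maximum back to every row of that date. Because max skips NaN, only the rows where bool is True are considered: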

df['max_price'] = df['price'].where(df['bool']).groupby(df['date']).transform('max')

Details:

print(df['price'].where(df['bool']))
0      NaN
1      NaN
2    65.79
3    50.00
4      NaN
5    39.95
6      NaN
7      NaN
8    22.47
Name: price, dtype: float64
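
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the where + transform approach. It assumes the date column holds plain strings exactly as shown in the question; the intermediate name masked is only there for illustration.

```python
import pandas as pd

# Sample data mirroring the question; dates are kept as plain strings (assumption).
df = pd.DataFrame({
    "date": ["2022-01-03 22:00:00 01:00"] * 4 + ["2022-01-03 23:00:00 01:00"] * 5,
    "price": [109.65, 80.00, 65.79, 50.00, 47.00, 39.95, 39.47, 29.96, 22.47],
    "bool": [False, False, True, True, False, True, False, False, True],
})

# Prices in rows where bool is False become NaN.
masked = df["price"].where(df["bool"])

# Group the masked prices by date and broadcast each group's max to all its rows.
# max ignores NaN, so only the rows where bool is True contribute.
df["max_price"] = masked.groupby(df["date"]).transform("max")

print(df)
```

If a date has no row with bool == True, every masked value in that group is NaN, so max_price is NaN for that date as well.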

CodePudding user response:

You could filter df by the "bool" column, use groupby with max to find the maximum price per date, and then map the result onto the "date" column:

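# Maximum price per date among the rows where bool is True, mapped onto each row's date
df['max_price'] = df['date'].map(df[df['bool']].groupby('date')['price'].max())
# Split into one DataFrame per date (this unpacking assumes exactly two distinct dates)
x, y = [d for _, d in df.groupby('date')]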

Output:

                        date   price   bool  max_price
0  2022-01-03 22:00:00 01:00  109.65  False      65.79
1  2022-01-03 22:00:00 01:00   80.00  False      65.79
2  2022-01-03 22:00:00 01:00   65.79   True      65.79
3  2022-01-03 22:00:00 01:00   50.00   True      65.79

                        date  price   bool  max_price
4  2022-01-03 23:00:00 01:00  47.00  False      39.95
5  2022-01-03 23:00:00 01:00  39.95   True      39.95
6  2022-01-03 23:00:00 01:00  39.47  False      39.95
7  2022-01-03 23:00:00 01:00  29.96  False      39.95
8  2022-01-03 23:00:00 01:00  22.47   True      39.95
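
Note that unpacking into x, y only works when there are exactly two distinct dates. A rough sketch of the same filter-and-map idea for an arbitrary number of dates is below; the short date labels, the prices, and the names per_date_max and groups are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: three dates, and date "c" has no row where bool is True.
df = pd.DataFrame({
    "date": ["a", "a", "b", "b", "c"],
    "price": [10.0, 7.5, 4.0, 3.0, 9.0],
    "bool": [False, True, True, False, False],
})

# Keep only the True rows, then take the maximum price per date.
per_date_max = df[df["bool"]].groupby("date")["price"].max()

# Look up each row's date in that Series; dates missing from it map to NaN.
df["max_price"] = df["date"].map(per_date_max)

# One DataFrame per date, keyed by the date value, for any number of dates.
groups = {key: frame for key, frame in df.groupby("date")}

print(groups["a"])
```

Here date "c" is absent from per_date_max, so its rows get NaN in max_price rather than raising an error.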