I have a DataFrame, df, that looks something like this:
date                        price  bool
-------------------------------------------
2022-01-03 22:00:00+01:00  109.65  False
2022-01-03 22:00:00+01:00   80.00  False
2022-01-03 22:00:00+01:00   65.79  True
2022-01-03 22:00:00+01:00   50.00  True
2022-01-03 23:00:00+01:00   47.00  False
2022-01-03 23:00:00+01:00   39.95  True
2022-01-03 23:00:00+01:00   39.47  False
2022-01-03 23:00:00+01:00   29.96  False
2022-01-03 23:00:00+01:00   22.47  True
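To make the example easy to reproduce, the table above can be rebuilt roughly like this. It is a sketch that assumes the printed offset is a fixed +01:00 UTC offset, since the output does not name a timezone:

```python
import pandas as pd

# Rebuild the sample frame; the "+01:00" strings parse to a fixed-offset, tz-aware column (assumed).
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03 22:00:00+01:00"] * 4 + ["2022-01-03 23:00:00+01:00"] * 5),
    "price": [109.65, 80.00, 65.79, 50.00, 47.00, 39.95, 39.47, 29.96, 22.47],
    "bool": [False, False, True, True, False, True, False, False, True],
})

# Two distinct dates, so df.groupby("date") produces two groups.
print(df.groupby("date").ngroups)
```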
If I do df.groupby("date"), I get two groups, one per date. That is fine. What I would like, though, is to add a new column, max_price, to each group that holds the maximum price among the rows where bool == True, repeated on every row of that group. The resulting DataFrames would then be:
df_groupby_object1:
date                        price  bool   max_price
-------------------------------------------------------
2022-01-03 22:00:00+01:00  109.65  False      65.79
2022-01-03 22:00:00+01:00   80.00  False      65.79
2022-01-03 22:00:00+01:00   65.79  True       65.79
2022-01-03 22:00:00+01:00   50.00  True       65.79

df_groupby_object2:
date                        price  bool   max_price
-------------------------------------------------------
2022-01-03 23:00:00+01:00   47.00  False      39.95
2022-01-03 23:00:00+01:00   39.95  True       39.95
2022-01-03 23:00:00+01:00   39.47  False      39.95
2022-01-03 23:00:00+01:00   29.96  False      39.95
2022-01-03 23:00:00+01:00   22.47  True       39.95
I could probably just iterate through the groupby objects and create an extra column that way, but I was wondering whether this can be done directly within the groupby call.
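For comparison, the loop-based approach mentioned in the question might look like the minimal sketch below. It uses the same sample data (fixed +01:00 offset assumed) and collects one modified copy per group:

```python
import pandas as pd

# Sample data from the question (fixed +01:00 offset assumed).
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03 22:00:00+01:00"] * 4 + ["2022-01-03 23:00:00+01:00"] * 5),
    "price": [109.65, 80.00, 65.79, 50.00, 47.00, 39.95, 39.47, 29.96, 22.47],
    "bool": [False, False, True, True, False, True, False, False, True],
})

groups = []
for date, group in df.groupby("date"):
    group = group.copy()  # work on an explicit copy, not a slice of df
    # Max price over this group's True rows; a group with no True rows would get NaN.
    group["max_price"] = group.loc[group["bool"], "price"].max()
    groups.append(group)
```

This works, but it builds the result in a Python loop; a vectorized groupby expression avoids that.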
Answer:
Use GroupBy.transform to get the maximum price computed only over the rows where bool is True. Series.where first replaces price with NaN wherever bool is False, and max skips NaN values:
df['max_price'] = df['price'].where(df['bool']).groupby(df['date']).transform('max')
Details:
print (df['price'].where(df['bool']))
0 NaN
1 NaN
2 65.79
3 50.00
4 NaN
5 39.95
6 NaN
7 NaN
8 22.47
Name: price, dtype: float64
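Put together on the question's sample data, the transform approach looks roughly like this (a sketch; the fixed +01:00 offset is assumed). Because transform returns a Series aligned to df's index, it can be assigned straight back as a column, and splitting by date afterwards gives the two frames from the question:

```python
import pandas as pd

# Sample data from the question (fixed +01:00 offset assumed).
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03 22:00:00+01:00"] * 4 + ["2022-01-03 23:00:00+01:00"] * 5),
    "price": [109.65, 80.00, 65.79, 50.00, 47.00, 39.95, 39.47, 29.96, 22.47],
    "bool": [False, False, True, True, False, True, False, False, True],
})

# NaN out prices where bool is False, then broadcast each date's max back to every row.
df["max_price"] = df["price"].where(df["bool"]).groupby(df["date"]).transform("max")

# One DataFrame per date, as in the question.
frames = [group for _, group in df.groupby("date")]
```

If a date had no True rows at all, every masked price in that group would be NaN, so its max_price would be NaN as well.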
Answer:
You could filter df by the "bool" column, use groupby + max to find the maximum price per date, and then map those maxima back onto the "date" column:
df['max_price'] = df['date'].map(df[df['bool']].groupby('date')['price'].max())
x, y = [d for _,d in df.groupby('date')]
Output:
                        date   price   bool  max_price
0  2022-01-03 22:00:00+01:00  109.65  False      65.79
1  2022-01-03 22:00:00+01:00   80.00  False      65.79
2  2022-01-03 22:00:00+01:00   65.79   True      65.79
3  2022-01-03 22:00:00+01:00   50.00   True      65.79

                        date   price   bool  max_price
4  2022-01-03 23:00:00+01:00   47.00  False      39.95
5  2022-01-03 23:00:00+01:00   39.95   True      39.95
6  2022-01-03 23:00:00+01:00   39.47  False      39.95
7  2022-01-03 23:00:00+01:00   29.96  False      39.95
8  2022-01-03 23:00:00+01:00   22.47   True      39.95
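A self-contained version of this filter-then-map approach, again on the question's sample data with a fixed +01:00 offset assumed, might look like:

```python
import pandas as pd

# Sample data from the question (fixed +01:00 offset assumed).
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-03 22:00:00+01:00"] * 4 + ["2022-01-03 23:00:00+01:00"] * 5),
    "price": [109.65, 80.00, 65.79, 50.00, 47.00, 39.95, 39.47, 29.96, 22.47],
    "bool": [False, False, True, True, False, True, False, False, True],
})

# Per-date maximum over the True rows only; the result is a Series indexed by date.
max_by_date = df[df["bool"]].groupby("date")["price"].max()

# Look up each row's date in that Series; a date with no True rows would map to NaN.
df["max_price"] = df["date"].map(max_by_date)

# Unpack the two per-date frames.
x, y = [d for _, d in df.groupby("date")]
```

Compared with the transform version, this aggregates once on the filtered rows and joins the result back through the date values; both approaches yield NaN for a date without any True row.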