Given this pandas dataframe with three columns, 'room_id', 'temperature' and 'State'. How do I get a forth column 'Max' indicating wehn the value is a maximum for each interval where State is True and for each room ?
117 1.489000 True
8.9 False
2.5 False
4.370000 False
4.363333 True
4.356667 True
118 4.35 True
6.648000 True
6.642667 True
7.3 False
9.4 False
5.3 True
7.1 True
What I am expecting
117 1.489000 True max
8.9 False
2.5 False
4.370000 False
4.363333 True max
4.356667 True
118 4.35 True
6.648000 True max
6.642667 True
7.3 False
9.4 False
5.3 True
7.1 True max
I used this :
Max = df_state.groupby(masque.cumsum()[~masque])['temperature'].agg(['idxmax'])
But I found this :
117 1.489000 True max
8.9 False
2.5 False
4.370000 False
4.363333 True
4.356667 True
118 4.35 True
6.648000 True max
6.642667 True
7.3 False
9.4 False
5.3 True
7.1 True max
I miss the last max of room 117 because the algorithm does not take into account the room id
CodePudding user response:
You can use groupby.idxmax
to get the index of the max per custom group:
# get the indices of the max value per group
idx = (df[df['State']].groupby(['room_id', (~df['State']).cumsum()])
['temperature'].idxmax()
)
# assign the new value
df.loc[idx, 'max_temp'] = 'max'
If you want the temperature value instead of a literax 'max'
:
df.loc[idx, 'max'] = df.loc[idx, 'temperature']
Output:
room_id temperature State max max_temp
0 117 1.489000 True max 1.489000
1 117 8.900000 False NaN NaN
2 117 2.500000 False NaN NaN
3 117 4.370000 False NaN NaN
4 117 4.363333 True max 4.363333
5 117 4.356667 True NaN NaN
6 118 4.350000 True NaN NaN
7 118 6.648000 True max 6.648000
8 118 6.642667 True NaN NaN
9 118 7.300000 False NaN NaN
10 118 9.400000 False NaN NaN
11 118 5.300000 True NaN NaN
12 118 7.100000 True max 7.100000
CodePudding user response:
You need compare maximum values per groups from cumulative sum and column room_id
and set values to new column by Series.where
:
print (df)
room_id temperature State
0 117 1.489000 True
1 117 8.900000 False
2 117 2.500000 False
3 117 4.370000 False
4 117 4.363333 True
5 117 4.356667 True
6 118 4.350000 True
7 118 6.648000 True
8 118 6.642667 True
9 118 7.300000 False
10 118 9.400000 False
11 118 5.300000 True
12 118 7.100000 True
masque = ~df['State']
s = df.groupby([masque.cumsum()[~masque],'room_id'])['temperature'].transform('max')
df['Max'] = df['temperature'].where(df['temperature'].eq(s))
print (df)
room_id temperature State Max
0 117 1.489000 True 1.489000
1 117 8.900000 False NaN
2 117 2.500000 False NaN
3 117 4.370000 False NaN
4 117 4.363333 True 4.363333
5 117 4.356667 True NaN
6 118 4.350000 True NaN
7 118 6.648000 True 6.648000
8 118 6.642667 True NaN
9 118 7.300000 False NaN
10 118 9.400000 False NaN
11 118 5.300000 True NaN
12 118 7.100000 True 7.100000