Home > Enterprise >  Find the highest value locations within an interval and for a specific index?
Find the highest value locations within an interval and for a specific index?

Time:11-30

Given this pandas dataframe with three columns, 'room_id', 'temperature' and 'State'. How do I get a forth column 'Max' indicating wehn the value is a maximum for each interval where State is True and for each room ?

117     1.489000  True
        8.9       False
        2.5       False  
        4.370000  False  
        4.363333  True
        4.356667  True
118     4.35      True
        6.648000  True
        6.642667  True
        7.3       False
        9.4       False
        5.3       True
        7.1       True

What I am expecting

117     1.489000  True   max
        8.9       False
        2.5       False  
        4.370000  False  
        4.363333  True   max
        4.356667  True
118     4.35      True
        6.648000  True   max
        6.642667  True
        7.3       False
        9.4       False
        5.3       True
        7.1       True   max 

I used this : 

Max = df_state.groupby(masque.cumsum()[~masque])['temperature'].agg(['idxmax'])

But I found this : 

117 1.489000 True max 8.9 False 2.5 False
4.370000 False
4.363333 True
4.356667 True 118 4.35 True 6.648000 True max 6.642667 True 7.3 False 9.4 False 5.3 True 7.1 True max


I miss the last max of room 117 because the algorithm does not take into account the room id

CodePudding user response:

You can use groupby.idxmax to get the index of the max per custom group:

# get the indices of the max value per group
idx = (df[df['State']].groupby(['room_id', (~df['State']).cumsum()])
       ['temperature'].idxmax()
       )

# assign the new value
df.loc[idx, 'max_temp'] = 'max'

If you want the temperature value instead of a literax 'max':

df.loc[idx, 'max'] = df.loc[idx, 'temperature']

Output:

    room_id  temperature  State  max  max_temp
0       117     1.489000   True  max  1.489000
1       117     8.900000  False  NaN       NaN
2       117     2.500000  False  NaN       NaN
3       117     4.370000  False  NaN       NaN
4       117     4.363333   True  max  4.363333
5       117     4.356667   True  NaN       NaN
6       118     4.350000   True  NaN       NaN
7       118     6.648000   True  max  6.648000
8       118     6.642667   True  NaN       NaN
9       118     7.300000  False  NaN       NaN
10      118     9.400000  False  NaN       NaN
11      118     5.300000   True  NaN       NaN
12      118     7.100000   True  max  7.100000

CodePudding user response:

You need compare maximum values per groups from cumulative sum and column room_id and set values to new column by Series.where:

print (df)
    room_id  temperature  State
0       117     1.489000   True
1       117     8.900000  False
2       117     2.500000  False
3       117     4.370000  False
4       117     4.363333   True
5       117     4.356667   True
6       118     4.350000   True
7       118     6.648000   True
8       118     6.642667   True
9       118     7.300000  False
10      118     9.400000  False
11      118     5.300000   True
12      118     7.100000   True

masque = ~df['State']
s  = df.groupby([masque.cumsum()[~masque],'room_id'])['temperature'].transform('max')
df['Max'] = df['temperature'].where(df['temperature'].eq(s))

print (df)
    room_id  temperature  State       Max
0       117     1.489000   True  1.489000
1       117     8.900000  False       NaN
2       117     2.500000  False       NaN
3       117     4.370000  False       NaN
4       117     4.363333   True  4.363333
5       117     4.356667   True       NaN
6       118     4.350000   True       NaN
7       118     6.648000   True  6.648000
8       118     6.642667   True       NaN
9       118     7.300000  False       NaN
10      118     9.400000  False       NaN
11      118     5.300000   True       NaN
12      118     7.100000   True  7.100000
  • Related