Home > Blockchain >  Maximum height of people in a room given entry and exit times
Maximum height of people in a room given entry and exit times

Time:10-25

People arrive and depart a room at various times. Given their entry and exit times and their heights, what's an efficient way (in Python / Pandas) to find the maximum height of people in the room at all times?

Input:

people = pd.DataFrame({
    "height": [175, 180, 160],
    "arrive": pd.to_datetime(["2017-04-05 14:20", "2017-04-05 13:10", "2017-04-05 10:30"]),
    "depart": pd.to_datetime(["2017-04-05 17:54", "2017-04-06 07:56", "2017-04-05 11:04"])
})

Output:

[["2017-04-05 10:30", 160], ["2017-04-05 11:04", None], ...]

Explanation: The first person arrives at 2017-04-05 10:30 and his height was 160, so max height in the room was 160 since that time. This person left the room at 11:04 so max height was None since that time.

This problem is similar to stackoverflow.com/q/47054341 but the solutions are not satisfactory because we should not round off timestamps to the nearest hour.

CodePudding user response:

I'm not sure this is an efficient way, but it may achieve what you want.

import panda as pd

people = pd.DataFrame({
    "height": [175, 180, 160],
    "arrive": pd.to_datetime(["2017-04-05 14:20", "2017-04-05 13:10", "2017-04-05 10:30"]),
    "depart": pd.to_datetime(["2017-04-05 17:54", "2017-04-06 07:56", "2017-04-05 11:04"])
})

result = []
for time in sorted(list(people["arrive"])   list(people["depart"])):
  cond = (people["arrive"] <= time) & (people["depart"] > time)
  result.append([str(time), people[cond]["height"].max()])

print(result)

"""
[['2017-04-05 10:30:00', 160],
 ['2017-04-05 11:04:00', nan],
 ['2017-04-05 13:10:00', 180],
 ['2017-04-05 14:20:00', 180],
 ['2017-04-05 17:54:00', 180],
 ['2017-04-06 07:56:00', nan]]
"""

CodePudding user response:

IIUC

# melt to move columns as rows
df2=df.melt(id_vars='height', var_name='status', value_name='date')

# set height as None for departed time
df2['height']=  df2['height'].mask(df2['status'].eq('depart'), 'None')

# groupby date and takes the max for each date
df2=df2.groupby('date', as_index=False)['height'].max() 
df2

date    height
0   2017-04-05 10:30:00     160
1   2017-04-05 11:04:00     None
2   2017-04-05 13:10:00     180
3   2017-04-05 14:20:00     175
4   2017-04-05 17:54:00     None
5   2017-04-06 07:56:00     None
  • Related