People arrive and depart a room at various times. Given their entry and exit times and their heights, what's an efficient way (in Python / Pandas) to find the maximum height of people in the room at all times?
Input:
people = pd.DataFrame({
"height": [175, 180, 160],
"arrive": pd.to_datetime(["2017-04-05 14:20", "2017-04-05 13:10", "2017-04-05 10:30"]),
"depart": pd.to_datetime(["2017-04-05 17:54", "2017-04-06 07:56", "2017-04-05 11:04"])
})
Output:
[["2017-04-05 10:30", 160], ["2017-04-05 11:04", None], ...]
Explanation: The first person arrives at 2017-04-05 10:30 and his height was 160, so max height in the room was 160 since that time. This person left the room at 11:04 so max height was None since that time.
This problem is similar to stackoverflow.com/q/47054341 but the solutions are not satisfactory because we should not round off timestamps to the nearest hour.
CodePudding user response:
I'm not sure this is an efficient way, but it may achieve what you want.
import panda as pd
people = pd.DataFrame({
"height": [175, 180, 160],
"arrive": pd.to_datetime(["2017-04-05 14:20", "2017-04-05 13:10", "2017-04-05 10:30"]),
"depart": pd.to_datetime(["2017-04-05 17:54", "2017-04-06 07:56", "2017-04-05 11:04"])
})
result = []
for time in sorted(list(people["arrive"]) list(people["depart"])):
cond = (people["arrive"] <= time) & (people["depart"] > time)
result.append([str(time), people[cond]["height"].max()])
print(result)
"""
[['2017-04-05 10:30:00', 160],
['2017-04-05 11:04:00', nan],
['2017-04-05 13:10:00', 180],
['2017-04-05 14:20:00', 180],
['2017-04-05 17:54:00', 180],
['2017-04-06 07:56:00', nan]]
"""
CodePudding user response:
IIUC
# melt to move columns as rows
df2=df.melt(id_vars='height', var_name='status', value_name='date')
# set height as None for departed time
df2['height']= df2['height'].mask(df2['status'].eq('depart'), 'None')
# groupby date and takes the max for each date
df2=df2.groupby('date', as_index=False)['height'].max()
df2
date height
0 2017-04-05 10:30:00 160
1 2017-04-05 11:04:00 None
2 2017-04-05 13:10:00 180
3 2017-04-05 14:20:00 175
4 2017-04-05 17:54:00 None
5 2017-04-06 07:56:00 None