I have two panda dataframes.
Data frame one has three columns:
name | start_time | end_time |
---|---|---|
alice | 04:00 | 05:00 |
bob | 05:00 | 07:00 |
Data frame two has three columns:
time | points_1 | points_2 |
---|---|---|
04:30 | 5 | 4 |
04:45 | 8 | 6 |
05:30 | 10 | 3 |
06:15 | 4 | 7 |
06:55 | 1 | 0 |
I would like to merge the two dataframes such that the first dataframe now has 5 columns:
name | start_time | end_time | average_point_1 | average_point_2 |
---|---|---|---|---|
alice | 04:00 | 05:00 | 6.5 | 5 |
bob | 05:00 | 07:00 | 5 | 3.33 |
Where the average_point_1 columns consists of average of points_1 from dataframe two between the start and end time for each row. Similarly average_point_2. Could someone tell me how I can merge the two dataframes like this without having to develop an averaging function to make the columns first and then merging.
CodePudding user response:
Try:
#convert all time fields to datetime for merge_asof compatibility
df1["start_time"] = pd.to_datetime(df1["start_time"],format="%H:%M")
df1["end_time"] = pd.to_datetime(df1["end_time"],format="%H:%M")
df2["time"] = pd.to_datetime(df2["time"],format="%H:%M")
#merge both dataframes on time
merged = pd.merge_asof(df2, df1, left_on="time", right_on="start_time")
#groupy and get average for each name
output = merged.groupby(["name", "start_time", "end_time"],as_index=False).mean()
#convert time columns back to strings if needed
output["start_time"] = output["start_time"].dt.strftime("%H:%M")
output["end_time"] = output["end_time"].dt.strftime("%H:%M")
>>> output
name start_time end_time points_1 points_2
0 alice 04:00 05:00 6.5 5.000000
1 bob 05:00 07:00 5.0 3.333333