Home > Software design >  Pandas merge with average of second dataframe
Pandas merge with average of second dataframe

Time:04-07

I have two panda dataframes.

Data frame one has three columns:

name start_time end_time
alice 04:00 05:00
bob 05:00 07:00

Data frame two has three columns:

time points_1 points_2
04:30 5 4
04:45 8 6
05:30 10 3
06:15 4 7
06:55 1 0

I would like to merge the two dataframes such that the first dataframe now has 5 columns:

name start_time end_time average_point_1 average_point_2
alice 04:00 05:00 6.5 5
bob 05:00 07:00 5 3.33

Where the average_point_1 columns consists of average of points_1 from dataframe two between the start and end time for each row. Similarly average_point_2. Could someone tell me how I can merge the two dataframes like this without having to develop an averaging function to make the columns first and then merging.

CodePudding user response:

Try:

#convert all time fields to datetime for merge_asof compatibility
df1["start_time"] = pd.to_datetime(df1["start_time"],format="%H:%M")
df1["end_time"] = pd.to_datetime(df1["end_time"],format="%H:%M")
df2["time"] = pd.to_datetime(df2["time"],format="%H:%M")

#merge both dataframes on time
merged = pd.merge_asof(df2, df1, left_on="time", right_on="start_time")

#groupy and get average for each name
output = merged.groupby(["name", "start_time", "end_time"],as_index=False).mean()

#convert time columns back to strings if needed
output["start_time"] = output["start_time"].dt.strftime("%H:%M")
output["end_time"] = output["end_time"].dt.strftime("%H:%M")

>>> output
     name start_time end_time  points_1  points_2
0  alice      04:00    05:00       6.5  5.000000
1    bob      05:00    07:00       5.0  3.333333
  • Related