Home > Blockchain >  Python Pandas Merge Two Dataframes Based on Another Correspondence Dataframe
Python Pandas Merge Two Dataframes Based on Another Correspondence Dataframe

Time:11-02

I'm working with 2 dataframes. Dataframe1 is for parking sites. Dataframe2 is for sensors. Correspondence dataframe shows which sensor is in which site.

Dataframe1:

     Site   Time   Available   Capacity
0    19E    12:00  5           10
1    19E    13:00  4           10 
2    44E    12:00  8           22 
3    44E    13:00  11          22 

Dataframe2:

     Sensor   Time   Temp   Precipitation
0    113      12:00  74     0.01
1    113      13:00  76     0.02 
2    114      12:00  75     0.00 
3    114      13:00  77     0.00 

Correspondence dataframe:

     Site   Sensor  
0    19E    113
1    44E    114
2    58E    115
... 

I'd like to combine dataframe 1 and 2 based on the correspondence dataframe, and also ‘Time’ column. Intervals are both 1h in those two dataframes.

Expected result:

     Site   Time   Available   Capacity   Sensor   Time   Temp   Precipitation
0    19E    12:00  5           10         113      12:00  74     0.01
1    19E    13:00  4           10         113      13:00  76     0.02
2    44E    12:00  8           22         114      12:00  75     0.00
3    44E    13:00  11          22         114      13:00  77     0.00 

You can use the code below to generate raw materials:

import pandas as pd

df1 = pd.DataFrame({
    'Site': {0: '19E', 1: '19E', 2: '44E', 3: '44E'},
    'Time': {0: '12:00', 1: '13:00', 2: '12:00', 3: '13:00'},
    'Available': {0: 5, 1: 4, 2: 8, 3: 11},
    'Capacity': {0: 10, 1: 10, 2: 22, 3: 22}})

df2 = pd.DataFrame({
    'Sensor': {0: 113, 1: 113, 2: 114, 3: 114},
    'Time': {0: '12:00', 1: '13:00', 2: '12:00', 3: '13:00'},
    'Tem': {0: 74, 1: 76, 2: 75, 3: 77},
    'Precipitation': {0: 0.01, 1: 0.02, 2: 0.00, 3: 0.00}})

cor_df = pd.DataFrame({
    'Site': {0: '19E', 1: '44E', 2: '58E'},
    'Sensor': {0: 113, 1: 114, 2: 115}})

CodePudding user response:

Use Series.map to map Site to Sensor and then DataFrame.merge on Sensor and Time:

lookup = cor_df.set_index("Site").squeeze()
res = df1.assign(Sensor=df1["Site"].map(lookup)).merge(df2, on=["Sensor", "Time"])
print(res)

Output

  Site   Time  Available  Capacity  Sensor  Tem  Precipitation
0  19E  12:00          5        10     113   74           0.01
1  19E  13:00          4        10     113   76           0.02
2  44E  12:00          8        22     114   75           0.00
3  44E  13:00         11        22     114   77           0.00
  • Related