Look up values from one df to another df based on a specific column-CodePudding

I am attempting to populate values from one DataFrame to another DataFrame based on a common column present in both DataFrames.

The code I wrote for this operation is as follows:

for i in df1.zipcodes:
    for j in df2.zipcodes.unique():
        if i == j:
        #print("this is i:",i, "this is j:",j)
        df1['rent'] = df2['rent']

The Dataframes (df1) in question looks as such with shape (131942, 2):

Providing 1st ten rows of df1:

zipcodes districts
018906      01
018907      01
018910      01
018915      01
018916      01
018925      01
018926      01
018927      01
018928      01
018929      01
018930      01

Additionally, there are no duplicates for the Zipcodes column, but the district column has 28 unique values. No Nan values are present.

The other DataFrame(df2) looks as such with shape (77996, 4)

 Providing 1st ten rows of df2
 street    zipcodes  district  rent
 E ROAD    545669    15        3600
 E ROAD    545669    15        6200
 E ROAD    545669    15        5500
 E ROAD    545669    15        3200 
 H DRIVE   459108    19        3050
 H DRIVE   459108    19        2000
 A VIEW    098619    03        4200
 A VIEW    098619    03        4500
 J ROAD    018947    10        19500
 O DRIVE   100088    04        9600
Note: The Zipcodes in df2 can repeat.

Now, I want to populate a column in df1 called rent, if the zipcodes in df1 matches the zipcode of df2. If the zipcodes match but there are multiple entries with the same zipcode in df2 then I want to populate the average as the rent. If there is only one entry for the zipcode then I want to populate the rent corresponding to that zipcode.

Any help on the above will be greatly appreciated.

CodePudding user response：

Use a merge with the groupby.mean of df2:

out = df1.merge(df2.groupby('zipcodes', as_index=False)['rent'].mean(),
                on='zipcodes', how='left')

CodePudding user response：

You can divide that into 2 phases:

1st phase: Aggregate the df2 to calculate the average rent by zip code. If the zip code has only one rent then the average value will be equal to that exact rent value so it still matches what you need.
```
 df2 = df2.groupby('zipcodes').mean()['rent'].reset_index()
```

2nd phase: Merge to df1 using zipcodes

 df1 = df1.merge(df2, on='zipcodes', how='left')

You can change how parameter to left or inner depending on what you need. Left join will keep all the rows from df1 and fill NA if can't find any match from df2. Inner join will only keep rows that can be found in both df1 and df2.

Hope this help.