I am attempting to populate values from one DataFrame to another DataFrame based on a common column present in both DataFrames.
The code I wrote for this operation is as follows:
for i in df1.zipcodes:
for j in df2.zipcodes.unique():
if i == j:
#print("this is i:",i, "this is j:",j)
df1['rent'] = df2['rent']
The Dataframes (df1) in question looks as such with shape (131942, 2):
Providing 1st ten rows of df1:
zipcodes districts
018906 01
018907 01
018910 01
018915 01
018916 01
018925 01
018926 01
018927 01
018928 01
018929 01
018930 01
Additionally, there are no duplicates for the Zipcodes column, but the district column has 28 unique values. No Nan values are present.
The other DataFrame(df2) looks as such with shape (77996, 4)
Providing 1st ten rows of df2
street zipcodes district rent
E ROAD 545669 15 3600
E ROAD 545669 15 6200
E ROAD 545669 15 5500
E ROAD 545669 15 3200
H DRIVE 459108 19 3050
H DRIVE 459108 19 2000
A VIEW 098619 03 4200
A VIEW 098619 03 4500
J ROAD 018947 10 19500
O DRIVE 100088 04 9600
Note: The Zipcodes in df2 can repeat.
Now, I want to populate a column in df1 called rent, if the zipcodes in df1 matches the zipcode of df2. If the zipcodes match but there are multiple entries with the same zipcode in df2 then I want to populate the average as the rent. If there is only one entry for the zipcode then I want to populate the rent corresponding to that zipcode.
Any help on the above will be greatly appreciated.
CodePudding user response:
Use a merge
with the groupby.mean
of df2
:
out = df1.merge(df2.groupby('zipcodes', as_index=False)['rent'].mean(),
on='zipcodes', how='left')
CodePudding user response:
You can divide that into 2 phases:
1st phase: Aggregate the df2 to calculate the average rent by zip code. If the zip code has only one rent then the average value will be equal to that exact rent value so it still matches what you need.
df2 = df2.groupby('zipcodes').mean()['rent'].reset_index()
2nd phase: Merge to df1 using zipcodes
df1 = df1.merge(df2, on='zipcodes', how='left')
You can change how parameter to left or inner depending on what you need. Left join will keep all the rows from df1 and fill NA if can't find any match from df2. Inner join will only keep rows that can be found in both df1 and df2.
Hope this help.