I am getting NA values when tried to merge two dataframes by common columns; I think it is straightforward to merge two dataframe by using pandas.merge
and I should expect correct output; but I am not getting correct output. Here is what I did:
my current attempt:
import pandas as pd
df1=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/df1.csv", index_col=0)
df2=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/df2.csv", index_col=0)
merged_df = df1.merge(df2, how='left', left_on=['ST', 'County_Name'], right_on=['ST', 'County_Name'])
by doing so, I want to match County_Name
, ST
so merge them by left
join and having FIPS_Code
column as well; but now FIPS_Code
become NAN
, I am sure County_Name
from df1
can match in df2
; I did manually checked but I don't get it why FIPS_Code
become NAN
; I want it left
join because I want merged_dataframe
has same shape as df1
.
Can anyone point me out what went wrong with above join? Is that because of data or joining is not working here? Any thoughts?
CodePudding user response:
There is space in your df2 County_Name
column
df2['County_Name'] = df2['County_Name'].str.strip()
merged_df = df1.merge(df2, how='left', left_on=['ST', 'County_Name'], right_on=['ST', 'County_Name'])
print(merged_df)
ST County_Name Orientation state_x Year state_y FIPS_Code
0 IL Adams Defense Illinois 2015 Illinois 17001.0
1 IL Alexander Defense Illinois 2015 Illinois 17003.0
2 IL Bond Defense Illinois 2015 Illinois 17005.0
3 IL Boone Defense Illinois 2015 Illinois 17007.0
4 IL Brown Defense Illinois 2015 Illinois 17009.0
.. .. ... ... ... ... ... ...
97 IL Whiteside Defense Illinois 2015 Illinois 17195.0
98 IL Will Neutral Illinois 2015 Illinois 17197.0
99 IL Williamson Defense Illinois 2015 Illinois 17199.0
100 IL Winnebago Defense Illinois 2015 Illinois 17201.0
101 IL Woodford Defense Illinois 2015 Illinois 17203.0