I have two DataFrames. After merging them on "Name", some rows return NaN because some of the names are incomplete.
df1
Name | Info 1 |
---|---|
Walter | Adress 1 |
john wick | Adress 1 |
df2
Name | Info 2 |
---|---|
Walter White | Male |
john wick | Male |
df2 = pd.merge(df1,df2,on='Name', how='left')
I'm getting:
Name | Info 1 | Info 2 |
---|---|---|
Walter | Adress 1 | NaN |
john wick | Adress 1 | Male |
I want:
Name | Info 1 | Info 2 |
---|---|---|
Walter White | Adress 1 | Male |
john wick | Adress 1 | Male |
How can I handle the rows that come back as NaN, for example by retrying the lookup with a substring match? I'm not sure that starting with a plain merge was the best approach.
CodePudding user response:
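pd.merge_asof might look like a fit, but it won't work here: it has no how parameter and only matches on sorted numeric or datetime keys, not on strings. What you need instead is a fuzzy match on the names, and how well that works depends on how closely the values resemble each other.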
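A minimal sketch of matching by resemblance, using the standard library's difflib rather than anything built into pandas. The helper name closest_name and the 0.6 cutoff are assumptions for illustration, and the cutoff would need tuning on real data:

```python
import difflib

import pandas as pd

df1 = pd.DataFrame({"Name": ["Walter", "john wick"],
                    "Info 1": ["Adress 1", "Adress 1"]})
df2 = pd.DataFrame({"Name": ["Walter White", "john wick"],
                    "Info 2": ["Male", "Male"]})


def closest_name(name, candidates, cutoff=0.6):
    """Hypothetical helper: most similar candidate, or None below the cutoff."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


candidates = df2["Name"].tolist()

# Replace each df1 name with its closest df2 name; keep the original if nothing matches.
full_names = df1["Name"].map(lambda n: closest_name(n, candidates))
df1_fixed = df1.assign(Name=full_names.fillna(df1["Name"]))

# With the names aligned, an ordinary exact merge works.
result = df1_fixed.merge(df2, on="Name", how="left")
print(result)
```

Rows with no candidate above the cutoff keep their original name and still get NaN in "Info 2", so they stay easy to spot.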
CodePudding user response:
This doesn't work because pandas does not treat "Walter" and "Walter White" as the same value. A left join on df1 keeps every row of df1 and adds the values from df2 whose "Name" matches exactly. Since "Walter" is not present in df2, the "Info 2" column gets NaN for that row. One way to solve this is to create a separate "First_Name" column (and, if needed, "Last_Name") in both frames and merge on "First_Name", something like:
df1["First_Name"] = df1["Name"].str.split().str[0]
df2["First_Name"] = df2["Name"].str.split().str[0]
Then use the same merge as before, this time on "First_Name":
df2 = pd.merge(df1,df2,on='First_Name', how='left')
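Put together on the sample data from the question, the first-name approach might look like the sketch below. Dropping df1's short "Name" before the merge is an addition of mine, so that the full name from df2 ends up in the result instead of a Name_x / Name_y pair:

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Walter", "john wick"],
                    "Info 1": ["Adress 1", "Adress 1"]})
df2 = pd.DataFrame({"Name": ["Walter White", "john wick"],
                    "Info 2": ["Male", "Male"]})

# Derive the join key: the first whitespace-separated token of each name.
left = df1.assign(First_Name=df1["Name"].str.split().str[0])
right = df2.assign(First_Name=df2["Name"].str.split().str[0])

# Drop the (possibly incomplete) name from df1 so df2's full name is kept.
merged = pd.merge(left.drop(columns="Name"), right, on="First_Name", how="left")

result = merged[["Name", "Info 1", "Info 2"]]
print(result)
```

This only holds up while first names are unique and spelled with the same case in both frames. Two people named "Walter" in df2 would duplicate the df1 row, and "walter" would not match "Walter" unless both keys are lower-cased first.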