Unlike the other questions, I don't want to create a new column for the new values. I want to keep the same column and replace its old values with the new ones where they exist.
For a new column I would have:
import pandas as pd
df1 = pd.DataFrame(data={'Name': ['Carl', 'Steave', 'Julius', 'Marcus'],
                         'Work': ['Home', 'Street', 'Car', 'Airplane'],
                         'Year': ['2022', '2021', '2020', '2019'],
                         'Days': ['', 5, '', '']})
df2 = pd.DataFrame(data={'Name': ['Carl', 'Julius'],
                         'Work': ['Home', 'Car'],
                         'Days': [1, 2]})
df_merge = pd.merge(df1, df2, how='left', on=['Name','Work'], suffixes=('','_'))
print(df_merge)
     Name      Work  Year Days  Days_
0    Carl      Home  2022         1.0
1  Steave    Street  2021    5    NaN
2  Julius       Car  2020         2.0
3  Marcus  Airplane  2019         NaN
But what I really want is exactly like this:
     Name      Work  Year Days
0    Carl      Home  2022    1
1  Steave    Street  2021    5
2  Julius       Car  2020    2
3  Marcus  Airplane  2019
How can I make such a union?
CodePudding user response:
You can use combine_first, setting the empty strings to NaN beforehand (the indexing at the end rearranges the columns to match the desired output):
df1.loc[df1["Days"] == "", "Days"] = float("NaN")
df1.combine_first(df1[["Name", "Work"]].merge(df2, "left"))[df1.columns.values]
This outputs:
     Name      Work  Year Days
0    Carl      Home  2022  1.0
1  Steave    Street  2021    5
2  Julius       Car  2020  2.0
3  Marcus  Airplane  2019  NaN
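If the mixed 1.0 / 5 / NaN display bothers you, here is a minimal sketch of the same combine_first idea applied to the Days column only, with a cast to pandas' nullable integer dtype at the end. The names days, lookup and result are just illustrative, and it assumes the (Name, Work) pairs in df2 are unique so the left merge keeps one row per row of df1:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Carl", "Steave", "Julius", "Marcus"],
    "Work": ["Home", "Street", "Car", "Airplane"],
    "Year": ["2022", "2021", "2020", "2019"],
    "Days": ["", 5, "", ""],
})
df2 = pd.DataFrame({
    "Name": ["Carl", "Julius"],
    "Work": ["Home", "Car"],
    "Days": [1, 2],
})

# Turn the empty strings into NaN so they count as missing values.
days = pd.to_numeric(df1["Days"], errors="coerce")

# Look up the df2 value for every row of df1 (NaN where there is no match).
# Assumption: (Name, Work) is unique in df2, so the row order matches df1.
lookup = df1[["Name", "Work"]].merge(df2, on=["Name", "Work"], how="left")["Days"]

# Keep the existing df1 value where present, otherwise take the df2 value.
filled = days.combine_first(lookup)

# Nullable integers show 1, 5, 2 and <NA> instead of 1.0, 5.0, 2.0 and NaN.
result = df1.assign(Days=filled.astype("Int64"))
print(result)
```

Note that combine_first keeps the caller's non-missing values, so existing df1 values win here. If the df2 values should take priority instead, swapping the operands (lookup.combine_first(days)) should do it.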
CodePudding user response:
You can use the update method of Series:
df1.Days.update(pd.merge(df1, df2, how='left', on=['Name','Work']).Days_y)
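As far as I know, calling update on a column pulled out of the DataFrame like this may not write back to df1 when pandas' copy-on-write mode is active, so here is a hedged sketch of the same idea that assigns the column back explicitly. The suffix "_new" and the names merged and days are only illustrative choices:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Carl", "Steave", "Julius", "Marcus"],
    "Work": ["Home", "Street", "Car", "Airplane"],
    "Year": ["2022", "2021", "2020", "2019"],
    "Days": ["", 5, "", ""],
})
df2 = pd.DataFrame({
    "Name": ["Carl", "Julius"],
    "Work": ["Home", "Car"],
    "Days": [1, 2],
})

# Left merge keeps every row of df1; df2's values land in "Days_new".
merged = df1.merge(df2, on=["Name", "Work"], how="left", suffixes=("", "_new"))

# Work on an explicit copy of the column, then assign it back.
days = df1["Days"].copy()
# Series.update overwrites in place wherever the other Series is not NaN,
# so rows without a match in df2 keep their original value.
days.update(merged["Days_new"])
df1["Days"] = days
print(df1)
```

Unlike combine_first, update lets the df2 values overwrite whatever df1 already had for the matching rows, which seems closer to "changing the old values for new ones if they exist".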