Input:
d1 = {'col1': ['a','b','c','d'], 'col2': [1,2,3,4]}
d2 = {'col1': ['a','b','c'], 'col2': [0,3,4]}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
Desired Output:
d3 = {'col1': ['a','b','c','d'], 'col2': [1,3,4,4]}
df1 = pd.DataFrame(d3)
So I'd like to replace the values in df1 using 'col1' as the primary key with values from df2 only if the value is greater. The 2 dataframes are not the same size.
CodePudding user response:
here is an idea using clip
as a lower bound of the map
of the values from df2
to df1['col1']
df1['col3'] = df1['col2'].clip(lower=df1['col1'].map(df2.set_index('col1')['col2']))
print(df1)
# col1 col2 col3
# 0 a 1 1
# 1 b 2 3
# 2 c 3 4
# 3 d 4 4
CodePudding user response:
You can concat
and get the max
per row:
df1['col2'] = pd.concat([df1['col2'], df2['col2']], axis=1).max(1)
Output:
col1 col2
0 a 1.0
1 b 3.0
2 c 4.0
3 d 4.0
NB. This works with an arbitrary number of dataframes as input
CodePudding user response:
Using the update function you can change a dataframe's column with some column of another dataframe. The logic of its greater can be done with a merge with the appropriate merge type:
dm= df1.merge(df2,on="col1").query("col2_x<col2_y").set_index("col1")[["col2_y"]].rename(columns={"col2_y":"col2"})
df1.set_index("col1", inplace=True)
df1.update(dm)
df3= df1.reset_index()
CodePudding user response:
Let us do append then groupby
out = df1.append(df2).groupby('col1',as_index=False).max()
col1 col2
0 a 1
1 b 3
2 c 4
3 d 4