Replace values in one df from another on an ID col based on condition-CodePudding

Input:
d1 = {'col1': ['a','b','c','d'], 'col2': [1,2,3,4]}
d2 = {'col1': ['a','b','c'], 'col2': [0,3,4]}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

Desired Output:
d3 = {'col1': ['a','b','c','d'], 'col2': [1,3,4,4]}
df1 = pd.DataFrame(d3)

So I'd like to replace the values in df1 using 'col1' as the primary key with values from df2 only if the value is greater. The 2 dataframes are not the same size.

CodePudding user response：

here is an idea using clip as a lower bound of the map of the values from df2 to df1['col1']

df1['col3'] = df1['col2'].clip(lower=df1['col1'].map(df2.set_index('col1')['col2']))
print(df1)
#   col1  col2  col3
# 0    a     1     1
# 1    b     2     3
# 2    c     3     4
# 3    d     4     4

CodePudding user response：

You can concat and get the max per row:

df1['col2'] = pd.concat([df1['col2'], df2['col2']], axis=1).max(1)

Output:

  col1  col2
0    a   1.0
1    b   3.0
2    c   4.0
3    d   4.0

NB. This works with an arbitrary number of dataframes as input

CodePudding user response：

Using the update function you can change a dataframe's column with some column of another dataframe. The logic of its greater can be done with a merge with the appropriate merge type:

dm= df1.merge(df2,on="col1").query("col2_x<col2_y").set_index("col1")[["col2_y"]].rename(columns={"col2_y":"col2"})
df1.set_index("col1", inplace=True)
df1.update(dm)
df3= df1.reset_index()

CodePudding user response：

Let us do append then groupby

out = df1.append(df2).groupby('col1',as_index=False).max()
  col1  col2
0    a     1
1    b     3
2    c     4
3    d     4