I have two dataframes, one coming from a database with the following fields:
| name | id |
|---|---|
| bakarery | 010203040000150 |
| store | 010203040000160 |
| market | 010203040000180 |
| hospital | 010203040000190 |
| bakery | 010203040000200 |
And another dataframe that I need to compare against the first one in order to update the IDs:
| name | id |
|---|---|
| bakarery | 1020304050 |
| store | 010203040000160 |
| market | 010203040000180 |
| hospital | 3040506070 |
| bakery | 010203040000200 |
I need to create a third dataframe containing only the IDs I need to update: matching rows by name, a row should be included only if the ID for that name has changed.
How can I do this?
Expected output:
| name | id |
|---|---|
| bakarery | 1020304050 |
| hospital | 3040506070 |
CodePudding user response:
Assuming the first dataframe is df1 and the second is df2:
df2.join(df1, on="name").where(df1["id"] != df2["id"]).show()
+--------+----------+---------------+
|    name|        id|             id|
+--------+----------+---------------+
|bakarery|1020304050|010203040000150|
|hospital|3040506070|010203040000190|
+--------+----------+---------------+
Alternatively, since subtract returns the rows of df2 that do not appear in df1:
df2.subtract(df1).show()
+--------+----------+
|    name|        id|
+--------+----------+
|bakarery|1020304050|
|hospital|3040506070|
+--------+----------+
CodePudding user response:
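import pandas as pd

# Small sample: one column per name, a single row of IDs
d = {'bakery': '010203040000150', 'store': '010203040000160'}
df1 = pd.DataFrame(data=d, index=[0])
d1 = {'bakery': '1020304050', 'store': '010203040000160'}
df2 = pd.DataFrame(data=d1, index=[0])

# True where the IDs match, False where they differ
df3 = df1 == df2
# Blank out the cells that differ, then fill them back in from df2
df4 = df2.mask(~df3).fillna(df2)
df4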
       bakery            store
0  1020304050  010203040000160
The code above was only run on a small sample, but it should do the job.
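For dataframes laid out like the ones in the question (a name column and an id column), a merge on name may be closer to the expected output. The following is only a sketch, assuming pandas, that each name appears at most once in each dataframe, and that the IDs are stored as strings so the leading zeros survive. The column suffixes `_new` and `_old` and the variable names are illustrative choices, not something from the question.

```python
import pandas as pd

# IDs are kept as strings so that leading zeros are preserved.
df1 = pd.DataFrame({
    "name": ["bakarery", "store", "market", "hospital", "bakery"],
    "id": ["010203040000150", "010203040000160", "010203040000180",
           "010203040000190", "010203040000200"],
})
df2 = pd.DataFrame({
    "name": ["bakarery", "store", "market", "hospital", "bakery"],
    "id": ["1020304050", "010203040000160", "010203040000180",
           "3040506070", "010203040000200"],
})

# Pair the rows by name; the suffixes tell the two id columns apart.
# This is an inner join, so names present in only one dataframe are dropped.
merged = df2.merge(df1, on="name", suffixes=("_new", "_old"))

# Keep only the rows whose ID changed, and report the new ID.
changed = merged.loc[merged["id_new"] != merged["id_old"], ["name", "id_new"]]
df3 = changed.rename(columns={"id_new": "id"}).reset_index(drop=True)

print(df3)
```

If names that exist only in the second dataframe should also count as updates, `how="left"` in the merge would keep them, at the cost of having to handle the missing old IDs.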