Home > Mobile >  How to identify the Records (Order ID Product ID combination) present in data1 but missing in data
How to identify the Records (Order ID Product ID combination) present in data1 but missing in data

Time:06-07

Data1 =

enter image description here

Data2 =

enter image description here

The primary key for both data1 and data2 is Order Id Product ID combination (i.e. the individual datasets do not have any duplicate on this combination)

Q1) How to identify the Records (Order ID Product ID combination) present in data1 but missing in data2

Q2) How to identify the Records (Order ID Product ID combination) missing in data1 but present in data2

df2["Record"] = df2["Order ID"] df2["Product ID"]
df1["Record"] = df1["Order ID"] df1["Product ID"]

df1.set_index("Record").subtract(df2.set_index("Record"), fill_value=0)

It gives me this error while trying to solve question 1

TypeError: unsupported operand type(s) for -: 'str' and 'int'

CodePudding user response:

IIUC, you can use isin or python set

Q1) How to identify the Records (Order ID Product ID combination) present in data1 but missing in data2

m1 = ~df1["Record"].isin(df2["Record"])
res = df1.loc[m1, "Record"]
# or
res = set(df1["Record"]).difference(set(df2["Record"]))

Q2) How to identify the Records (Order ID Product ID combination) missing in data1 but present in data2

m2 = ~df2["Record"].isin(df1["Record"])
res = df2.loc[m2, "Record"]
# or
res = set(df2["Record"]).difference(set(df1["Record"]))

CodePudding user response:

for Q1 you can do it like

[i for i in list(df1["Record"]) if i not in list(df2["Record"])]

and for Q2

[i for i in list(df2["Record"]) if i not in list(df1["Record"])]

CodePudding user response:

You could also try df.merge()

df_result = df1.merge(df2.drop_duplicates(), on=['Order ID','Product ID'], how='left', indicator=True)

This will provide you with an indicator which tells you from where the record is from.

Then you can simply use df_all['_merge'] == 'left_only'.

  • Related