Home > Net >  Python Pandas compare columns in a dataframe horizontally
Python Pandas compare columns in a dataframe horizontally

Time:02-02

I have a table extracted from SQL which looks like below. The background is that there are 2 systems that track the transactions. And sometimes, there are discrepancies between the items being captured by the 2 systems.

I am hoping to write a function/loop in python Pandas (or SQL directly) that can compare Items 1,2 and 3 from System A with Items 1,2,3 from System B, and return any rows that have NO common items.

enter image description here

Here is the expected output.

enter image description here

To explain the output. In Transaction ID 1, because coffee appear in both systems, no need to return. In Transaction ID 2, soup is no where in System A and this row should be returned. In Transaction ID 5, System B failed to capture anything and this row should be returned.

Thanks in advance.

CodePudding user response:

You can make sets from the entries and then check if the sets don't share a single item:

a = df[['sys a item 1', 'sys a item 2', ...]].apply(set, axis=1).copy()  # <- columns of system a
b = df[['sys b item 1', 'sys b item 2', ...]].apply(set, axis=1).copy()  # <- columns of system b
# create mask where sys-a and sys-b don't share a single item
mask = [set(x).isdisjoint(y) for x, y in zip(a, b)]
# filter dataframe by using the mask
df[mask]

CodePudding user response:

You can split the dataframe, compare each split and use the index of the comparison to get the correspondent rows from the original dataframe.

df1 = df[['SYS A Item 1', 'SYS A Item 3', 'SYS A Item 3']]
df2 = df[['SYS B Item 1', 'SYS B Item 3', 'SYS B Item 3']]

idx = df1.compare(df2).index

df.iloc[idx]
  • Related