How to find not repeating values in different columns python pandas?-CodePudding

I have two columns that I need to compare to third and to get data that is not same in first two columns.

import pandas as pd
from numpy import nan

df = pd.DataFrame ({'Want_value': ['a', 'c', 'c', 'c', 'v', 'b', nan, nan, nan, nan, nan, nan, nan], 'Unwanted_value': ['r', 't', 't', 'z', 't', nan, nan, nan, nan, nan, nan, nan, nan], 'new_all_data': ['r', 'z', 'a', 'c', 't', 'v', 'b', 'j', 'r', 't', 'v', 'a', 'k'], 'new_values': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]})

Looks like this...

Basically needs to be like this, 'Want_value' 'Unwanted_value' and compare to column with name 'new_all_data', at the end I need values that are in 'new_all_data' but there are not in 'Want_value' and 'Unwanted_value'. I hope this is clear.

The values in this minimal reproducible example that are in 'new_all_data' and not in 'Want_value' and 'Unwanted_value' are j and k. These values needs to be in a new column that I put here as empty 'new_values'.

Thanks in advance!

CodePudding user response：

You can use set operations:

current_values = set(df.Want_value) | set(df.Unwanted_value): union of the unique values of Want_value and Unwanted_value columns.
new_values = set(df.new_all_data) - current_values: symmetric difference, i.e. get the values that are new_all_data but are not in current values.

Then you can use DataFrame.where to create the new_values column, by getting only the values of new_all_data column which are in new_values set.

new_values = set(df.new_all_data) - (set(df.Want_value) | set(df.Unwanted_value))

df['new_values'] = df['new_all_data'].where(df['new_all_data'].isin(new_values))

Output:

>>> new_values

{'j', 'k'}

>>> df

   Want_value Unwanted_value new_all_data new_values
0           a              r            r        NaN
1           c              t            z        NaN
2           c              t            a        NaN
3           c              z            c        NaN
4           v              t            t        NaN
5           b            NaN            v        NaN
6         NaN            NaN            b        NaN
7         NaN            NaN            j          j
8         NaN            NaN            r        NaN
9         NaN            NaN            t        NaN
10        NaN            NaN            v        NaN
11        NaN            NaN            a        NaN
12        NaN            NaN            k          k