I am stuck on a project. I am trying to create a new unique column by checking two columns (A & B), if the values in A exist anywhere in B, or the value of B exist anywhere in A return that value, else return an "". For example, I have;
colA colB
0 x
1 y
2 c
3 d
4 x
5 d
6
After comparing colA and colB for the first time, I am expecting something like this;
colA colB new_colA
0 x x
1 y y
2 c
3 d d
4 x x
5 d d
6
And the for the second time;
colA colB new_colA new_colB
0 x x
1 y y
2 c c
3 d d
4 x x
5 d d
6
I don't know how to go about it using python. I tried excel where I just used conditional formatting to highlight duplicates.
CodePudding user response:
If you have NaNs in empty cells, you can use:
m = df['colB'].isin(df['colA'])
df['new_colA'] = df['colB'].where(m).fillna(df['colA'])
df['new_colB'] = df['colB'].mask(m)
Output:
colA colB new_colA new_colB
0 x NaN x NaN
1 y NaN y NaN
2 NaN c NaN c
3 NaN d d NaN
4 NaN x x NaN
5 d NaN d NaN
6 NaN NaN NaN NaN
Variant for empty strings:
m = df['colB'].isin(df['colA'])&df['colB'].ne('')
df['new_colA'] = df['colB'].where(m).fillna(df['colA'])
df['new_colB'] = df['colB'].mask(m).fillna('')
Output:
colA colB new_colA new_colB
0 x x
1 y y
2 c c
3 d d
4 x x
5 d d
6