Compare strings in two pandas columns and write remainder in new column

I have a dataframe:

Name    SubName
AB      ABCD
UI      10UI09
JK      89-JK-07
yhk     100yhk0A

I need to add a column containing the characters of SubName that are not in Name:

Name    SubName    Remainder
AB      ABCD       CD
UI      10UI09     1009
JK      89-JK-07   89--07
yhk     100yhk0A   1000A

CodePudding user response：

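You need to loop over the rows here. One option is a regex character class built from the characters of Name. Pass them through re.escape so that characters such as - or ] are treated literally, and skip empty names, since [] is not a valid pattern:

import re
df['Remainder'] = [re.sub(f'[{re.escape("".join(set(a)))}]', '', b) if a else b
                   for a, b in zip(df['Name'], df['SubName'])]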

Alternative with join and set (could be faster in some cases; the := operator requires Python 3.8 or later):

df['Remainder'] = [''.join([c for c in b if c not in S])
                   if (S:=set(a)) else b
                   for a,b in zip(df['Name'], df['SubName'])
                  ]

Output:

  Name   SubName Remainder
0   AB      ABCD        CD
1   UI    10UI09      1009
2   JK  89-JK-07    89--07
3  yhk  100yhk0A     1000A
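
As a rough sketch, the two approaches above can be wrapped in small helper functions and checked on plain tuples before touching the dataframe. The names strip_chars_regex and strip_chars_set are made up for this example, and the rows list just mirrors the sample data from the question.

```python
import re


def strip_chars_regex(name: str, sub: str) -> str:
    """Remove from `sub` every character that occurs in `name` (regex version)."""
    if not name:
        # An empty character class "[]" is not a valid pattern, so return early.
        return sub
    # re.escape keeps characters such as "-", "]" or "^" literal inside the class.
    pattern = "[" + re.escape("".join(set(name))) + "]"
    return re.sub(pattern, "", sub)


def strip_chars_set(name: str, sub: str) -> str:
    """Same result, using a set membership test instead of a regex."""
    chars = set(name)
    return "".join(c for c in sub if c not in chars)


# Sample rows copied from the question: (Name, SubName)
rows = [("AB", "ABCD"), ("UI", "10UI09"), ("JK", "89-JK-07"), ("yhk", "100yhk0A")]
remainders = [strip_chars_regex(name, sub) for name, sub in rows]
print(remainders)
```

With a dataframe, either helper should drop into the same zip pattern, e.g. df['Remainder'] = [strip_chars_set(a, b) for a, b in zip(df['Name'], df['SubName'])].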

CodePudding user response：

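You can also use apply to build the new column. Note that str.replace removes Name as one contiguous substring rather than character by character; for this data the result is the same:

df["Remainder"] = df.apply(lambda x: x["SubName"].replace(x["Name"], ""), axis=1)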

Output:

Name    SubName    Remainder
AB       ABCD        CD
UI      10UI09      1009
JK     89-JK-07    89--07
yhk    100yhk0A     1000A
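
One thing to keep in mind: removing Name as a substring and removing the characters of Name are only equivalent when Name appears as one contiguous run inside SubName. A minimal sketch of the difference, where the helper names and the ("AB", "A1B2") row are hypothetical and not part of the original data:

```python
def remove_substring(name: str, sub: str) -> str:
    """Drop every contiguous occurrence of `name` from `sub`."""
    return sub.replace(name, "")


def remove_characters(name: str, sub: str) -> str:
    """Drop every character of `sub` that appears anywhere in `name`."""
    chars = set(name)
    return "".join(c for c in sub if c not in chars)


# Row from the question: Name is contiguous inside SubName, so both agree.
print(remove_substring("AB", "ABCD"), remove_characters("AB", "ABCD"))

# Hypothetical row: the letters of Name are interleaved with other characters,
# so the substring version leaves the string unchanged.
print(remove_substring("AB", "A1B2"), remove_characters("AB", "A1B2"))
```

If the real data can contain rows like the second one, the character-based version is probably closer to what the question asks for.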