How to separate characters of a column based on its intersection with another column?

Time:02-28

My df has two columns. Each value in the second column contains the value from the first column followed by other characters (letters and/or digits):

import pandas as pd

values = {
    'number': [2830, 8457, 9234],
    'nums': ['2830S', '8457M', '923442']
}
df = pd.DataFrame(values, columns=['number', 'nums'])

The extra characters always come after the common characters. How can I extract the characters that the two columns do not share? I am looking for a simple solution, not a loop that checks every character.

CodePudding user response:

Replace the common characters with an empty string:

f_diff = lambda x: x['nums'].replace(x['number'], '')
df['extra'] = df[['number', 'nums']].astype(str).apply(f_diff, axis=1)
print(df)

# Output
   number    nums extra
0    2830   2830S     S
1    8457   8457M     M
2    9234  923442    42
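
One caveat worth illustrating: str.replace removes every occurrence of the substring, not only the leading one. The sketch below uses plain strings with hypothetical values (they are not from the question) to show the difference, and the optional count argument that limits the replacement to the first match.

```python
# Hypothetical values, not from the question: the digits of `number`
# appear twice inside `nums`.
number, nums = "2830", "2830X2830"

# str.replace removes every occurrence of the substring.
replaced = nums.replace(number, "")

# Passing a count of 1 limits the replacement to the first occurrence.
replaced_once = nums.replace(number, "", 1)

print(replaced)       # X
print(replaced_once)  # X2830
```

For the sample data in the question this makes no difference, since each number appears only once in its nums value.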

Update

If the number value always forms the leading characters of the nums value, you can use a simpler function that slices it off by length:

f_diff2 = lambda x: x['nums'][len(x['number']):]
df['extra'] = df[['number', 'nums']].astype(str).apply(f_diff2, axis=1)
print(df)

# Output
   number    nums extra
0    2830   2830S     S
1    8457   8457M     M
2    9234  923442    42
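
The same slicing can also be written without apply(axis=1). This is only a sketch of an alternative formulation, not part of the original answer: it pairs the two columns with zip and slices each string in a list comprehension. It assumes, as above, that number is always a prefix of nums.

```python
import pandas as pd

df = pd.DataFrame({
    'number': [2830, 8457, 9234],
    'nums': ['2830S', '8457M', '923442'],
})

# Pair each number (converted to a string) with its nums value,
# then drop as many leading characters as the number has digits.
df['extra'] = [
    nums[len(num):]
    for num, nums in zip(df['number'].astype(str), df['nums'])
]
print(df)
```

This still visits each row once, but it does not inspect individual characters.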

CodePudding user response:

I would delete the prefix of the string. For this you can use the apply() method to run the following function on each row:

def remove_prefix(text, prefix):
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

df['nums'] = df.apply(lambda x: remove_prefix(x['nums'], str(x['number'])), axis=1)
df

Output:

    number  nums
0   2830    S
1   8457    M
2   9234    42

If you have Python 3.9 or later, the built-in str.removeprefix() is all you need:

df['nums'] = df.apply(lambda x: x['nums'].removeprefix(str(x['number'])), axis=1)
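
A quick check of how str.removeprefix behaves on single values may help here. The sketch below uses plain strings; the second input is hypothetical and is there only to show the no-match case. Note that removeprefix expects a str argument, so an integer number has to be converted first.

```python
# Requires Python 3.9 or later.
number = 9234

# removeprefix expects a str, so the integer is converted first.
stripped = "923442".removeprefix(str(number))

# Hypothetical input: when the string does not start with the prefix,
# it is returned unchanged.
unchanged = "A9234".removeprefix(str(number))

print(stripped)   # 42
print(unchanged)  # A9234
```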