Remove repeating words in column, based on another column

Time:09-26

I have a pandas DataFrame like the one below:

First Column    Second Column
Dog             Dog is good
Big Cat         Big cat is here
Fat rat         Fat rat is there
Pink tree       Pink tree means love

I want to remove the words in the second column that repeat the first column. My desired output is:

First Column    Second Column
Dog             is good
Big Cat         is here
Fat rat         is there
Pink tree       means love

How can I achieve this?

I have looked around here but could not find a solution that suits me.

Thanks!

CodePudding user response:

Try using row-wise apply with axis=1:

df['Second Column'] = df.apply(lambda x: x['Second Column'].lower().replace(x['First Column'].lower(), '').strip(), axis=1)

>>> df
  First Column Second Column
0          Dog       is good
1      Big Cat       is here
2      Fat rat      is there
3    Pink tree    means love
>>> 
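Note that lowercasing changes the case of whatever text remains. If the original casing matters, a case-insensitive regular expression can strip the prefix instead. This is only a minimal sketch, assuming the "First Column" value always appears at the start of the "Second Column" value; the helper name `strip_prefix` is made up for illustration:

```python
import re

import pandas as pd

df = pd.DataFrame({
    "First Column": ["Dog", "Big Cat", "Fat rat", "Pink tree"],
    "Second Column": ["Dog is good", "Big cat is here",
                      "Fat rat is there", "Pink tree means love"],
})


def strip_prefix(row):
    # Hypothetical helper: remove the "First Column" text from the start of
    # "Second Column", ignoring case, then trim leftover whitespace.
    pattern = r"^\s*" + re.escape(row["First Column"])
    return re.sub(pattern, "", row["Second Column"], flags=re.IGNORECASE).strip()


df["Second Column"] = df.apply(strip_prefix, axis=1)
```

Because the pattern is anchored with `^`, only a leading match is removed; a later repeat of the same words in the sentence is left alone.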

CodePudding user response:

Instead of spoon-feeding you a solution (there are several), I'll describe how I'd go about solving this in the simplest way I can think of. Provided the same pattern repeats throughout the entire dataset and there are no anomalies (such as stray whitespace), one solution, IMO, is to extract a substring (a ranged slice) from "Second Column", starting at an offset equal to the length of the element in "First Column" plus 1 (to account for the whitespace in "Second Column") and running to the end.

A caveat: this might not be the most "Pandas"-esque solution out there.
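The slicing idea could look roughly like this. It is a sketch under the assumptions stated above (every "Second Column" value starts with the "First Column" text followed by exactly one space); the sample DataFrame is rebuilt from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "First Column": ["Dog", "Big Cat", "Fat rat", "Pink tree"],
    "Second Column": ["Dog is good", "Big cat is here",
                      "Fat rat is there", "Pink tree means love"],
})

# Slice each "Second Column" string starting just past the "First Column"
# text: len(first) characters for the repeated words, plus 1 for the space.
df["Second Column"] = [
    second[len(first) + 1:]
    for first, second in zip(df["First Column"], df["Second Column"])
]
```

This never checks that the sliced-off text actually matches "First Column", so it only gives correct results while the pattern holds for every row.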
