I have a pandas DataFrame as below:
First Column | Second Column |
---|---|
Dog | Dog is good |
Big Cat | Big cat is here |
Fat rat | Fat rat is there |
Pink tree | Pink tree means love |
I want to remove the words in the second column that repeat the value in the first column. My desired output is:
First Column | Second Column |
---|---|
Dog | is good |
Big Cat | is here |
Fat rat | is there |
Pink tree | means love |
How can I achieve this?
I have looked around here, but could not find a solution that suits me.
Thanks!
CodePudding user response:
Try using row-wise apply with axis=1:
df['Second Column'] = df.apply(lambda x: x['Second Column'].lower().replace(x['First Column'].lower(), '').strip(), axis=1)
>>> df
First Column Second Column
0 Dog is good
1 Big Cat is here
2 Fat rat is there
3 Pink tree means love
>>>
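Note that the one-liner above lowercases the whole second column and removes the first-column text wherever it occurs, not only at the start. If that matters for your data, here is a rough sketch of a variant that strips only a leading match, ignores case, and leaves the rest of the string untouched. The helper name strip_prefix is made up for illustration, and the DataFrame is rebuilt from the question so the snippet runs on its own:

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "First Column": ["Dog", "Big Cat", "Fat rat", "Pink tree"],
    "Second Column": ["Dog is good", "Big cat is here",
                      "Fat rat is there", "Pink tree means love"],
})

def strip_prefix(row):
    # Hypothetical helper: match the first-column text only at the start
    # of the second column, ignoring case. re.escape guards against regex
    # metacharacters that might appear in the data.
    pattern = r"^\s*" + re.escape(row["First Column"]) + r"\s*"
    return re.sub(pattern, "", row["Second Column"], flags=re.IGNORECASE)

df["Second Column"] = df.apply(strip_prefix, axis=1)
print(df)
```

Rows whose second column does not start with the first-column text are left as they are.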
CodePudding user response:
Instead of spoon-feeding you a solution (there are several), I'll describe how I'd go about solving this in the simplest way I can think of. Provided the same pattern repeats throughout the entire dataset and there are no anomalies (like stray whitespace), one solution, IMO, is to take a substring (a ranged slice) of "Second Column" that starts at an offset equal to the length of the element in "First Column" plus 1 (to account for the whitespace in "Second Column") and runs to the end.
A caveat: this might not be the most "pandas"-esque solution out there.
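For what it's worth, the slicing idea described above could look roughly like this. It is only a sketch and assumes every "Second Column" value starts with the matching "First Column" value followed by exactly one space; it does no checking, so rows that break that pattern would be cut in the wrong place. The DataFrame is rebuilt from the question so the snippet is self-contained:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "First Column": ["Dog", "Big Cat", "Fat rat", "Pink tree"],
    "Second Column": ["Dog is good", "Big cat is here",
                      "Fat rat is there", "Pink tree means love"],
})

# Slice each second-column string from offset len(first) + 1 to the end;
# the + 1 skips the single space that follows the repeated words.
df["Second Column"] = [
    second[len(first) + 1:]
    for first, second in zip(df["First Column"], df["Second Column"])
]
print(df)
```

Because it only looks at lengths, this version does not care about case differences such as "Big Cat" versus "Big cat".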