How to remove a specific combination of letters from the end of every word in a dataframe column?

Time:10-28

I want to remove the letters br from the end of every word in my Pandas dataframe column (as you'll see, the rows of this column are actually sentences, all different from one another).

Unfortunately, I'd already cleaned the data without giving much thought to the <br> tags, so I'm now left with words like 'startbr', 'nicebr', and 'hellobr', which are of no use to me.

A dataframe row may look something like this (errors denoted by ** ** tags):

Sentence = here are **somebr** examples of poorly written paragraphs **andbr** well-written **paragraphsbr** on the same **topicbr** how do they compare?

What I'd like (without the br at the end):

Sentence = here are **some** examples of poorly written paragraphs **and** well-written **paragraphs** on the same **topic** how do they compare?

I'm hoping for an answer that lets me keep the original sentence, with the trailing br stripped from any word that ends in it. Words like "brutish," "breathtaking," and "ember" should be kept as is, since they could be of value. Fortunately there aren't any words that I'd like to retain that end with the letters br.

CodePudding user response:

Use a regex with a word boundary (\b) to match the end of words:

df['text'] = df['text'].str.replace(r'br\b', '', regex=True)

Example (with assignment as a new column text2):

                        text                  text2
0  word wordbr bread breadbr  word word bread bread
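
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample rows and the column names text, text2, and text3 are made up for illustration. The last line shows an optional stricter pattern (my own variation, not part of the answer above) that only strips br when it is attached to a preceding word character, so a standalone "br" token would be left alone:

```python
import pandas as pd

# Hypothetical sample data, chosen to cover the cases from the question.
df = pd.DataFrame({"text": [
    "word wordbr bread breadbr",
    "brutish ember somebr topicbr",
    "line br break",
]})

# br\b matches "br" only where a word ends right after it,
# so "br" at the start or in the middle of a word is untouched.
df["text2"] = df["text"].str.replace(r"br\b", "", regex=True)

# Optional variation: the lookbehind requires a word character before "br",
# so a standalone "br" token is kept.
df["text3"] = df["text"].str.replace(r"(?<=\w)br\b", "", regex=True)

print(df)
```

Note that the plain br\b pattern also deletes a standalone "br" (leaving a double space behind), which may or may not be what you want depending on how the <br> tags were stripped.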