Python: Replace string in one column from list in other column

Time:06-07

I need some help please.

I have a dataframe with multiple columns, two of which are:

Content_Clean: a column of content strings

Removals: a column holding, for each row, a list of strings to remove from Content_Clean

Problem: I am trying to replace words in Content_Clean with spaces when they appear in that row's Removals list.

Example:

Content Clean: 'Johnny and Mary went to the store'

Removals: ['Johnny','Mary']

Output: 'and went to the store'

Example Code:

for i in data_eng['Removals']:
    for u in i:
        data_eng['Content_Clean_II'] = data_eng['Content_Clean'].str.replace(u,' ')

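This does not work because the Removals column contains a list per row: every pass of the loop applies one word to the whole column and overwrites Content_Clean_II, so only the final word processed ends up removed.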

Another Example:

data_eng['Content_Clean_II'] = data_eng['Content_Clean'].apply(lambda x: re.sub(data_eng.loc[data_eng['Content_Clean'] == x, 'Removals'].values[0], '', x)) 

This does not work either, as re.sub expects a single pattern, while each Removals entry is a list.

The problem is that each entry in the Removals column is a list, and I want to use that list to remove words (or replace them with spaces) in the Content_Clean column on a per-row basis.


CodePudding user response:

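Here you go. This worked on my test data. Let me know if it works for you.

def repl(row):
    # Removals holds a list of words for this row; strip each one in turn.
    for word in row['Removals']:
        row['Content_Clean'] = row['Content_Clean'].replace(word, '')

    return row

# axis=1 passes each row to repl, so every row uses its own Removals list.
data_eng = data_eng.apply(repl, axis=1)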
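
One thing to watch with plain replace is that it matches substrings, so 'Mary' would also be cut out of 'Maryland', and the removed words leave extra spaces behind. Below is a minimal sketch of a whole-word variant, assuming the Removals entries are ordinary words. The helper name remove_words and the two sample lists are made up for illustration; the lists just stand in for the two columns.

```python
import re

def remove_words(text, removals):
    """Remove each whole word in `removals` from `text` and tidy the spacing."""
    if not removals:
        return text
    # re.escape guards against regex metacharacters inside the words;
    # \b keeps 'Mary' from matching inside 'Maryland'.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in removals) + r")\b"
    cleaned = re.sub(pattern, " ", text)
    # Collapse the runs of spaces left behind by the removals.
    return " ".join(cleaned.split())

# Two parallel lists standing in for the Content_Clean and Removals columns.
contents = ["Johnny and Mary went to the store", "Mary moved to Maryland"]
removals = [["Johnny", "Mary"], ["Mary"]]

cleaned = [remove_words(t, r) for t, r in zip(contents, removals)]
print(cleaned)

# With the dataframe from the question, the same idea would look like:
# data_eng['Content_Clean_II'] = [
#     remove_words(t, r)
#     for t, r in zip(data_eng['Content_Clean'], data_eng['Removals'])
# ]
```

This writes to a new Content_Clean_II column instead of overwriting Content_Clean, which is what the question's own attempts were aiming for.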

CodePudding user response:

You can call the str.replace(old, new) method to remove unwanted words from a string. Here is one small example I have done.

a_string = "I do not like to eat apples and watermelons"

stripped_string = a_string.replace(" do not", "")

print(stripped_string)

This will remove " do not" (including the leading space) from the sentence, leaving "I like to eat apples and watermelons".
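
Since the question has a list of words rather than a single one, the same replace call can be repeated in a loop. This is only a rough sketch: strip_phrases is a made-up helper name, and the split/join at the end is one way (not the only one) to clean up the doubled spaces.

```python
def strip_phrases(text, phrases):
    """Remove every phrase in `phrases` from `text` using plain str.replace."""
    for phrase in phrases:
        text = text.replace(phrase, "")
    # split() with no arguments drops repeated whitespace; join restores single spaces.
    return " ".join(text.split())

a_string = "I do not like to eat apples and watermelons"
print(strip_phrases(a_string, ["do not", "apples and"]))
# prints: I like to eat watermelons

# Caveat: replace works on substrings, not whole words.
print(strip_phrases("Mary moved to Maryland", ["Mary"]))
# prints: moved to land
```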
