Python: Replace string in one column from list in other column

I need some help please.

I have a DataFrame with multiple columns, two of which are:

Content_Clean: a column of strings (the text content)

Removals: a column where each row holds a list of strings to be removed from that row's Content_Clean value

Problem: I am trying to replace the words in Content_Clean that appear in the same row's Removals list with spaces.

Example:

Content_Clean: 'Johnny and Mary went to the store'

Removals: ['Johnny','Mary']

Output: 'and went to the store'

Example Code:

for i in data_eng['Removals']:
    for u in i:
        data_eng['Content_Clean_II'] = data_eng['Content_Clean'].str.replace(u,' ')

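This does not work: every pass of the loop reassigns the whole Content_Clean_II column from the original Content_Clean, so only the last replacement takes effect, and that word is removed from every row rather than only from the row whose Removals list contains it.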
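One way to get a per-row result, sketched under the assumption that data_eng looks like the example above: iterate over both columns together with `zip`, so each text is paired only with its own row's list, and assign the new column once at the end instead of inside the loop. The helper name `strip_words` and the sample data are illustrative, not from the question.

```python
import pandas as pd

# Illustrative data shaped like the question's two columns
data_eng = pd.DataFrame({
    'Content_Clean': ['Johnny and Mary went to the store', 'Sue likes green apples'],
    'Removals': [['Johnny', 'Mary'], ['green']],
})


def strip_words(text, words):
    # Replace each word from this row's list with a space
    for word in words:
        text = text.replace(word, ' ')
    # Collapse runs of whitespace and trim the ends
    return ' '.join(text.split())


# zip pairs each row's text with that same row's list; the column is assigned once
data_eng['Content_Clean_II'] = [
    strip_words(text, words)
    for text, words in zip(data_eng['Content_Clean'], data_eng['Removals'])
]
print(data_eng['Content_Clean_II'].tolist())
# ['and went to the store', 'Sue likes apples']
```

The whitespace collapse is what turns 'Johnny and Mary went...' into exactly 'and went to the store'; drop it if the raw spaces should be kept.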

Another Example:

data_eng['Content_Clean_II'] = data_eng['Content_Clean'].apply(lambda x: re.sub(data_eng.loc[data_eng['Content_Clean'] == x, 'Removals'].values[0], '', x))

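This does not work because re.sub expects a single pattern string, but the lookup returns the row's whole list. Looking a row up by its Content_Clean text also picks the wrong list if two rows share the same text.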
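Because `re.sub` takes a single pattern, one option is to join each row's list into one alternation pattern. This sketch escapes every word with `re.escape` (so characters like `.` or `+` match literally) and wraps the alternation in `\b` word boundaries so that, for example, `Mary` does not match inside `Maryland`. It uses `apply(axis=1)` so each row sees its own list instead of looking the row up by its text. It assumes the words start and end with letters, digits, or underscores (otherwise `\b` behaves differently); the helper name `remove_row_words` and the sample data are hypothetical.

```python
import re

import pandas as pd

data_eng = pd.DataFrame({
    'Content_Clean': ['Johnny and Mary went to the store', 'Mary moved to Maryland'],
    'Removals': [['Johnny', 'Mary'], ['Mary']],
})


def remove_row_words(row):
    words = row['Removals']
    if not words:
        # An empty alternation would match the empty string at every word boundary
        return row['Content_Clean']
    # e.g. ['Johnny', 'Mary'] -> r'\b(?:Johnny|Mary)\b'
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    text = re.sub(pattern, ' ', row['Content_Clean'])
    # Collapse the extra spaces left behind by the replacements
    return ' '.join(text.split())


data_eng['Content_Clean_II'] = data_eng.apply(remove_row_words, axis=1)
print(data_eng['Content_Clean_II'].tolist())
# ['and went to the store', 'moved to Maryland']
```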

The core issue is that each row's Removals value is a list, and I want to use that list to remove (replace with spaces) the matching words in the same row's Content_Clean value.


Answer:

Here you go. This worked on my test data. Let me know if it works for you.

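import pandas as pd

def repl(row):
    # Remove every string in this row's Removals list from its Content_Clean text
    text = row['Content_Clean']
    for word in row['Removals']:
        text = text.replace(word, '')
    return text

# apply with axis=1 calls repl once per row and returns a Series of the cleaned strings
data_eng['Content_Clean_II'] = data_eng.apply(repl, axis=1)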
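A self-contained run of this per-row `str.replace` approach on data shaped like the question's. It also shows one caveat worth knowing: `str.replace` matches substrings, so removing `Mary` turns `Maryland` into `land`, and the removed words leave their surrounding spaces behind. The sample data and the function name `remove_substrings` are illustrative.

```python
import pandas as pd

data_eng = pd.DataFrame({
    'Content_Clean': ['Johnny and Mary went to the store', 'Mary moved to Maryland'],
    'Removals': [['Johnny', 'Mary'], ['Mary']],
})


def remove_substrings(row):
    # Same idea as the answer: str.replace each listed string with ''
    text = row['Content_Clean']
    for word in row['Removals']:
        text = text.replace(word, '')
    return text


data_eng['Content_Clean_II'] = data_eng.apply(remove_substrings, axis=1)
print(data_eng['Content_Clean_II'].tolist())
# [' and  went to the store', ' moved to land']
```

If that matters for your data, the leftover spaces can be tidied with `' '.join(text.split())`, and a word-boundary regex such as `re.sub(r'\bMary\b', ' ', text)` avoids the substring match.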

Answer:

You can call the str.replace(old, new) method to remove unwanted words from a string. Here is a small example:

a_string = "I do not like to eat apples and watermelons"
stripped_string = a_string.replace(" do not", "")
print(stripped_string)

This removes " do not" (including the leading space) from the sentence and prints "I like to eat apples and watermelons".
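For the question's case, the same `str.replace` call just needs to run once per word in the row's list. A minimal sketch on the question's single example string (plain Python, no pandas), replacing each word with a space as the question asks and then collapsing the extra whitespace; the variable names are illustrative.

```python
content_clean = 'Johnny and Mary went to the store'
removals = ['Johnny', 'Mary']

result = content_clean
for word in removals:
    # Replace every occurrence of this word with a space
    result = result.replace(word, ' ')

# Collapse runs of spaces and trim the ends
result = ' '.join(result.split())
print(result)  # prints: and went to the store
```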