I need some help please.
I have a dataframe with multiple columns where 2 are:
Content_Clean = Column filled with Content - String
Removals: list of strings to be removed from Content_Clean Column
Problem: I am trying to replace words in Content_Clean with spaces if in Removals Column: Example Image
Example:
Content Clean: 'Johnny and Mary went to the store'
Removals: ['Johnny','Mary']
Output: 'and went to the store'
Example Code:
for i in data_eng['Removals']:
for u in i:
data_eng['Content_Clean_II'] = data_eng['Content_Clean'].str.replace(u,' ')
This does not work as Removals columns contain lists per row.
Another Example:
data_eng['Content_Clean_II'] = data_eng['Content_Clean'].apply(lambda x: re.sub(data_eng.loc[data_eng['Content_Clean'] == x, 'Removals'].values[0], '', x))
Does not work as this code is only looking for one string.
The problem is that Removals column is a list that I want use to remove/ replace with spaces in the Content_Clean column on a per row basis.
The example image link might help
CodePudding user response:
Here you go. This worked on my test data. Let me know if it works for you
def repl(row):
for word in row['Removals']:
row['Content_Clean'] = row['Content_Clean'].replace(word, '')
return row
data_eng = data_eng.apply(repl, axis=1)
CodePudding user response:
You can call the str.replace(old, new) method to remove unwanted words from a string. Here is one small example I have done.
a_string = "I do not like to eat apples and watermelons"
stripped_string = a_string.replace(" do not", "")
print(stripped_string)
This will remove "do not" from the sentence