I am looking to replace specific string in a dataframe column for each row. Below is my attempt. Can someone please help in getting the expected output as mentioned below?
df_temp = df_temp.append({'id' : 'id_01', 'value' : 'value11, value12, value13'}, ignore_index = True)
df_temp = df_temp.append({'id' : 'id_02', 'value' : 'value21, value22, value23'}, ignore_index = True)
df_temp = df_temp.append({'id' : 'id_02', 'value' : 'value31, value12, value33'}, ignore_index = True)
print(df_temp)
string_to_remove = pd.Series(['value12', 'value33'])
print(string_to_remove)
df_temp['value'] = df_temp['value'].apply(lambda x : '' if x in string_to_remove else x)
print(df_temp)```
**current output**
id value
0 id_01 value11, value12, value13
1 id_02 value21, value22, value23
2 id_02 value31, value12, value33
0 value12
1 value33
dtype: object
id value
0 id_01 value11, value12, value13
1 id_02 value21, value22, value23
2 id_02 value31, value12, value33
**expected output**
id value
0 id_01 value11, value13
1 id_02 value21, value22, value23
2 id_02 value31, value12
CodePudding user response:
you need to split the string into a list, operate on the elements, and then join it back together:
delimiter = ", "
filtered_items = set(['value12', 'value33'])
df_temp["value"] = df_temp["value"].apply(
lambda x: delimiter.join(
item
for item in x.split(delimiter)
if item not in filtered_items
)
)
outputs:
id value
0 id_01 value11, value13
1 id_02 value21, value22, value23
2 id_02 value31
CodePudding user response:
Regarding what you had done, it wouldn't work because the logic tells it to check if the input string is in the smaller pd.Series rather than the objects in the series within the string. Also, if that were to be true (which it couldn't have been), you then would replace the whole string with ' '.
Slightly different approach but you can call a function within the lambda to abstract the logic. This way you can manipulate however you want:
Hope this helps!
def remove_vals(s):
separator = ', '
split = s.split(separator)
subset = [x for x in split if x not in string_to_remove]
return separator.join(subset)
df_temp['value'] = df_temp['value'].apply(lambda x: remove_vals(x) if any(s in x for s in string_to_remove) else x)
print(df_temp)
>>>
id value
0 id_01 value11, value13
1 id_02 value21, value22, value23
2 id_02 value31