Home > database >  How to replace string in dataframe column based on another series?
How to replace string in dataframe column based on another series?

Time:08-04

I am looking to replace specific string in a dataframe column for each row. Below is my attempt. Can someone please help in getting the expected output as mentioned below?


df_temp = df_temp.append({'id' : 'id_01', 'value' : 'value11, value12, value13'}, ignore_index = True)
df_temp = df_temp.append({'id' : 'id_02', 'value' : 'value21, value22, value23'}, ignore_index = True)
df_temp = df_temp.append({'id' : 'id_02', 'value' : 'value31, value12, value33'}, ignore_index = True)
print(df_temp)

string_to_remove = pd.Series(['value12', 'value33'])

print(string_to_remove)
df_temp['value'] = df_temp['value'].apply(lambda x : '' if x in string_to_remove else x)
print(df_temp)```


**current output**

      id                      value
0  id_01  value11, value12, value13
1  id_02  value21, value22, value23
2  id_02  value31, value12, value33
0    value12
1    value33
dtype: object
      id                      value
0  id_01  value11, value12, value13
1  id_02  value21, value22, value23
2  id_02  value31, value12, value33

**expected output**

      id                      value
0  id_01  value11, value13
1  id_02  value21, value22, value23
2  id_02  value31, value12

CodePudding user response:

you need to split the string into a list, operate on the elements, and then join it back together:

delimiter = ", "
filtered_items = set(['value12', 'value33'])
df_temp["value"] = df_temp["value"].apply(
    lambda x: delimiter.join(
        item
        for item in x.split(delimiter)
        if item not in filtered_items
    )
)

outputs:

      id                      value
0  id_01           value11, value13
1  id_02  value21, value22, value23
2  id_02                    value31

CodePudding user response:

Regarding what you had done, it wouldn't work because the logic tells it to check if the input string is in the smaller pd.Series rather than the objects in the series within the string. Also, if that were to be true (which it couldn't have been), you then would replace the whole string with ' '.

Slightly different approach but you can call a function within the lambda to abstract the logic. This way you can manipulate however you want:

Hope this helps!

def remove_vals(s):
    separator = ', '
    split = s.split(separator)
    subset = [x for x in split if x not in string_to_remove]
    return separator.join(subset)


df_temp['value'] = df_temp['value'].apply(lambda x: remove_vals(x) if any(s in x for s in string_to_remove) else x)
print(df_temp)
>>>       
      id                      value
0  id_01           value11, value13
1  id_02  value21, value22, value23
2  id_02                    value31
  • Related