I have a problem similar to this question, but with the opposite goal. Instead of a removal list, I have a keep list: a list of strings I'd like to keep. How can I use the keep list to drop the unwanted strings from the column and retain only the wanted ones?
import pandas as pd
df = pd.DataFrame(
{
"ID": [1, 2, 3, 4, 5],
"name": [
"Mitty, Kitty",
"Kandy, Puppy",
"Judy, Micky, Loudy",
"Cindy, Judy",
"Kitty, Wicky",
],
}
)
   ID                name
0   1        Mitty, Kitty
1   2        Kandy, Puppy
2   3  Judy, Micky, Loudy
3   4         Cindy, Judy
4   5        Kitty, Wicky
To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]
CodePudding user response:
Use Series.str.findall with Series.str.join:
To_keep_lst = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]
df['name'] = df['name'].str.findall('|'.join(To_keep_lst)).str.join(', ')
print(df)
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4
4   5  Kitty, Wicky
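One caveat with a plain '|'.join(...) pattern is that it matches substrings and treats regex metacharacters in the names literally only by luck. The following is a minimal sketch of a stricter variant, assuming whole-word matches are wanted; the sample value "Kittyhawk" and the names `to_keep` and `pattern` are made up for illustration and are not part of the original question.

```python
import re

import pandas as pd

# Hypothetical sample: "Kittyhawk" contains "Kitty" as a substring
df = pd.DataFrame(
    {
        "ID": [1, 2, 3],
        "name": ["Mitty, Kitty", "Kandy, Puppy", "Kittyhawk, Micky"],
    }
)
to_keep = ["Kitty", "Kandy", "Micky", "Loudy", "Wicky"]

# Escape each name so regex metacharacters are taken literally, and wrap the
# alternation in \b word boundaries so only whole names are matched.
pattern = r"\b(?:" + "|".join(map(re.escape, to_keep)) + r")\b"

# findall returns a list of matches per row; join turns it back into a string
df["name"] = df["name"].str.findall(pattern).str.join(", ")
print(df)
```

With the unanchored pattern, the third row would also pick up "Kitty" from "Kittyhawk"; with the word boundaries it should keep only "Micky".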
CodePudding user response:
Use a list comprehension to keep only the names that appear in To_keep_lst:
keep_names = lambda x: ', '.join([n for n in x.split(', ') if n in To_keep_lst])
df['name'] = df['name'].apply(keep_names)
print(df)
# Output:
   ID          name
0   1         Kitty
1   2         Kandy
2   3  Micky, Loudy
3   4
4   5  Kitty, Wicky
Note: @jezrael's answer (the str.findall approach above) is much faster than mine.
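If you prefer the split-and-filter approach, a small variation may help when the keep list is long: store it in a set so each membership test is a hash lookup rather than a list scan. This is only a sketch; the names `to_keep` and `keep_names` are chosen here for illustration, and it assumes the names are always separated by ", " exactly as in the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Mitty, Kitty",
            "Kandy, Puppy",
            "Judy, Micky, Loudy",
            "Cindy, Judy",
            "Kitty, Wicky",
        ],
    }
)

# A set gives constant-time membership checks on average
to_keep = {"Kitty", "Kandy", "Micky", "Loudy", "Wicky"}


def keep_names(cell: str) -> str:
    """Return only the names from `cell` that are in `to_keep`, in original order."""
    return ", ".join(n for n in cell.split(", ") if n in to_keep)


df["name"] = df["name"].map(keep_names)
print(df)
```

Rows with no matching names end up as an empty string, the same as in the outputs above.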