I'm trying to drop rows in a pandas DataFrame if a whole word in a column exactly matches a string in a list. At the moment I can only get it working for partial matches, so "socks" is dropped along with "sock".
import pandas as pd

# Strings to drop on an exact (whole-word) match
drop_list = ["sock", "shirt"]
# Sample data
data = {'keyword': ['adidas socks', 'adidas sock', 'adidas shoes', "sock"]}
# Create the DataFrame
df = pd.DataFrame(data)
# Drops every row containing "sock" or "shirt" anywhere, including inside "socks"
df = df[~df['keyword'].str.contains("|".join(drop_list))]
Current Output:
keyword
2 adidas shoes
Desired Output:
keyword
0 adidas socks
1 adidas shoes
CodePudding user response:
You can create a set from drop_list and use set.isdisjoint on the whitespace-split words of each row to check whether any word exactly matches an entry in the list.
drop_set = set(drop_list)
msk = df['keyword'].apply(lambda x: drop_set.isdisjoint(x.split()))
df = df[msk]
Output:
keyword
0 adidas socks
2 adidas shoes
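If you would rather keep the str.contains approach from the question, one possible variant is to wrap the alternation in \b word boundaries. This is only a sketch and it swaps in a regex technique instead of the set-based one above: it assumes the entries in drop_list should match as whole words, and it uses re.escape in case an entry contains regex metacharacters. The names pattern and filtered are made up for this illustration.

```python
import re
import pandas as pd

drop_list = ["sock", "shirt"]
df = pd.DataFrame({'keyword': ['adidas socks', 'adidas sock', 'adidas shoes', 'sock']})

# \b requires a word boundary on both sides, so "sock" no longer matches inside "socks".
# re.escape protects against regex metacharacters in the list entries (an assumption about the data).
pattern = r"\b(?:" + "|".join(map(re.escape, drop_list)) + r")\b"

# Keep only the rows where no whole-word match is found
filtered = df[~df['keyword'].str.contains(pattern, regex=True)]
print(filtered)
```

Unlike the split-based version, word boundaries also treat punctuation as a separator, so a value such as "sock," would be dropped as well. Whether that is what you want depends on your data.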
CodePudding user response:
Your code seems to be working. The only thing I noticed is that the index is not "updated" after filtering.
To renumber the rows, we could reset the index:
df = df.reset_index(drop=True)
# or, in place:
df.reset_index(drop=True, inplace=True)
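As a rough illustration of what the reset does, here is a minimal sketch on the question's sample data. The whole-word mask built with set.isdisjoint is an assumption added for this example (it is not part of the question's code), and kept and renumbered are hypothetical names.

```python
import pandas as pd

drop_list = ["sock", "shirt"]
df = pd.DataFrame({'keyword': ['adidas socks', 'adidas sock', 'adidas shoes', 'sock']})

# Assumed whole-word filter: keep rows whose words share nothing with drop_list
drop_set = set(drop_list)
kept = df[df['keyword'].apply(lambda x: drop_set.isdisjoint(x.split()))]

# The surviving rows still carry their original labels (0 and 2);
# reset_index(drop=True) renumbers them from 0 and discards the old labels
renumbered = kept.reset_index(drop=True)
print(renumbered)
```

With drop=True the old index is thrown away rather than being added back as a column.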
I used the sample code you provided, but my output looks different from yours, and so does my input.
Honestly, I don't understand why that last empty string in the data is not showing up in your output.
Input:
keyword
0 adidas sock
1 adidas socks
2 adidas shoes
3 sock
4
Output:
keyword
0 adidas shoes
1