Pandas Drop Rows when a String is Matched to a Longer String in a Column in an Exact Match

Time:02-11

I'm trying to drop rows in a pandas DataFrame if a substring in a column exactly matches a string in a list. At the moment I can only get it working for partial matches.

import pandas as pd

# List of strings to drop on an exact match
drop_list = ["sock", "shirt"]

# Sample data
data = {'keyword': ['adidas socks', 'adidas sock', 'adidas shoes', "sock"]}

# Create DataFrame
df = pd.DataFrame(data)

# Drop rows whose keyword contains any string in drop_list (substring match)
df = df[~df['keyword'].str.contains("|".join(drop_list))]

Current Output:

        keyword
2  adidas shoes

Desired Output:

        keyword
0  adidas socks
1  adidas shoes

User response:

You can create a set from drop_list and use set.isdisjoint on the split words of each row to check whether any word exactly matches an entry in the list.

drop_set = set(drop_list)
msk = df['keyword'].apply(lambda x: drop_set.isdisjoint(x.split()))
df = df[msk]

Output:

        keyword
0  adidas socks
2  adidas shoes
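
An alternative that stays with str.contains is to wrap the alternation in word boundaries, so that "sock" no longer matches inside "socks". The following is a minimal sketch, assuming the terms should match as whole words; the names pattern and result are introduced here for illustration only, and re.escape is added as a precaution in case a term contains regex metacharacters.

```python
import re

import pandas as pd

drop_list = ["sock", "shirt"]
df = pd.DataFrame({"keyword": ["adidas socks", "adidas sock", "adidas shoes", "sock"]})

# \b anchors the alternation at word boundaries; re.escape keeps any
# special characters in the terms literal. The non-capturing group (?:...)
# avoids the pandas warning about match groups in the pattern.
pattern = r"\b(?:" + "|".join(map(re.escape, drop_list)) + r")\b"

# Keep only the rows whose keyword contains none of the terms as a whole word.
result = df[~df["keyword"].str.contains(pattern, regex=True)]
print(result)
```

One difference from the split-based approach: \b treats punctuation as a boundary, so a keyword such as "sock," would be dropped by the regex but kept by x.split(), which leaves the comma attached to the word.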

User response:

Your code seems to be working. The only thing I noticed is that the index is not "updated".

To achieve that we could reset the index:

df = df.reset_index(drop=True)
# or
df.reset_index(drop=True, inplace=True)

I used the sample code you provided. My output looks different from yours, but so does my input.
Honestly, I don't understand why the last row, the empty string, is not showing up in your output.
Input:

        keyword
0   adidas sock
1  adidas socks
2  adidas shoes
3          sock
4

Output:

        keyword
0  adidas shoes
1
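
The input and output above can be reproduced with the sketch below. It assumes the five-row input shown, including the trailing empty string; the names filtered and renumbered are introduced here for illustration. Note that this filter still drops "adidas socks", because the substring match is unchanged: reset_index only renumbers the rows that remain.

```python
import pandas as pd

drop_list = ["sock", "shirt"]

# Assumed input: the five rows shown above, ending with an empty string.
df = pd.DataFrame(
    {"keyword": ["adidas sock", "adidas socks", "adidas shoes", "sock", ""]}
)

# The question's original filter: the pattern "sock|shirt" matches anywhere
# in the string, so "adidas socks" is dropped as well.
filtered = df[~df["keyword"].str.contains("|".join(drop_list))]

# filtered still carries the original row labels; reset_index(drop=True)
# renumbers from 0 and discards the old labels instead of keeping them
# as a new column.
renumbered = filtered.reset_index(drop=True)
print(renumbered)
```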