Home > other >  Is there any possibility to filter pandas by list?
Is there any possibility to filter pandas by list?

Time:04-10

I have a df with columns product_id and keyword. I am iterating over product id by for group_name, group in s.groupby('product_id') and in this step, I want to keep groups that contain all elements from the search list. for example: enter image description here

and my search list = ['land', 'cruiser']

so in the output, I want this:

enter image description here

I have tried this

data[data["keywords"].isin(search_words)]

but this method keeps all groups if it contains at least one element from the list, while I groups must contain all elements in the search list.

CodePudding user response:

The core logic here is lifted from S Rawson's answer.

Instead of a one-liner, I create some helper variables -- I make a series of product_id and all their associated search terms (product_keywords). Then I make a series (keepers) of all the product_id that have all the keywords. Finally I use the isin() method to test if dataframe rows have a product_id in that list.

This keeps all terms that are in groups that have all the sought terms; so if group 1 has "land" and "cruiser" and also other words, this gets all of the terms.

df = pd.DataFrame({"product_id": [1,1, 2, 3, 1, 2, 3],
                   "keywords": ["land", "barn", "land", "land", "cruiser", "barn", "cruiser"]})

search_list = set(['land', 'cruiser'])

product_keywords = df.groupby("product_id").apply(lambda x: list(set(x["keywords"])))

product_keywords
# product_id
# 1    [barn, cruiser, land]
# 2             [barn, land]
# 3          [cruiser, land]

keepers = product_keywords[product_keywords.apply(lambda found_terms: all(search_term in found_terms for search_term in search_list))]

keepers
# product_id
# 1    [barn, cruiser, land]
# 3          [cruiser, land]


df.loc[df['product_id'].isin(keepers.index)]  
#    product_id keywords
# 0           1     land
# 1           1     barn
# 3           3     land
# 4           1  cruiser
# 6           3  cruiser

CodePudding user response:

For a different answer (in one line!):

import pandas as pd
s = pd.DataFrame({'product_id': [1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4],
                  'keywords': ["land", "cruiser", "land", "cruiser", "land",
                               "land", "land", "cruiser", "cruiser", "land", "cruiser"]})

search_words = ['land', 'cruiser']

s[[(all(p in x for p in search_words)) \
   for x in s.merge(s.groupby("product_id").apply(
           lambda x: list(set(x["keywords"]))).to_frame(). \
               rename(columns={0: "grouped"}), left_on="product_id",
               right_index=True)["grouped"]]]
#Out: 
#    product_id keywords
#0            1     land
#1            1  cruiser
#2            1     land
#3            1  cruiser
#4            1     land
#9            4     land
#10           4  cruiser

EDIT

I have just used pandas groupby filtration, and realised it would be much more simple to use this here instead. The returned dataframe is exactly the same as the one above, just using a lot less code!

s.groupby("product_id").filter(lambda x: all(p in list(set(x["keywords"])) for p in search_words))
  • Related