I want to apply a function that returns, for each row, the elements of a reference list (`product_list`) that are not found in that row. This is the result I want:
```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']
data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke'], ['The Big Smoke', 'Jalapeno & Lemon']],
        ['ALIM MICHEL HALLORAN', ['The Big Smoke', 'Chive & Garlic'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['ALIMENTATION IAN DES', ['The Big Smoke', 'Jalapeno & Lemon'], ['Chive & Garlic', 'Spinach & Artichoke']]]
df = pd.DataFrame(data, columns=['store', 'products', 'missing_products'])
```
Here `missing_products` is a list of the products from `product_list` that are not found in the array in the `products` column.

I tried the following function, but it's not working as intended:
```python
def gap(row):
    for item in product_list:
        if item not in row:
            return item
```
Note that each value in the `products` column is a NumPy array, not a list of strings. I'm not sure whether this affects anything. This is what the actual data looks like:
```
[['ACADEMIE DU GOURMET ACADEMY INC', array([nan], dtype=object)],
 ['ACTIVE BODY',
  array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object)],
 ['AG VALLEY FOODS',
  array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object)],
 ['ALIM MICHEL HALLORAN',
  array(['The Meadow', 'The Big Smoke', 'Chive & Garlic',
         'Jalapeno & Lemon', 'Dill & Truffle'], dtype=object)],
 ['ALIMENTATION IAN DES',
  array(['The Big Smoke', 'Jalapeno & Lemon'], dtype=object)]]
```
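A minimal check suggests the array type by itself is not the problem: `in` on a NumPy array compares the value against every element and reports whether any of them match, so it behaves like membership in a list of strings. The sample values below are copied from the printed data; a row holding only `nan` simply contains none of the products.

```python
import numpy as np

# An object-dtype array like the ones stored in the `products` column
products = np.array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object)

# `in` on an ndarray is True if any element equals the value
print('Chive & Garlic' in products)    # True
print('Jalapeno & Lemon' in products)  # False

# A row holding only NaN matches no product name
empty = np.array([np.nan], dtype=object)
print('Chive & Garlic' in empty)       # False
```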
Thanks in advance for the help!
Answer:
Use:
```python
def gap(row):
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out
```
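The difference from the question's version is where the function returns. A `return` inside the loop exits as soon as the first missing item is found, so at most one string comes back; collecting into `out` and returning after the loop yields every missing item. The sketch below runs both on two rows of the question's sample data (`gap_original` is a hypothetical name for the question's function).

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']
df = pd.DataFrame({
    'store': ['ACTIVE BODY', 'AG VALLEY FOODS'],
    'products': [['Chive & Garlic', 'The Big Smoke'],
                 ['Chive & Garlic', 'Spinach & Artichoke']],
})

def gap_original(row):
    # Hypothetical name for the question's version: exits on the first missing item
    for item in product_list:
        if item not in row:
            return item

def gap(row):
    # Collect every missing item, then return once the loop is done
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out

first_only = df['products'].apply(gap_original).tolist()
all_missing = df['products'].apply(gap).tolist()
print(first_only)   # ['Jalapeno & Lemon', 'The Big Smoke']
print(all_missing)  # [['Jalapeno & Lemon', 'Spinach & Artichoke'], ['The Big Smoke', 'Jalapeno & Lemon']]
```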
Alternative:
```python
def gap(row):
    return [item for item in product_list if item not in row]

df['missing_products1'] = df['products'].apply(gap)
```
List comprehension solution:
```python
df['missing_products1'] = [[item for item in product_list if item not in row]
                           for row in df['products']]
```
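Since the real `products` column holds NumPy object arrays (including one row that is just `nan`), here is a sketch running the list comprehension on a few rows rebuilt from the printed data. Under that assumption it works unchanged; note that the `nan`-only row is reported as missing every product, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

# Rows shaped like the printed data: object arrays, one holding only NaN
data = [['ACADEMIE DU GOURMET ACADEMY INC', np.array([np.nan], dtype=object)],
        ['ACTIVE BODY', np.array(['Chive & Garlic', 'Garlic Tzatziki',
                                  'The Big Smoke'], dtype=object)],
        ['ALIM MICHEL HALLORAN', np.array(['The Meadow', 'The Big Smoke', 'Chive & Garlic',
                                           'Jalapeno & Lemon', 'Dill & Truffle'], dtype=object)]]
df = pd.DataFrame(data, columns=['store', 'products'])

# Same list comprehension as above; `in` works on the arrays element-wise
df['missing_products'] = [[item for item in product_list if item not in row]
                          for row in df['products']]
print(df['missing_products'].tolist())
# [['Chive & Garlic', 'The Big Smoke', 'Jalapeno & Lemon', 'Spinach & Artichoke'],
#  ['Jalapeno & Lemon', 'Spinach & Artichoke'],
#  ['Spinach & Artichoke']]
```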
Answer:
Maybe you can build a binary DataFrame instead: one column per product, with 1 if the store carries the product and 0 if it doesn't. That layout is usually easier to work with later, whatever the application, than lists stored inside DataFrame cells.
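One way to build such a 0/1 table with plain pandas, sketched on the question's sample data: `explode` gives one row per store/product pair, `pd.crosstab` counts the pairs, and `reindex` restricts the columns to `product_list` (filling absent products with 0). Reindexing the rows on the store names is an added assumption so that stores whose only entry is `nan`, which `crosstab` drops, come back as all-zero rows. The missing products are then just the zero columns.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']
data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke']]]
df = pd.DataFrame(data, columns=['store', 'products'])

# One row per (store, product) pair; reset_index keeps the index unique
long = df.explode('products').reset_index(drop=True)

# Count pairs, keep every store and only the reference products (absent ones -> 0),
# and cap counts at 1 in case a store lists the same product twice
binary = (pd.crosstab(long['store'], long['products'])
          .reindex(index=df['store'].unique(), columns=product_list, fill_value=0)
          .clip(upper=1))
print(binary)

# Missing products are the columns that are 0 for a given store
missing = {store: [p for p in product_list if binary.loc[store, p] == 0]
           for store in binary.index}
print(missing)
```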