Apply function to find elements not in a list

Time:12-13

I want to apply a function that returns, for each row, the products from a reference list that are not in that row's products column. The result I want looks like this:

import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

data = [['ACTIVE BODY', ['Chive & Garlic', 'The Big Smoke'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['AG VALLEY FOODS', ['Chive & Garlic', 'Spinach & Artichoke'], ['The Big Smoke', 'Jalapeno & Lemon']],
        ['ALIM MICHEL HALLORAN', ['The Big Smoke', 'Chive & Garlic'], ['Jalapeno & Lemon', 'Spinach & Artichoke']],
        ['ALIMENTATION IAN DES', ['The Big Smoke', 'Jalapeno & Lemon'],['Chive & Garlic', 'Spinach & Artichoke']]]

df = pd.DataFrame(data, columns=['store', 'products', 'missing_products'])


Here missing_products is a list of the products from product_list that do not appear in the array in the products column.

I tried the following function, but it's not working as intended:

def gap(row):
    for item in product_list:
        if item not in row:
            return item

Note that each value in the products column is a NumPy array, not a list of strings. I'm not sure whether this affects anything.

[['ACADEMIE DU GOURMET ACADEMY INC', array([nan], dtype=object)],
 ['ACTIVE BODY',
  array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object)],
 ['AG VALLEY FOODS',
  array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object)],
 ['ALIM MICHEL HALLORAN',
  array(['The Meadow', 'The Big Smoke', 'Chive & Garlic',
         'Jalapeno & Lemon', 'Dill & Truffle'], dtype=object)],
 ['ALIMENTATION IAN DES',
  array(['The Big Smoke', 'Jalapeno & Lemon'], dtype=object)]]

Thanks in advance for the help!

CodePudding user response:

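Your function returns as soon as it finds the first missing item, so you only ever get one product back. Collect the missing items in a list and return it after the loop: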

def gap(row):
    out = []
    for item in product_list:
        if item not in row:
            out.append(item)
    return out

Alternative:

def gap(row):
    return [item for item in product_list if item not in row]

Then apply either version to the products column:

df['missing_products1'] = df['products'].apply(gap)

Or skip the helper function and use a nested list comprehension:

df['missing_products1'] = [[item for item in product_list if item not in row] for row in df['products']]
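On the array question: the `in` test also works on NumPy object arrays, so the functions above should not need changes. Below is a minimal sketch, assuming rows shaped like the real data in the question (object arrays, one of them holding only NaN). Converting each row to a set first is my own choice, not something the question requires; it keeps the membership test simple and treats a NaN-only row as having no products.

```python
import numpy as np
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

# Sample rows modeled on the question's real data: each cell is an object array.
df = pd.DataFrame({
    'store': ['ACADEMIE DU GOURMET ACADEMY INC', 'ACTIVE BODY', 'AG VALLEY FOODS'],
    'products': [
        np.array([np.nan], dtype=object),
        np.array(['Chive & Garlic', 'Garlic Tzatziki', 'The Big Smoke'], dtype=object),
        np.array(['Chive & Garlic', 'Spinach & Artichoke'], dtype=object),
    ],
})

def gap(row):
    # Build a set of what the store carries; NaN never matches a product name.
    present = set(row)
    # Keep the reference order of product_list in the result.
    return [item for item in product_list if item not in present]

df['missing_products'] = df['products'].apply(gap)
print(df[['store', 'missing_products']])
```

With this sample, the NaN-only store gets all four products as missing, and products outside product_list (such as 'Garlic Tzatziki') are simply ignored.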

CodePudding user response:

Maybe you could restructure the DataFrame as a binary table: one column per product, with 1 if the store carries the product and 0 if it does not. That may also be easier to work with later than lists stored in DataFrame cells.
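One possible way to build such a binary table is sketched below, using the sample data from the question. The names `long`, `indicator`, and `missing` are illustrative, and the explode/crosstab route is only one option. Note that a store with no listed products at all would drop out of the crosstab and would need to be added back separately.

```python
import pandas as pd

product_list = ['Chive & Garlic', 'The Big Smoke',
                'Jalapeno & Lemon', 'Spinach & Artichoke']

df = pd.DataFrame({
    'store': ['ACTIVE BODY', 'AG VALLEY FOODS',
              'ALIM MICHEL HALLORAN', 'ALIMENTATION IAN DES'],
    'products': [['Chive & Garlic', 'The Big Smoke'],
                 ['Chive & Garlic', 'Spinach & Artichoke'],
                 ['The Big Smoke', 'Chive & Garlic'],
                 ['The Big Smoke', 'Jalapeno & Lemon']],
})

# One row per (store, product) pair.
long = df.explode('products')

# Count pairs into a store x product table, restrict the columns to the
# reference list, and cap counts at 1 so every cell is a 0/1 flag.
indicator = (
    pd.crosstab(long['store'], long['products'])
      .reindex(columns=product_list, fill_value=0)
      .clip(upper=1)
)
print(indicator)

# The missing products per store are the columns holding a 0.
missing = {store: [p for p in product_list if row[p] == 0]
           for store, row in indicator.iterrows()}
print(missing)
```

The lists of missing products can still be recovered from the table when needed, while sums and filters per product become plain column operations.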
