Checking a Column of Lists Against Another Column of Lists and Returning Multiple Column Values

I am comparing the list values of sb_list against psr_list. If all list items from sb_list['ASINs'] are found in any of the lists from psr_list['Child ASIN'], sb_list['bucket'] is marked 'clean'. This part of the code is running fine...

What I am having trouble with is populating sb_list['Group']. If ['bucket'] is marked 'clean' then sb_list['Group'] should equal the corresponding psr_list['Group'] where the match was found.

I am attempting to write a function that checks each list in sb_list['ASINs'] against the lists in psr_list['Child ASIN'] and returns a tuple: the first value is 'clean' or 'mixed' depending on whether a match was found, and the second is the psr_list['Group'] value of the matching row.

This is similar to another question I asked a few weeks ago, but different enough that I thought it deserved its own post.

Data:

import pandas as pd

list1 = [
    ['1', ['hi', 'there', '10', '14', '15']],
    ['2',  ['7', '13', '25', '46', '50']],
    ['3',  ['hello', 'du', '6', '19', '36']],
    ['4',  ['hi', '19', '24', '26', '29']]]

psr_list = pd.DataFrame(list1, columns =['Group', 'Child ASIN']) 

list2 = [
    ['a', ['hi', 'there']],
    ['r',  ['hello', 'du', 'th']],
    ['e',  ['hello', '9']],
    ['f',  ['hello', '6', '36']],
    ['w',  ['hello', '6', '37']],
    ['a',  ['24', '29']],
    ['q',  ['hi', '14', '15']]]

sb_list = pd.DataFrame(list2, columns =['camp', 'ASINs']) 
sb_list['bucket'] = ""
sb_list['Group'] = ""

My attempt:

def process(psr_asin_list, sb_ap_asin_list):
  return [compare(psr_asin_list, sb_sp_row) for sb_sp_row in sb_ap_asin_list]

def compare(psr_asin_list, sb_sp_row):
  counter = 0
  while counter < psr_asin_list.shape[0]:
    if all(asins in psr_asin_list[counter] for asins in sb_sp_row): return ('clean', psr_asin_list['Group'])
    counter += 1
  return ('mixed', '')


sb_list['bucket'] = process(psr_list['Child ASIN'].to_numpy(), sb_list['ASINs'].to_numpy())[0]
sb_list['Group'] = process(psr_list['Child ASIN'].to_numpy(), sb_list['ASINs'].to_numpy())[1]

Desired output:

  camp            ASINs bucket Group
0    a      [hi, there]  clean     1
1    r  [hello, du, th]  mixed
2    e       [hello, 9]  mixed
3    f   [hello, 6, 36]  clean     3
4    w   [hello, 6, 37]  mixed
5    a         [24, 29]  clean     4
6    q     [hi, 14, 15]  clean     1

CodePudding user response:

You could use set.issubset in a list comprehension to check whether each list in sb_list['ASINs'] is contained in any list in psr_list['Child ASIN']. If such a list exists, take the "Group" value from the row where it was found; if not, fill in "". Note that this assumes at most one list in psr_list contains a given list from sb_list; if several do, get_group returns the "Group" of the first one.
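As a quick illustration of the subset test on its own, here is a minimal sketch using two lists from the sample data. The variable name child_asins is just for this example and is not part of the original code.

```python
# Standalone illustration of set.issubset with values from the sample data.
child_asins = ['hi', 'there', '10', '14', '15']   # 'Child ASIN' list of Group '1'

# issubset accepts any iterable, so the list does not need converting to a set first.
first_check = set(['hi', 'there']).issubset(child_asins)
second_check = set(['hello', 'du', 'th']).issubset(child_asins)

print(first_check)    # True: both items appear in child_asins
print(second_check)   # False: none of these items appear in child_asins
```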

Then fill in bucket depending on whether a "Group" value was found:

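import numpy as np

def get_group(asin):
    # Boolean mask: True for each psr_list row whose 'Child ASIN' list contains every item of asin
    mask = [set(asin).issubset(y) for y in psr_list['Child ASIN'].tolist()]
    group = psr_list.loc[mask, 'Group']
    # Take the first matching Group, or '' when no row matches
    return group.iat[0] if not group.empty else ''

sb_list['Group'] = sb_list['ASINs'].apply(get_group)
# An empty Group means no match was found, so the row is 'mixed'
sb_list['bucket'] = np.where(sb_list['Group']=='', 'mixed', 'clean')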

Output:

  camp            ASINs bucket Group
0    a      [hi, there]  clean     1
1    r  [hello, du, th]  mixed      
2    e       [hello, 9]  mixed      
3    f   [hello, 6, 36]  clean     3
4    w   [hello, 6, 37]  mixed      
5    a         [24, 29]  clean     4
6    q     [hi, 14, 15]  clean     1
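
If you would rather keep the tuple-returning shape described in the question, something along these lines might work. This is only a sketch under the same first-match assumption; the helper name classify is made up for this example, and the data is repeated so the block runs on its own.

```python
import pandas as pd

psr_list = pd.DataFrame(
    [['1', ['hi', 'there', '10', '14', '15']],
     ['2', ['7', '13', '25', '46', '50']],
     ['3', ['hello', 'du', '6', '19', '36']],
     ['4', ['hi', '19', '24', '26', '29']]],
    columns=['Group', 'Child ASIN'])

sb_list = pd.DataFrame(
    [['a', ['hi', 'there']],
     ['r', ['hello', 'du', 'th']],
     ['e', ['hello', '9']],
     ['f', ['hello', '6', '36']],
     ['w', ['hello', '6', '37']],
     ['a', ['24', '29']],
     ['q', ['hi', '14', '15']]],
    columns=['camp', 'ASINs'])

def classify(asins):
    """Return (bucket, group) for one list of ASINs (hypothetical helper)."""
    wanted = set(asins)
    # Walk the Group and Child ASIN columns together, row by row.
    for group, children in zip(psr_list['Group'], psr_list['Child ASIN']):
        if wanted.issubset(children):
            return ('clean', group)   # first matching row wins
    return ('mixed', '')

# zip(*...) transposes the sequence of (bucket, group) tuples into two sequences.
buckets, groups = zip(*sb_list['ASINs'].map(classify))
sb_list['bucket'] = list(buckets)
sb_list['Group'] = list(groups)
print(sb_list)
```

Computing both values in a single pass avoids calling the comparison twice, which is what the two process(...) calls in the question would do.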