I am comparing the list values of sb_list against psr_list. If every item in a sb_list['ASINs'] list is found in a single list of psr_list['Child ASIN'], sb_list['bucket'] is marked 'clean'. This part of the code is running fine.
What I am having trouble with is populating sb_list['Group']. If sb_list['bucket'] is marked 'clean', then sb_list['Group'] should equal the psr_list['Group'] value of the row where the match was found.
I am trying to do this with a function that checks each list in sb_list['ASINs'] against psr_list['Child ASIN'] and returns a tuple: the first value is 'clean' or 'mixed', depending on whether a match was found, and the second value is the psr_list['Group'] value of the matching row.
This is similar to another question I asked a few weeks ago, but different enough that I thought it deserved its own post.
Data:
import pandas as pd
list1 = [
    ['1', ['hi', 'there', '10', '14', '15']],
    ['2', ['7', '13', '25', '46', '50']],
    ['3', ['hello', 'du', '6', '19', '36']],
    ['4', ['hi', '19', '24', '26', '29']]]
psr_list = pd.DataFrame(list1, columns=['Group', 'Child ASIN'])
list2 = [
    ['a', ['hi', 'there']],
    ['r', ['hello', 'du', 'th']],
    ['e', ['hello', '9']],
    ['f', ['hello', '6', '36']],
    ['w', ['hello', '6', '37']],
    ['a', ['24', '29']],
    ['q', ['hi', '14', '15']]]
sb_list = pd.DataFrame(list2, columns=['camp', 'ASINs'])
sb_list['bucket'] = ""
sb_list['Group'] = ""
My attempt:
def process(psr_asin_list, sb_ap_asin_list):
    return [compare(psr_asin_list, sb_sp_row) for sb_sp_row in sb_ap_asin_list]
def compare(psr_asin_list, sb_sp_row):
    counter = 0
    while counter < psr_asin_list.shape[0]:
        if all(asins in psr_asin_list[counter] for asins in sb_sp_row):
            return ('clean', psr_asin_list['Group'])
        counter += 1
    return ('mixed', '')
sb_list['bucket'] = process(psr_list['Child ASIN'].to_numpy(), sb_list['ASINs'].to_numpy())[0]
sb_list['Group'] = process(psr_list['Child ASIN'].to_numpy(), sb_list['ASINs'].to_numpy())[1]
Desired output:
   camp            ASINs  bucket  Group
0     a      [hi, there]   clean      1
1     r  [hello, du, th]   mixed
2     e       [hello, 9]   mixed
3     f   [hello, 6, 36]   clean      3
4     w   [hello, 6, 37]   mixed
5     a         [24, 29]   clean      4
6     q     [hi, 14, 15]   clean      1
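The tuple-returning attempt in the question can also be made to produce the desired output. This is a minimal sketch, assuming the goal is the Group of the first psr_list row whose 'Child ASIN' list contains every ASIN of the sb_list row. Here compare receives the whole psr_list DataFrame instead of only the 'Child ASIN' NumPy array, so the matching row's Group is still reachable, and it loops over the rows directly. zip(*...) then splits the list of (bucket, group) tuples into two sequences; indexing the returned list with [0] or [1] would instead pick a single tuple. The parameter name psr_df is only illustrative.

```python
import pandas as pd

# Sample data from the question
psr_list = pd.DataFrame(
    [['1', ['hi', 'there', '10', '14', '15']],
     ['2', ['7', '13', '25', '46', '50']],
     ['3', ['hello', 'du', '6', '19', '36']],
     ['4', ['hi', '19', '24', '26', '29']]],
    columns=['Group', 'Child ASIN'])
sb_list = pd.DataFrame(
    [['a', ['hi', 'there']],
     ['r', ['hello', 'du', 'th']],
     ['e', ['hello', '9']],
     ['f', ['hello', '6', '36']],
     ['w', ['hello', '6', '37']],
     ['a', ['24', '29']],
     ['q', ['hi', '14', '15']]],
    columns=['camp', 'ASINs'])


def compare(psr_df, sb_row):
    """Return ('clean', group) for the first psr_df row whose 'Child ASIN'
    list contains every ASIN in sb_row, otherwise ('mixed', '')."""
    for group, child_asins in zip(psr_df['Group'], psr_df['Child ASIN']):
        if all(asin in child_asins for asin in sb_row):
            return ('clean', group)
    return ('mixed', '')


def process(psr_df, sb_rows):
    # One (bucket, group) tuple per sb_list row
    return [compare(psr_df, row) for row in sb_rows]


# zip(*...) transposes the list of tuples into (all buckets, all groups)
buckets, groups = zip(*process(psr_list, sb_list['ASINs']))
sb_list['bucket'] = list(buckets)
sb_list['Group'] = list(groups)
print(sb_list)
```

Because compare returns on the first matching row, this has the same "first match wins" behavior as the answer's get_group.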
Answer:
You could use set.issubset in a list comprehension to check whether each list in sb_list['ASINs'] is contained in any list in psr_list['Child ASIN']. If such a list exists, take the "Group" value of that row; if not, fill in "". Note that this assumes at most one list in psr_list contains a given list from sb_list; if several do, only the first matching "Group" is used.
Then fill in bucket depending on whether a "Group" value was found:
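import numpy as np
def get_group(asin):
    # Boolean mask: True for each psr_list row whose 'Child ASIN' list contains every item of asin
    group = psr_list.loc[[set(asin).issubset(y) for y in psr_list['Child ASIN'].tolist()], 'Group']
    # First matching Group, or '' when no row matched
    return group.iat[0] if not group.empty else ''
sb_list['Group'] = sb_list['ASINs'].apply(get_group)
sb_list['bucket'] = np.where(sb_list['Group'] == '', 'mixed', 'clean')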
Output:
   camp            ASINs  bucket  Group
0     a      [hi, there]   clean      1
1     r  [hello, du, th]   mixed
2     e       [hello, 9]   mixed
3     f   [hello, 6, 36]   clean      3
4     w   [hello, 6, 37]   mixed
5     a         [24, 29]   clean      4
6     q     [hi, 14, 15]   clean      1
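If the one-match assumption in the answer might not hold, a variant can collect every matching group instead of only the first. This is a hypothetical extension sketched under that assumption, not part of the answer itself: the column name 'Groups' and the helper get_all_groups are illustrative. It also converts each 'Child ASIN' list to a set once rather than once per sb_list row, and uses the set operator `<=`, which is equivalent to issubset.

```python
import numpy as np
import pandas as pd

# Sample data from the question
psr_list = pd.DataFrame(
    [['1', ['hi', 'there', '10', '14', '15']],
     ['2', ['7', '13', '25', '46', '50']],
     ['3', ['hello', 'du', '6', '19', '36']],
     ['4', ['hi', '19', '24', '26', '29']]],
    columns=['Group', 'Child ASIN'])
sb_list = pd.DataFrame(
    [['a', ['hi', 'there']],
     ['r', ['hello', 'du', 'th']],
     ['e', ['hello', '9']],
     ['f', ['hello', '6', '36']],
     ['w', ['hello', '6', '37']],
     ['a', ['24', '29']],
     ['q', ['hi', '14', '15']]],
    columns=['camp', 'ASINs'])

# Convert each 'Child ASIN' list to a set once, paired with its Group
psr_sets = list(zip(psr_list['Group'], psr_list['Child ASIN'].map(set)))


def get_all_groups(asins):
    """Hypothetical helper: every Group whose Child ASIN set contains all of asins."""
    wanted = set(asins)
    return [group for group, child_set in psr_sets if wanted <= child_set]


# 'Groups' is an illustrative column name holding the list of matches per row
sb_list['Groups'] = sb_list['ASINs'].apply(get_all_groups)
# A row is clean when at least one psr_list row contains all of its ASINs
sb_list['bucket'] = np.where(sb_list['Groups'].map(len) > 0, 'clean', 'mixed')
print(sb_list)
```

With the question's sample data each ASIN list matches at most one group, so every entry in 'Groups' has zero or one element and the bucket column is the same as in the answer's output.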