Check multiple conditions given the set of items in each row and assign a value under new column

Time:10-18

I have the following df:

import pandas as pd

ex_df = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4}, 'Item': {0: {1}, 1: {1, 2}, 2: {1, 3, 4}, 3: {1, 3}}})
  • Package 1 is when there is only item 1
  • Package 2 is when there are items 1, 2
  • Package 3 is when there are items 1, 3, 4
  • Package 4 is when there are items 1, 3

I am trying to find a way to identify the type of a package given the set of items in each row.

So the resulting df should be:

ex_df = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4}, 'Item': {0: {1}, 1: {1, 2}, 2: {1, 3, 4}, 3: {1, 3}}, 'Package': {0: 'Package1', 1: 'Package2', 2: 'Package3', 3: 'Package4'}})

Can someone please point me in the right direction?

CodePudding user response:

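You can use a dictionary keyed by frozensets to map the values. Plain sets are not hashable, so each set in the column is converted to a frozenset before the lookup: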

d = {frozenset({1}): 'Package1',
     frozenset({1, 2}): 'Package2',
     frozenset({1, 3, 4}): 'Package3',
     frozenset({1, 3}): 'Package4'}

ex_df['Package'] = ex_df['Item'].apply(frozenset).map(d)

Output:

   ID       Item   Package
0   1        {1}  Package1
1   2     {1, 2}  Package2
2   3  {1, 3, 4}  Package3
3   4     {1, 3}  Package4
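
As a plain-Python illustration of why the conversion to frozenset is needed, here is a minimal sketch. The names `packages` and `is_hashable` are chosen for this example only and are not part of the answer above:

```python
# Sketch: a dictionary key must be hashable. A plain set is not,
# a frozenset is, and frozenset equality ignores element order.
packages = {
    frozenset({1}): "Package1",
    frozenset({1, 2}): "Package2",
    frozenset({1, 3, 4}): "Package3",
    frozenset({1, 3}): "Package4",
}


def is_hashable(value):
    """Return True if value can be used as a dictionary key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


set_ok = is_hashable({1, 2})                # plain set: not hashable
frozen_ok = is_hashable(frozenset({1, 2}))  # frozenset: hashable

# The order in which the items are listed does not affect the lookup.
lookup = packages[frozenset([2, 1])]

# A combination that is not a key simply has no entry; with Series.map
# the corresponding row would end up as NaN.
missing = packages.get(frozenset({1, 5, 6}))
```

Under these assumptions, `lookup` is 'Package2' and `missing` is None, which is the case the alternative below is meant to handle.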

Alternative: if a row has no exact match, fall back to the largest known item set that is fully contained in the row:

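# exact matches first; rows without a match become NaN
ex_df['Package'] = ex_df['Item'].apply(frozenset).map(d)

# rows that still need a package
m = ex_df['Package'].isna()

# known item sets, largest first
sets = sorted(d, key=lambda x: -len(x))

# for each unmatched row, take the first (largest) known set it fully contains;
# rows that contain none of the known sets stay empty (None)
ex_df.loc[m, 'Package'] = [d.get(next((s for s in sets if x.issuperset(s)), None))
                           for x in ex_df.loc[m, 'Item']]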

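Example, with an extra row (ID 5, items {1, 5, 6}) that has no exact match and therefore falls back to Package1: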

   ID       Item   Package
0   1        {1}  Package1
1   2     {1, 2}  Package2
2   3  {1, 3, 4}  Package3
3   4     {1, 3}  Package4
4   5  {1, 5, 6}  Package1
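
The fallback logic can also be written as a standalone function, which may be easier to test in isolation. This is a sketch under the same assumptions as above; `best_package` and `packages` are hypothetical names introduced here, not part of the original answer:

```python
def best_package(items, packages):
    """Exact match first, otherwise the largest known set contained in items."""
    key = frozenset(items)
    if key in packages:
        return packages[key]
    # Check larger item sets first so the most specific package wins.
    # sorted() is stable, so sets of equal size keep their dictionary order.
    for candidate in sorted(packages, key=len, reverse=True):
        if candidate <= key:  # candidate is a subset of the row's items
            return packages[candidate]
    return None  # no known set is contained in this row


packages = {
    frozenset({1}): "Package1",
    frozenset({1, 2}): "Package2",
    frozenset({1, 3, 4}): "Package3",
    frozenset({1, 3}): "Package4",
}

rows = [{1}, {1, 2}, {1, 3, 4}, {1, 3}, {1, 5, 6}, {7}]
results = [best_package(row, packages) for row in rows]
```

A function like this could be passed to `ex_df['Item'].apply` with the dictionary bound as an extra argument. Note that when two known sets of the same size both fit (for example {1, 2} and {1, 3} for a row {1, 2, 3}), the one defined first in the dictionary is chosen; whether that tie-break is appropriate depends on the data.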