I have the following DataFrame:
import pandas as pd
ex_df = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4}, 'Item': {0: {1}, 1: {1, 2}, 2: {1, 3, 4}, 3: {1, 3}}})
- Package 1 is when there is only item 1
- Package 2 is when there are items 1, 2
- Package 3 is when there are items 1, 3, 4
- Package 4 is when there are items 1, 3
I am trying to find a way to identify the type of a package given the set of items in each row.
So the resulting df should be:
ex_df = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4}, 'Item': {0: {1}, 1: {1, 2}, 2: {1, 3, 4}, 3: {1, 3}}, 'Package': {0: 'Package1', 1: 'Package2', 2: 'Package3', 3: 'Package4'}})
Can someone please point me in the right direction?
CodePudding user response:
You can use a dictionary with frozenset keys to map the values (a plain set is not hashable, so it cannot be used as a dictionary key):
d = {frozenset({1}): 'Package1',
     frozenset({1, 2}): 'Package2',
     frozenset({1, 3, 4}): 'Package3',
     frozenset({1, 3}): 'Package4'}
ex_df['Package'] = ex_df['Item'].apply(frozenset).map(d)
Output:
   ID       Item   Package
0   1        {1}  Package1
1   2     {1, 2}  Package2
2   3  {1, 3, 4}  Package3
3   4     {1, 3}  Package4
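As a small check of the exact-match lookup, the sketch below suggests two properties of this approach: the order of elements inside a set does not affect the frozenset key, and a set that is not a key of `d` maps to NaN. The Series `items` is made-up sample data, not part of the question.

```python
import pandas as pd

d = {frozenset({1}): 'Package1',
     frozenset({1, 2}): 'Package2',
     frozenset({1, 3, 4}): 'Package3',
     frozenset({1, 3}): 'Package4'}

# Hypothetical sample data: the second set is not a key of d.
items = pd.Series([{4, 3, 1}, {1, 5, 6}])

# Convert each set to a hashable frozenset, then look it up in d.
packages = items.apply(frozenset).map(d)
print(packages.tolist())
```

The second entry comes back as NaN, which is what `isna()` can be used to detect.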
Alternative: if a row has no exact match, fall back to the largest known set that is a subset of the row's items:
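ex_df['Package'] = ex_df['Item'].apply(frozenset).map(d)
# rows without an exact match
m = ex_df['Package'].isna()
# known sets, largest first
sets = sorted(d, key=len, reverse=True)
# first (largest) known set contained in each unmatched row, or None
ex_df.loc[m, 'Package'] = [d.get(next((s for s in sets if x.issuperset(s)), None))
                           for x in ex_df.loc[m, 'Item']]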
Example with an extra row (ID 5) that has no exact match:
   ID       Item   Package
0   1        {1}  Package1
1   2     {1, 2}  Package2
2   3  {1, 3, 4}  Package3
3   4     {1, 3}  Package4
4   5  {1, 5, 6}  Package1
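The same fallback logic can also be wrapped in a single function and applied row by row. This is only a sketch: the function name `find_package` and the five-row sample frame are assumptions made for illustration, and it trades the vectorized `map` for a plain Python lookup.

```python
import pandas as pd

d = {frozenset({1}): 'Package1',
     frozenset({1, 2}): 'Package2',
     frozenset({1, 3, 4}): 'Package3',
     frozenset({1, 3}): 'Package4'}

# Known sets ordered from largest to smallest, so the first hit is the largest subset.
sets = sorted(d, key=len, reverse=True)

def find_package(items):
    """Hypothetical helper: exact match first, else largest contained set, else None."""
    key = frozenset(items)
    if key in d:
        return d[key]
    # s <= key is True when s is a subset of key
    best = next((s for s in sets if s <= key), None)
    return d.get(best)

# Sample data assumed for illustration: the question's four rows plus one unmatched row.
ex_df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                      'Item': [{1}, {1, 2}, {1, 3, 4}, {1, 3}, {1, 5, 6}]})
ex_df['Package'] = ex_df['Item'].apply(find_package)
print(ex_df)
```

Because `sorted` is stable, two known sets of the same size keep their dictionary order, so for items such as {1, 2, 3} the earlier entry ({1, 2}) would win over {1, 3}. A row that contains none of the known sets gets None.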