How do I get the union of unique list from two columns and three columns of a dataframe?
This is the dataframe I am working with:
Col1 Extract Col2 Extract Col3 Extract
------------ ------------ ------------
['unclassified'] ['sink', 'fridge'] ['unclassified']
['fridge', 'microwave'] ['fridge', 'stove'] ['sink']
['unclassified'] ['unclassified'] ['unclassified']
What I would like is the union of unique list for ('Col1 Extract' 'Col2 Extract') and ('Col1 Extract' 'Col2 Extract' 'Col3 Extract') using pandas way. This is what I'm looking for:
Col1 Col2 Col1 Col2 Col3
------------ ---------------
['unclassified', 'sink', 'fridge'] ['unclassified', 'sink', 'fridge']
['fridge', 'microwave', 'stove'] ['fridge', 'microwave', 'stove', 'sink']
['unclassified'] ['unclassified']
CodePudding user response:
Join columns and remove duplicates by set
s:
df['Col1 Col2'] = (df['Col1 Extract'] df['Col2 Extract']).apply(lambda x: list(set(x)))
df['Col1 Col2 Col3'] = (df['Col1 Extract'] df['Col2 Extract'] df['Col3 Extract']).apply(lambda x: list(set(x)))
print (df)
Col1 Extract Col2 Extract Col3 Extract \
0 [unclassified] [sink, fridge] [unclassified]
1 [fridge, microwave] [fridge, stove] [sink]
2 [unclassified] [unclassified] [unclassified]
Col1 Col2 Col1 Col2 Col3
0 [fridge, unclassified, sink] [fridge, unclassified, sink]
1 [stove, fridge, microwave] [stove, fridge, microwave, sink]
2 [unclassified] [unclassified]
If ordering is important use dict.fromkeys
trick:
df['Col1 Col2'] = (df['Col1 Extract'] df['Col2 Extract']).apply(lambda x: list(dict.fromkeys(x)))
df['Col1 Col2 Col3'] = (df['Col1 Extract'] df['Col2 Extract'] df['Col3 Extract']).apply(lambda x: list(dict.fromkeys(x)))
print (df)
Col1 Extract Col2 Extract Col3 Extract \
0 [unclassified] [sink, fridge] [unclassified]
1 [fridge, microwave] [fridge, stove] [sink]
2 [unclassified] [unclassified] [unclassified]
Col1 Col2 Col1 Col2 Col3
0 [unclassified, sink, fridge] [unclassified, sink, fridge]
1 [fridge, microwave, stove] [fridge, microwave, stove, sink]
2 [unclassified] [unclassified]