Home > Blockchain >  Union of unique list from two and three columns of a dataframe
Union of unique list from two and three columns of a dataframe

Time:08-30

How do I get the union of unique list from two columns and three columns of a dataframe?

This is the dataframe I am working with:

Col1 Extract              Col2 Extract           Col3 Extract      
------------              ------------           ------------
['unclassified']          ['sink', 'fridge']     ['unclassified']
['fridge', 'microwave']   ['fridge', 'stove']    ['sink']          
['unclassified']          ['unclassified']       ['unclassified']

What I would like is the union of unique list for ('Col1 Extract' 'Col2 Extract') and ('Col1 Extract' 'Col2 Extract' 'Col3 Extract') using pandas way. This is what I'm looking for:

Col1 Col2                             Col1 Col2 Col3
------------                          ---------------             
['unclassified', 'sink', 'fridge']    ['unclassified', 'sink', 'fridge']      
['fridge', 'microwave', 'stove']      ['fridge', 'microwave', 'stove', 'sink']          
['unclassified']                      ['unclassified']  

CodePudding user response:

Join columns and remove duplicates by sets:

df['Col1 Col2'] = (df['Col1 Extract']   df['Col2 Extract']).apply(lambda x: list(set(x)))
df['Col1 Col2 Col3'] = (df['Col1 Extract']   df['Col2 Extract']   df['Col3 Extract']).apply(lambda x: list(set(x)))
print (df)
          Col1 Extract     Col2 Extract    Col3 Extract  \
0       [unclassified]   [sink, fridge]  [unclassified]   
1  [fridge, microwave]  [fridge, stove]          [sink]   
2       [unclassified]   [unclassified]  [unclassified]   

                      Col1 Col2                    Col1 Col2 Col3  
0  [fridge, unclassified, sink]      [fridge, unclassified, sink]  
1    [stove, fridge, microwave]  [stove, fridge, microwave, sink]  
2                [unclassified]                    [unclassified] 

If ordering is important use dict.fromkeys trick:

df['Col1 Col2'] = (df['Col1 Extract']   df['Col2 Extract']).apply(lambda x: list(dict.fromkeys(x)))
df['Col1 Col2 Col3'] = (df['Col1 Extract']   df['Col2 Extract']   df['Col3 Extract']).apply(lambda x: list(dict.fromkeys(x)))
print (df)
          Col1 Extract     Col2 Extract    Col3 Extract  \
0       [unclassified]   [sink, fridge]  [unclassified]   
1  [fridge, microwave]  [fridge, stove]          [sink]   
2       [unclassified]   [unclassified]  [unclassified]   

                      Col1 Col2                    Col1 Col2 Col3  
0  [unclassified, sink, fridge]      [unclassified, sink, fridge]  
1    [fridge, microwave, stove]  [fridge, microwave, stove, sink]  
2                [unclassified]                    [unclassified]  
  • Related