Find all possible combinations from a pandas column and also combine their values

Time:04-26

I have a dataframe like this:

import pandas as pd

data = {'col1': [['A'], ['B'], ['C'], ['D']],
        'col2': [['foo1', 'foo2'], ['foo1', 'bar1'], ['bar1'], ['bar1', 'bar2', 'bar3']]}
df = pd.DataFrame(data)

 ---- ------------------ 
|col1|              col2|
 ---- ------------------ 
| [A]|      [foo1, foo2]|
| [B]|      [foo1, bar1]|
| [C]|            [bar1]|
| [D]|[bar1, bar2, bar3]|
 ---- ------------------ 

I'd like to find all pairwise combinations of the lists in column col1 and also combine the corresponding values from col2. As a result, I would like to get a dataframe like this:

 ---------- ---------------------------------- 
|comb_col1 |comb_col2                         |
 ---------- ---------------------------------- 
|[[A], [B]]|[[foo1, foo2], [foo1, bar1]]      |
|[[A], [C]]|[[foo1, foo2], [bar1]]            |
|[[A], [D]]|[[foo1, foo2], [bar1, bar2, bar3]]|
|[[B], [C]]|[[foo1, bar1], [bar1]]            |
|[[B], [D]]|[[foo1, bar1], [bar1, bar2, bar3]]|
|[[C], [D]]|[[bar1], [bar1, bar2, bar3]]      |
 ---------- ---------------------------------- 

Ideally, I would also like to intersect the inner lists of comb_col2 and get something like this:

 ---------- ---------------------------------- --------- 
|comb_col1 |comb_col2                         |intersect|
 ---------- ---------------------------------- --------- 
|[[A], [B]]|[[foo1, foo2], [foo1, bar1]]      |[foo1]   |
|[[A], [C]]|[[foo1, foo2], [bar1]]            |[]       |
|[[A], [D]]|[[foo1, foo2], [bar1, bar2, bar3]]|[]       |
|[[B], [C]]|[[foo1, bar1], [bar1]]            |[bar1]   |
|[[B], [D]]|[[foo1, bar1], [bar1, bar2, bar3]]|[bar1]   |
|[[C], [D]]|[[bar1], [bar1, bar2, bar3]]      |[bar1]   |
 ---------- ---------------------------------- --------- 

CodePudding user response:

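from itertools import combinations

# apply runs column-wise, so each column becomes the list of its 2-element combinations
df1 = df.apply(lambda x: list(combinations(x, 2)))
# intersect the two col2 lists of each pair (set intersection, so item order is not guaranteed)
df1['intersect'] = df1['col2'].apply(lambda x: list(set(x[0]).intersection(x[1])))

df1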

         col1                                col2 intersect
0  ([A], [B])        ([foo1, foo2], [foo1, bar1])    [foo1]
1  ([A], [C])              ([foo1, foo2], [bar1])        []
2  ([A], [D])  ([foo1, foo2], [bar1, bar2, bar3])        []
3  ([B], [C])              ([foo1, bar1], [bar1])    [bar1]
4  ([B], [D])  ([foo1, bar1], [bar1, bar2, bar3])    [bar1]
5  ([C], [D])        ([bar1], [bar1, bar2, bar3])    [bar1]
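
The output above keeps the original column names and stores each pair as a tuple. If the exact layout from the question is wanted (columns comb_col1 and comb_col2 holding lists), one possible variant is to build the rows explicitly. This is only a sketch, not part of the answer above: the helper name pairwise_combinations is made up for illustration, and the intersection here keeps the order of the first list instead of going through a set.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'col1': [['A'], ['B'], ['C'], ['D']],
    'col2': [['foo1', 'foo2'], ['foo1', 'bar1'], ['bar1'], ['bar1', 'bar2', 'bar3']],
})


def pairwise_combinations(frame):
    """Hypothetical helper: one output row per pair of input rows."""
    rows = []
    # Combine whole rows so that col1 and col2 of each pair stay aligned.
    for (a1, a2), (b1, b2) in combinations(zip(frame['col1'], frame['col2']), 2):
        second = set(b2)
        rows.append({
            'comb_col1': [a1, b1],
            'comb_col2': [a2, b2],
            # Items of the first list that also occur in the second,
            # in the order they appear in the first list.
            'intersect': [v for v in a2 if v in second],
        })
    return pd.DataFrame(rows, columns=['comb_col1', 'comb_col2', 'intersect'])


result = pairwise_combinations(df)
print(result)
```

Pairing the rows with zip before calling combinations avoids relying on both columns producing their combinations in the same order, which the column-wise apply does implicitly.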