I have a dataframe like this:
import pandas as pd

data = {'col1': [['A'], ['B'], ['C'], ['D']],
        'col2': [['foo1', 'foo2'], ['foo1', 'bar1'], ['bar1'], ['bar1', 'bar2', 'bar3']]}
df = pd.DataFrame(data)
+----+------------------+
|col1|              col2|
+----+------------------+
| [A]|      [foo1, foo2]|
| [B]|      [foo1, bar1]|
| [C]|            [bar1]|
| [D]|[bar1, bar2, bar3]|
+----+------------------+
I'd like to find all pairwise combinations of the lists in column col1 and combine the corresponding values from col2 in the same way. As a result, I would like to get a dataframe like this:
+----------+----------------------------------+
|comb_col1 |comb_col2                         |
+----------+----------------------------------+
|[[A], [B]]|[[foo1, foo2], [foo1, bar1]]      |
|[[A], [C]]|[[foo1, foo2], [bar1]]            |
|[[A], [D]]|[[foo1, foo2], [bar1, bar2, bar3]]|
|[[B], [C]]|[[foo1, bar1], [bar1]]            |
|[[B], [D]]|[[foo1, bar1], [bar1, bar2, bar3]]|
|[[C], [D]]|[[bar1], [bar1, bar2, bar3]]      |
+----------+----------------------------------+
Ideally, I would also like to intersect the inner lists in comb_col2 and get a result like this:
+----------+----------------------------------+---------+
|comb_col1 |comb_col2                         |intersect|
+----------+----------------------------------+---------+
|[[A], [B]]|[[foo1, foo2], [foo1, bar1]]      |[foo1]   |
|[[A], [C]]|[[foo1, foo2], [bar1]]            |[]       |
|[[A], [D]]|[[foo1, foo2], [bar1, bar2, bar3]]|[]       |
|[[B], [C]]|[[foo1, bar1], [bar1]]            |[bar1]   |
|[[B], [D]]|[[foo1, bar1], [bar1, bar2, bar3]]|[bar1]   |
|[[C], [D]]|[[bar1], [bar1, bar2, bar3]]      |[bar1]   |
+----------+----------------------------------+---------+
Answer:
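from itertools import combinations
# Build every 2-element combination of the values in each column
df1 = df.apply(lambda x: list(combinations(x, 2)))
# Intersect the two inner lists of each col2 pair
# (element order is not guaranteed, since sets are unordered)
df1['intersect'] = df1['col2'].apply(lambda x: list(set(x[0]).intersection(x[1])))
df1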
col1 col2 intersect
0 ([A], [B]) ([foo1, foo2], [foo1, bar1]) [foo1]
1 ([A], [C]) ([foo1, foo2], [bar1]) []
2 ([A], [D]) ([foo1, foo2], [bar1, bar2, bar3]) []
3 ([B], [C]) ([foo1, bar1], [bar1]) [bar1]
4 ([B], [D]) ([foo1, bar1], [bar1, bar2, bar3]) [bar1]
5 ([C], [D]) ([bar1], [bar1, bar2, bar3]) [bar1]
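If you want the exact column names from the question (comb_col1, comb_col2, intersect) and lists instead of tuples, here is one possible variant. It is only a sketch under a few assumptions: it pairs up row positions rather than applying combinations to each column separately, and the helper name ordered_intersection is made up for illustration. The helper keeps the order of the first list, which is a swapped-in alternative to the set-based intersection above, not part of the original answer.

```python
from itertools import combinations

import pandas as pd

data = {
    "col1": [["A"], ["B"], ["C"], ["D"]],
    "col2": [["foo1", "foo2"], ["foo1", "bar1"], ["bar1"], ["bar1", "bar2", "bar3"]],
}
df = pd.DataFrame(data)


def ordered_intersection(first, second):
    """Hypothetical helper: items of `first` that also occur in `second`,
    kept in the order they appear in `first`."""
    lookup = set(second)
    return [item for item in first if item in lookup]


# Combine row positions once, so col1 and col2 stay aligned by construction.
pairs = list(combinations(range(len(df)), 2))

result = pd.DataFrame({
    "comb_col1": [[df["col1"].iloc[i], df["col1"].iloc[j]] for i, j in pairs],
    "comb_col2": [[df["col2"].iloc[i], df["col2"].iloc[j]] for i, j in pairs],
})

# Each comb_col2 cell holds exactly two inner lists; intersect them.
result["intersect"] = [
    ordered_intersection(first, second) for first, second in result["comb_col2"]
]

print(result)
```

Because the pairs are built from row positions, this should also carry over to more than two columns without calling combinations once per column.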