I'm trying to create a dataframe based on other dataframe and a specific condition.
Given the pandas dataframe above, I'd like to have a two column dataframe, which each row would be the combinations of pairs of words that are different from 0 (coexist in a specific row), beginning with the first row.
For example, for this part of image above, the new dataframe that I want is like de following:
and so on...
Does anyone have some tip of how I can do it? I'm struggling... Thanks!
CodePudding user response:
As you didn't provide a text example, here is a dummy one:
>>> df
A B C D E
0 0 1 1 0 1
1 1 1 1 1 1
2 1 0 0 1 0
3 0 0 0 0 1
4 0 1 1 0 0
you could use a combination of masking, explode
and itertools.combinations
:
from itertools import combinations
mask = df.gt(0)
series = (mask*df.columns).apply(lambda x: list(combinations(set(x).difference(['']), r=2)), axis=1)
pd.DataFrame(series.explode().dropna().to_list(), columns=['X', 'Y'])
output:
X Y
0 C E
1 C B
2 E B
3 E D
4 E C
5 E B
6 E A
7 D C
8 D B
9 D A
10 C B
11 C A
12 B A
13 A D
14 C B