Create a dataframe of all combinations of columns names per row based on mutual presence of columns

Time:10-11

I'm trying to create a dataframe from another dataframe, based on a specific condition.

First dataframe

Given the pandas dataframe above, I'd like to build a two-column dataframe in which each row is a pair of words (column names) that are both different from 0 in the same row of the original, i.e. that coexist in that row, beginning with the first row.

For example, for the part shown in the image above, the new dataframe that I want looks like the following:

dataframe wanted

and so on...

Does anyone have a tip on how I can do this? I'm struggling... Thanks!

CodePudding user response:

As you didn't provide a text example, here is a dummy one:

>>> df
   A  B  C  D  E
0  0  1  1  0  1
1  1  1  1  1  1
2  1  0  0  1  0
3  0  0  0  0  1
4  0  1  1  0  0
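
To reproduce the dummy frame, a minimal sketch like the following should work; the values are copied from the printed table above, and the list-of-rows constructor is just one convenient way to build it:

```python
import pandas as pd

# Rebuild the dummy frame; each inner list is one row of the printed table
df = pd.DataFrame(
    [
        [0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [0, 1, 1, 0, 0],
    ],
    columns=list("ABCDE"),
)
print(df)
```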

You could use a combination of masking, explode and itertools.combinations:

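from itertools import combinations
import pandas as pd
# True where a column is present (greater than 0) in a row
mask = df.gt(0)
# swap each True for its column name ('' otherwise), then build all pairs of names per row
series = (mask*df.columns).apply(lambda x: list(combinations(set(x).difference(['']), r=2)), axis=1)
# one pair per output row; rows with fewer than two names become NaN and are dropped
pd.DataFrame(series.explode().dropna().to_list(), columns=['X', 'Y'])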

output:

    X  Y
0   C  E
1   C  B
2   E  B
3   E  D
4   E  C
5   E  B
6   E  A
7   D  C
8   D  B
9   D  A
10  C  B
11  C  A
12  B  A
13  A  D
14  C  B
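
Because the names pass through a set, the order of names within a pair (and of pairs within a row) is not guaranteed to follow the column order. If a stable order matters, here is a minimal sketch of an alternative that loops over the boolean mask row by row. This is not the approach used above; the function name cooccurring_pairs is hypothetical and the frame is the same dummy data:

```python
from itertools import combinations

import pandas as pd

# Same dummy data as in the answer (assumed, not the asker's real frame)
df = pd.DataFrame(
    [
        [0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
        [0, 1, 1, 0, 0],
    ],
    columns=list("ABCDE"),
)


def cooccurring_pairs(frame):
    """Return a two-column frame of column-name pairs that are > 0 in the same row."""
    pairs = []
    columns = frame.columns
    # Iterate over the boolean mask one row at a time
    for flags in frame.gt(0).to_numpy():
        present = columns[flags]  # names of the non-zero columns, in column order
        # Rows with fewer than two names contribute no pairs
        pairs.extend(combinations(present, 2))
    return pd.DataFrame(pairs, columns=["X", "Y"])


pairs = cooccurring_pairs(df)
print(pairs)
```

With this version every pair is listed in column order (for example B, C rather than C, B), and the rows of the result follow the rows of the original frame.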