Home > OS >  Is there a way to split m:n relations in a pandas dataframe to get bi-directional relations?
Is there a way to split m:n relations in a pandas dataframe to get bi-directional relations?

Time:09-10

I have a table which is organized around transactions which may contain the link to one or serveral people.

enter image description here

My intention is to create a network view for which I need to create bi-directional edges for visualization. Is there a way to organize the table so that it looks like this:

enter image description here

Thanks for your help!

I tried if I could solve the issue by transposing and grouping the table differently. But it doesn't solve my issue of bi-directional relations.

CodePudding user response:

We could use itertools.combinations to get unique pairs of Person grouped by Transaction:

import pandas as pd
from itertools import combinations

data = {
    'Transaction': [101,101,101,102,103,103],
    'Person': [1,2,3,3,5,6]
}

df = pd.DataFrame(data)

pairs = (
    df
    .groupby('Transaction')['Person']
    .apply(lambda gr: pd.DataFrame(combinations(gr, 2), columns=['A','B']))
    .reset_index('Transaction')
    .reset_index(drop=True)
)

print(pairs)

Output:

   Transaction  A  B
0          101  1  2
1          101  1  3
2          101  2  3
3          103  5  6

Notes:

  1. The data parameter of DataFrame can be an iterable object, so we can pass combinations(...) to pd.DataFrame(...) as is.
  2. GroupBy.apply will stack together the output of the same structure into one frame.

CodePudding user response:

Assume dataset:

import pandas as pd
df = pd.DataFrame(data={
    "txn": [1,1,1,2,3,3],
    "person": ["a", "b", "c", "c", "d", "e"],
})

Group by transactions to get persons involved in it:

df_grp = df.groupby("txn").agg(person_list=("person", list))

>>     person_list
>> txn            
>> 1     [a, b, c]
>> 2           [c]
>> 3        [d, e]

Create pair combinations from person list:

import itertools
df_grp["combinations"] = df_grp.apply(lambda row: list(itertools.combinations(row["person_list"], 2)), axis=1)

>>     person_list              combinations
>> txn                                      
>> 1     [a, b, c]  [(a, b), (a, c), (b, c)]
>> 2           [c]                        []
>> 3        [d, e]                  [(d, e)]

Move combinations to individual rows:

df_expl = df_grp[["combinations"]].explode("combinations").dropna().reset_index()

>>    txn combinations
>> 0    1       (a, b)
>> 1    1       (a, c)
>> 2    1       (b, c)
>> 3    3       (d, e)

Split person pair to individual columns:

df_expl["A"] = df_expl.apply(lambda row: row["combinations"][0], axis=1)
df_expl["B"] = df_expl.apply(lambda row: row["combinations"][1], axis=1)

>>    txn  A  B
>> 0    1  a  b
>> 1    1  a  c
>> 2    1  b  c
>> 3    3  d  e
  • Related