I have a table which is organized around transactions which may contain the link to one or serveral people.
My intention is to create a network view for which I need to create bi-directional edges for visualization. Is there a way to organize the table so that it looks like this:
Thanks for your help!
I tried if I could solve the issue by transposing and grouping the table differently. But it doesn't solve my issue of bi-directional relations.
CodePudding user response:
We could use itertools.combinations
to get unique pairs of Person
grouped by Transaction
:
import pandas as pd
from itertools import combinations
data = {
'Transaction': [101,101,101,102,103,103],
'Person': [1,2,3,3,5,6]
}
df = pd.DataFrame(data)
pairs = (
df
.groupby('Transaction')['Person']
.apply(lambda gr: pd.DataFrame(combinations(gr, 2), columns=['A','B']))
.reset_index('Transaction')
.reset_index(drop=True)
)
print(pairs)
Output:
Transaction A B
0 101 1 2
1 101 1 3
2 101 2 3
3 103 5 6
Notes:
- The
data
parameter of DataFrame can be an iterable object, so we can passcombinations(...)
topd.DataFrame(...)
as is. - GroupBy.apply will stack together the output of the same structure into one frame.
CodePudding user response:
Assume dataset:
import pandas as pd
df = pd.DataFrame(data={
"txn": [1,1,1,2,3,3],
"person": ["a", "b", "c", "c", "d", "e"],
})
Group by transactions to get persons involved in it:
df_grp = df.groupby("txn").agg(person_list=("person", list))
>> person_list
>> txn
>> 1 [a, b, c]
>> 2 [c]
>> 3 [d, e]
Create pair combinations from person list:
import itertools
df_grp["combinations"] = df_grp.apply(lambda row: list(itertools.combinations(row["person_list"], 2)), axis=1)
>> person_list combinations
>> txn
>> 1 [a, b, c] [(a, b), (a, c), (b, c)]
>> 2 [c] []
>> 3 [d, e] [(d, e)]
Move combinations to individual rows:
df_expl = df_grp[["combinations"]].explode("combinations").dropna().reset_index()
>> txn combinations
>> 0 1 (a, b)
>> 1 1 (a, c)
>> 2 1 (b, c)
>> 3 3 (d, e)
Split person pair to individual columns:
df_expl["A"] = df_expl.apply(lambda row: row["combinations"][0], axis=1)
df_expl["B"] = df_expl.apply(lambda row: row["combinations"][1], axis=1)
>> txn A B
>> 0 1 a b
>> 1 1 a c
>> 2 1 b c
>> 3 3 d e