I am working on building a pandas data frame to make a network graph. My data frame currently looks like this:
Group1 Group2 Group3 Asset
A A A MI
A A A JI
A A A MI
A A A JI
A A A MI
A A B MI
A A C MI
A A C PA
A A C MI
A A C PA
A A C MI
A A C PA
A A C MI
A A C PA
A A C MI
and I want to manipulate it to look like this:
Source Target Weight
MI JI 1
MI MI 1
MI PA 1
JI MI 1
PA MI 1
Essentially I would like to break up the Asset column into 2 columns of each possible combination for every unique pairing of Group 1, 2, and 3. The first part of the pairing would make up the Source column and the 2nd part of the pairing would make up the Target column. And for this minimal example, it's just group 3 that varies. The weight column represents the sum of total pairings of individual assets.
Can someone help point me in the right direction? Any help would be appreciated!
Here is some sample code to generate the df:
df = pd.DataFrame({'Group1': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
'Group2': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
'Group3': ['A','A','A','A','A','B','C','C','C','C','C','C','C','C','C'],
'Asset': ['MI','JI','MI','JI','MI','MI','MI','PA','MI','PA','MI','PA','MI','PA','MI']
})
CodePudding user response:
import itertools as it
import pandas as pd
df = pd.DataFrame({'Group1': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
'Group2': ['A','A','A','A', 'A','A','A', 'A','A','A', 'A','A','A', 'A','A'],
'Group3': ['A','A','A','A','A','B','C','C','C','C','C','C','C','C','C'],
'Asset': ['MI','JI','MI','JI','MI','MI','MI','PA','MI','PA','MI','PA','MI','PA','MI']
})
def assets_pairs(assets_group):
unique_assets = set(assets_group)
if len(unique_assets) == 1:
x = assets_group.iat[0] # get the only unique asset
pairs = [[x, x]]
else:
pairs = it.permutations(unique_assets, r=2) # get all the unique pairs without repeated elements
return pd.DataFrame(pairs, columns=['Source', 'Target'])
df_pairs = (
df.groupby(['Group1', 'Group2', 'Group3'])['Asset']
.apply(assets_pairs) # create asset pairs per group
.groupby(['Source', 'Target'], as_index=False) # compute the weights by
.agg(Weights = ('Source', 'size')) # counting the unique ('Source', 'Target') pairs
)
>>> df_pairs
Source Target Weights
0 JI MI 1
1 MI JI 1
2 MI MI 1
3 MI PA 1
4 PA MI 1
CodePudding user response:
You can iterate over a transposed view of the data:
from collections import defaultdict
d, v = [*zip(*[df[i] for i in df])], []
for i in range(len(d)-1):
if (k:=(d[i][-1], d[i 1][-1])) not in v:
v.append(k)
r = [tuple([*k, 1]) for k in v]
Output:
[('MI', 'JI', 1), ('JI', 'MI', 1), ('MI', 'MI', 1), ('MI', 'PA', 1), ('PA', 'MI', 1)]