I have a dataframe containing some champions and their related traits. It looks like this:
Index | Champion | Traits |
---|---|---|
0 | Alphelios | 'Nightbringer', 'Ranger' |
1 | Ashe | 'Draconic', 'Ranger' |
2 | Heimerdinger | 'Renewer', 'Draconic', 'Caretaker' |
3 | Lee Sin | 'Nightbringer', 'Skirmisher' |
The Traits column holds lists, e.g. ['Nightbringer', 'Ranger'], ['Draconic', 'Ranger'], etc.
I wish to create an edge list linking each champion to the champions that share a trait with it, like this:
Source | Target |
---|---|
0 | 3 |
0 | 1 |
1 | 2 |
The list goes on. I'd also like the final DataFrame to contain a column with the weights: for example, if two champions share two traits, or even three, the edge should have a weight of 2 (or 3). I think the dataframe should be extended so that each champion has several rows (one per trait), but I can't seem to find a solution to the problem. Can anybody help me out here? Thanks!
CodePudding user response:
IIUC, you need to count the number of other rows with matching traits.
Use a combination of str.get_dummies and numpy:
NB. This assumes Traits are strings; if they are lists, build the dummies from the lists instead.
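import numpy as np
import pandas as pd

# sep=', ' keeps the space after each comma out of the trait names
a = df.Traits.str.get_dummies(sep=', ').values
# b[i, j] = number of traits shared by rows i and j
b = a.dot(a.T)
np.fill_diagonal(b, 0)
# Total number of trait matches each row has with all other rows
pd.DataFrame({'Source': df['Index'],
              'Target': b.sum(1)})
Output:
   Source  Target
0       0       2
1       1       2
2       2       1
3       3       1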
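To get from the shared-trait matrix to the weighted edge list asked for in the question, one option is to read off the upper triangle of the matrix so that each pair shows up only once. This is only a rough sketch and not part of the answer above: it assumes Traits holds Python lists (so it uses `explode` before building the dummies), and the sample frame and the `Weight` column name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; Traits are assumed to be lists here.
df = pd.DataFrame({
    'Champion': ['Alphelios', 'Ashe', 'Heimerdinger', 'Lee Sin'],
    'Traits': [['Nightbringer', 'Ranger'],
               ['Draconic', 'Ranger'],
               ['Renewer', 'Draconic', 'Caretaker'],
               ['Nightbringer', 'Skirmisher']],
})

# One 0/1 indicator column per trait, one row per champion.
exploded = df['Traits'].explode()
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
a = dummies.to_numpy()

# shared[i, j] = number of traits rows i and j have in common.
shared = a @ a.T

# Keep each unordered pair once (i < j) and drop pairs with nothing in common.
src, tgt = np.triu_indices(len(df), k=1)
weights = shared[src, tgt]
mask = weights > 0

# Source/Target are row positions, which match the index used in the question.
edges = pd.DataFrame({'Source': src[mask],
                      'Target': tgt[mask],
                      'Weight': weights[mask]})
print(edges)
```

For the four sample rows this gives the pairs (0, 1), (0, 3) and (1, 2), each with weight 1, since no two of these champions share more than one trait.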
CodePudding user response:
You can create a dict with the indexes of all occurrences of a trait:
my_dict = {}
for i, j in enumerate(df['Traits']):
    for trait in j:
        if trait in my_dict:
            my_dict[trait].append(i)
        else:
            my_dict[trait] = [i]
print(my_dict)
The output:
{'Nightbringer': [0, 3], 'Ranger': [0, 1], 'Draconic': [1, 2], 'Renewer': [2], 'Caretaker': [2], 'Skirmisher': [3]}
This is a better approach because you don't get unnecessary repetition, e.g. 0 pointing to 3 and 3 pointing to 0.
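If you still want an edge list with weights out of this dict, something along these lines might work. It is just a sketch under a few assumptions: the dict is copied from the output above, and `pair_counts`, `edges` and the `Weight` column are names I made up for the example.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Trait -> row indexes, as printed by the loop above.
my_dict = {'Nightbringer': [0, 3], 'Ranger': [0, 1], 'Draconic': [1, 2],
           'Renewer': [2], 'Caretaker': [2], 'Skirmisher': [3]}

# Count how many traits each unordered pair of champions shares.
pair_counts = Counter()
for members in my_dict.values():
    # combinations() yields each pair once, so (0, 3) appears but (3, 0) does not.
    for pair in combinations(sorted(members), 2):
        pair_counts[pair] += 1

# One row per pair; the count of shared traits becomes the edge weight.
edges = pd.DataFrame(
    [(s, t, w) for (s, t), w in sorted(pair_counts.items())],
    columns=['Source', 'Target', 'Weight'],
)
print(edges)
```

Traits held by a single champion (Renewer, Caretaker, Skirmisher) produce no pairs, so they simply drop out of the edge list.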