I have a table:
id | Name |
---|---|
1 | A |
2 | A |
2 | B |
2 | C |
3 | B |
3 | C |
4 | C |
4 | A |
And I need a new table where unique IDs are calculated at the intersection of names. That is, for example, ID 2 has both A and B, that is, at the intersection of A and B, there is already one ID. Something like this:
Name | A | B | C |
---|---|---|---|
A | - | 1 | 2 |
B | 1 | - | 2 |
C | 2 | 2 | - |
I tried with pandas.pivot_table, but i couldn't understand how
CodePudding user response:
Let us use crosstab
to create frequency table, then calculate the inner product to generate similarity / intersection matrix and mask the diagonal values with 0
s = pd.crosstab(df['Name'], df['id']).clip(upper=1)
s = s @ s.T
np.fill_diagonal(s.values, 0)
print(s)
Name A B C
Name
A 0 1 2
B 1 0 2
C 2 2 0