I have a pandas DataFrame, shown below as a matrix, that holds similarity scores between the elements (people) in the rows and those in the columns.
| | A | B | C |
|------------|-----------|-----------|----------|
| D | 0.4 | 0.1 | 0.1 |
| E | 0.2 | 0.1 | 0.4 |
| F | 0.9 | 0.4 | 0.3 |
Further, I have a list of location identifiers for these elements.
A - London
B - Sydney
C - Paris
D - Paris
E - Delhi
F - London
I want to go through the matrix and set the similarity score to 0 wherever the two elements share the same location. In this example, that means replacing the intersection of A and F (0.9) and the intersection of D and C (0.1) with 0.
Thanks!
Edit:
The final expected output I am looking for is as below:
| | A | B | C |
|------------|-----------|-----------|----------|
| D | 0.4 | 0.1 | 0.0 |
| E | 0.2 | 0.1 | 0.4 |
| F | 0.0 | 0.4 | 0.3 |
CodePudding user response:
Create a dictionary that maps each label to its city. Then rename the index and columns with it, compare the renamed index against the renamed columns with NumPy broadcasting, and pass the resulting boolean mask to DataFrame.mask:
d = {'A': 'London', 'B': 'Sydney', 'C': 'Paris', 'D': 'Paris', 'E': 'Delhi', 'F': 'London'}
# Relabel rows and columns by city so the labels can be compared directly
df1 = df.rename(index=d, columns=d)
# Row cities as a column vector vs. column cities: True where row i and column j share a city
df = df.mask(df1.index.to_numpy()[:, None] == df1.columns.to_numpy(), 0)
print(df)
     A    B    C
D  0.4  0.1  0.0
E  0.2  0.1  0.4
F  0.0  0.4  0.3
Details:
print(df1)
        London  Sydney  Paris
Paris      0.4     0.1    0.1
Delhi      0.2     0.1    0.4
London     0.9     0.4    0.3
print(df1.index.to_numpy()[:, None] == df1.columns.to_numpy())
[[False False  True]
 [False False False]
 [ True False False]]
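For reference, here is a self-contained sketch of the same idea that can be run as is. It assumes the data from the question and uses Index.map to look up the cities instead of renaming; the names row_cities, col_cities, same_city and result are only illustrative.

```python
import pandas as pd

# Similarity matrix from the question (rows D-F, columns A-C)
df = pd.DataFrame(
    [[0.4, 0.1, 0.1],
     [0.2, 0.1, 0.4],
     [0.9, 0.4, 0.3]],
    index=["D", "E", "F"],
    columns=["A", "B", "C"],
)

# Label -> city lookup from the question
d = {"A": "London", "B": "Sydney", "C": "Paris",
     "D": "Paris", "E": "Delhi", "F": "London"}

# City of every row label and of every column label
row_cities = df.index.map(d).to_numpy()
col_cities = df.columns.map(d).to_numpy()

# Shape (n_rows, 1) compared with shape (n_cols,) broadcasts to (n_rows, n_cols)
same_city = row_cities[:, None] == col_cities

# Replace the cells where the cities match with 0 and keep the rest
result = df.mask(same_city, 0)
print(result)
```

Because the row cities are turned into a column vector before the comparison, the mask has the same shape as the DataFrame, so this should also work when the number of rows and columns differ.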