I have two dataframes.
One is called player and contains the names of football players:
import pandas as pd
player = ["David Gonzalez", "Agustin Martinez", "Jibrail Al-Hindi", "Edward Cahill", "Simon Becker", "Paolo Imperiali", "Amir Bahari", "Guilherme Souza"]
player = pd.DataFrame(player)
I have another dataframe called football:
id | scorer |
---|---|
1 | David Gonzalez, Edward Cahill |
2 | Agustin Martinez,Brian McNamara |
3 | Agustin Martinez, Jibrail Al-Hindi |
4 | Edward Cahill,Guilherme Souza |
5 | Paolo Imperiali, Yannick Wagner |
6 | Simon Becker,Amir Bahari |
7 | Paolo Imperiali,Yannick Wagner |
8 | Amir Bahari,Guilherme Souza,David Gonzalez |
9 | Edward Cahill,Amir Bahari |
10 | Simon Becker |
11 | Amir Bahari |
12 | Paolo Imperiali,Simon Becker |
13 | Edward Cahill,Guilherme Souza |
14 | Edward Cahill,Amir Bahari |
15 | Simon Becker |
16 | Simon Becker |
The second dataframe, football, shows which players scored in which game.
Now I would like to create an adjacency matrix whose rows and columns are all the players from the player dataframe, with 1 if there is a game id in which both players scored, and 0 if there is no game in which they both scored.
I tried this:
np.zeros((player,scorer)
But I think I am on the wrong path, because I want a matrix whose rows and columns are labelled with the names of the players in player and whose entries are 1 or 0.
CodePudding user response:
You can split/explode and join the players for a crosstab:
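# one row per (game, scorer); keep only the players listed in player
s = football['scorer'].str.split(r',\s*').explode().loc[lambda s: s.isin(player[0])]
# self-join on the game index to pair up everyone who scored in the same game
df2 = s.rename('row').to_frame().join(s.rename('col'))
# count how often each pair occurs
out = pd.crosstab(df2['row'], df2['col']).rename_axis(index=None, columns=None)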
NB. This gives the number of games in which both players scored (the diagonal is the number of games in which each player scored). If you just want 0/1, add .clip(upper=1).
Output:
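| | Agustin Martinez | Amir Bahari | David Gonzalez | Edward Cahill | Guilherme Souza | Jibrail Al-Hindi | Paolo Imperiali | Simon Becker |
|---|---|---|---|---|---|---|---|---|
| Agustin Martinez | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Amir Bahari | 0 | 5 | 1 | 2 | 1 | 0 | 0 | 1 |
| David Gonzalez | 0 | 1 | 2 | 1 | 1 | 0 | 0 | 0 |
| Edward Cahill | 0 | 2 | 1 | 5 | 2 | 0 | 0 | 0 |
| Guilherme Souza | 0 | 1 | 1 | 2 | 3 | 0 | 0 | 0 |
| Jibrail Al-Hindi | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| Paolo Imperiali | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 1 |
| Simon Becker | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 5 |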
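
For completeness, here is a self-contained sketch that rebuilds the two dataframes from the question and turns the counts into the 0/1 matrix that was asked for. Two things in it are my own assumptions rather than part of the question: the rows and columns are reindexed to follow the order of player (which would also keep any player who never scored, as an all-zero row and column), and the diagonal is set to 0 so that a player is not treated as adjacent to themselves. The names `names`, `counts` and `adj` are only illustrative. Drop either step if that is not what you want.

```python
import numpy as np
import pandas as pd

# Data copied from the question
names = ["David Gonzalez", "Agustin Martinez", "Jibrail Al-Hindi", "Edward Cahill",
         "Simon Becker", "Paolo Imperiali", "Amir Bahari", "Guilherme Souza"]
player = pd.DataFrame(names)

football = pd.DataFrame({
    "id": range(1, 17),
    "scorer": [
        "David Gonzalez, Edward Cahill",
        "Agustin Martinez,Brian McNamara",
        "Agustin Martinez, Jibrail Al-Hindi",
        "Edward Cahill,Guilherme Souza",
        "Paolo Imperiali, Yannick Wagner",
        "Simon Becker,Amir Bahari",
        "Paolo Imperiali,Yannick Wagner",
        "Amir Bahari,Guilherme Souza,David Gonzalez",
        "Edward Cahill,Amir Bahari",
        "Simon Becker",
        "Amir Bahari",
        "Paolo Imperiali,Simon Becker",
        "Edward Cahill,Guilherme Souza",
        "Edward Cahill,Amir Bahari",
        "Simon Becker",
        "Simon Becker",
    ],
})

# One row per (game, scorer), restricted to the players of interest
s = (football["scorer"].str.split(r",\s*").explode()
     .loc[lambda x: x.isin(player[0])])

# Self-join on the game index, then count how often each pair occurs
df2 = s.rename("row").to_frame().join(s.rename("col"))
counts = pd.crosstab(df2["row"], df2["col"]).rename_axis(index=None, columns=None)

# Assumption: follow the order of `player` and keep players who never scored
counts = counts.reindex(index=names, columns=names, fill_value=0)

# 0/1 adjacency; assumption: a player is not adjacent to themselves
arr = counts.clip(upper=1).to_numpy(copy=True)
np.fill_diagonal(arr, 0)
adj = pd.DataFrame(arr, index=names, columns=names)

print(adj)
```

With this data, `adj.loc["Edward Cahill", "Amir Bahari"]` is 1 (they both scored in games 9 and 14), while `adj.loc["Simon Becker", "David Gonzalez"]` is 0.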