Is there a way to build a co-occurrence (frequency) matrix for participant-organizer in Python?

Let's assume we have a DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'participant_id' : [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
                   'organizer_id' : [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799]})

If we print the above, we get:

print(df)



    participant_id  organizer_id
0             1608          1772
1             1608          1772
2             2089          1772
3              213          1790
4             1608          1790
5             1887          1790
6             2089          1791
7             4544          1791
8             6866          1772
9             2020          1799
10            2020          1799

It would be valuable to know how many times each participant took part in each organizer's tasks, in the form of a co-occurrence matrix that would look like this:

      1772  1790  1791  1799
1608     2     1     0     0
2089     1     0     1     0
213      0     1     0     0
1887     0     1     0     0
4544     0     0     1     0
6866     1     0     0     0
2020     0     0     0     2

How does one build such a matrix in Python from the DataFrame df?

CodePudding user response:

df.groupby(by=["participant_id", "organizer_id"]).size().unstack('organizer_id').fillna(0)

organizer_id    1772  1790  1791  1799
participant_id                        
213              0.0   1.0   0.0   0.0
1608             2.0   1.0   0.0   0.0
1887             0.0   1.0   0.0   0.0
2020             0.0   0.0   0.0   2.0
2089             1.0   0.0   1.0   0.0
4544             0.0   0.0   1.0   0.0
6866             1.0   0.0   0.0   0.0
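
For anyone reading along, here is a self-contained sketch of the same groupby approach, assuming the df from the question (the variable name counts is only illustrative). One optional tweak: passing fill_value=0 to unstack, instead of calling fillna(0) afterwards, should keep the counts as integers rather than floats.

```python
import pandas as pd

df = pd.DataFrame({
    'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
    'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
})

# Count the rows for each (participant, organizer) pair, then pivot the
# organizer level into columns. fill_value=0 fills pairs that never occur
# during the unstack itself, so no NaN (and no float conversion) appears.
counts = (
    df.groupby(['participant_id', 'organizer_id'])
      .size()
      .unstack('organizer_id', fill_value=0)
)
print(counts)
```

The rows come out sorted by participant_id, as in the output above.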

CodePudding user response:

This is a duplicate of "How to create co-occurrence matrix from pandas two column?"

Use pd.crosstab(df['participant_id'], df['organizer_id']) to get your output matrix.
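
A minimal runnable sketch of the crosstab approach, assuming the df from the question. crosstab returns the rows sorted by participant_id; the reindex step is an optional addition, not part of the answer above, that restores the order of first appearance used in the question's desired matrix. The name matrix is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
    'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
})

# Cross-tabulate the two columns: rows are participants, columns are
# organizers, and each cell counts how often the pair occurs in df.
matrix = pd.crosstab(df['participant_id'], df['organizer_id'])

# Optional: crosstab sorts the row labels, so reorder the rows by the
# order in which each participant first appears in df.
matrix = matrix.reindex(df['participant_id'].unique())
print(matrix)
```

If the sorted order is fine, the reindex line can simply be dropped.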