Let's assume we have a DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
                   'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799]})
If we print the above, we get:
print(df)
    participant_id  organizer_id
0             1608          1772
1             1608          1772
2             2089          1772
3              213          1790
4             1608          1790
5             1887          1790
6             2089          1791
7             4544          1791
8             6866          1772
9             2020          1799
10            2020          1799
It would be valuable to know how many times each participant took part in each organizer's tasks, in the form of a co-occurrence matrix that would look like this:
      1772  1790  1791  1799
1608    2.    1.    0.    0.
2089    1.    0.    1.    0.
213     0.    1.    0.    0.
1887    0.    1.    0.    0.
4544    0.    0.    1.    0.
6866    1.    0.    0.    0.
2020    0.    0.    0.    2.
How does one build such a matrix in Python from the DataFrame df?
CodePudding user response:
df.groupby(by=["participant_id", "organizer_id"]).size().unstack('organizer_id').fillna(0)
organizer_id    1772  1790  1791  1799
participant_id
213              0.0   1.0   0.0   0.0
1608             2.0   1.0   0.0   0.0
1887             0.0   1.0   0.0   0.0
2020             0.0   0.0   0.0   2.0
2089             1.0   0.0   1.0   0.0
4544             0.0   0.0   1.0   0.0
6866             1.0   0.0   0.0   0.0
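The counts above come back as floats because unstack first produces NaN for missing pairs and fillna(0) then fills them in, and the rows are sorted by participant_id rather than in the order shown in the question. Here is a minimal sketch of one way to handle both, assuming integer counts and first-appearance row order are wanted; the names counts and ordered are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
    'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
})

# Passing fill_value=0 to unstack fills missing pairs during the reshape,
# so no NaN is introduced and the counts should keep their integer dtype.
counts = (
    df.groupby(['participant_id', 'organizer_id'])
      .size()
      .unstack('organizer_id', fill_value=0)
)

# Optional: reorder rows to the order in which participants first appear in df.
ordered = counts.reindex(df['participant_id'].unique())
print(ordered)
```

The reindex step is only needed if the row order matters; the counts themselves are the same either way.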
CodePudding user response:
This is a duplicate of "How to create co-occurrence matrix from pandas two column?"
Use pd.crosstab(df['participant_id'], df['organizer_id'])
to get your output matrix.
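For completeness, here is a self-contained sketch of the pd.crosstab call applied to the DataFrame from the question; the variable name matrix is just for illustration. With no values or aggfunc argument, crosstab counts how often each pair occurs, which should give integer counts without a separate fill step:

```python
import pandas as pd

df = pd.DataFrame({
    'participant_id': [1608, 1608, 2089, 213, 1608, 1887, 2089, 4544, 6866, 2020, 2020],
    'organizer_id': [1772, 1772, 1772, 1790, 1790, 1790, 1791, 1791, 1772, 1799, 1799],
})

# Rows come from the first argument, columns from the second;
# each cell is the number of rows in df with that (participant, organizer) pair.
matrix = pd.crosstab(df['participant_id'], df['organizer_id'])
print(matrix)

# Individual counts can be looked up by label, e.g. participant 1608 with organizer 1772.
print(matrix.loc[1608, 1772])
```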