I am trying to construct an affiliation matrix for a social network. I have a pandas DataFrame where column i is the i index of an element, column j is the j index of an element, and column v is the weight between the two nodes.
I made up the following table for demonstration. I'll just call it df:
i | j | v |
---|---|---|
1 | 3 | 0 |
2 | 4 | 2 |
5 | 3 | 0 |
2 | 1 | 2 |
1 | 2 | 0.5 |
3 | 1 | 1 |
My idea was to first construct a matrix
A_matrix = np.zeros((i_num, j_num))
Then I use the apply function
df.apply(set_to_matrix)
where
def set_to_matrix(row):
    A_matrix[row.i, row.j] = row.v
My question is: is it possible to get better performance?
I have i_num = 100000 and j_num = 1000; with the code above it took me 1 minute 53 sec.
I tried using the swifter package to speed up the apply function, but it took 2 minutes 23 sec, which is longer.
If possible, please also explain why my approach is slow and how other approaches could speed up the process.
CodePudding user response:
Your code does not work for me, and I didn't spend time debugging it. The following code will give you the matrix you need fairly quickly. The only caveat is that duplicate rows (1 & 2) and duplicate columns (1 & 3) will be combined (which makes sense to me).
import pandas as pd
df = pd.DataFrame({'i': [1,2,5,2,1,3],
                   'j': [3,4,3,1,2,1],
                   'v': [0,2,0,2,0.5,1]})
# Passing 'mean' as a string averages duplicate (i, j) pairs, like np.mean
df1 = pd.pivot_table(df, values='v', index='i', columns='j', aggfunc='mean').reset_index().fillna(0)
Final network matrix:
print(df1.to_numpy())
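To make the shape of the pivoted result concrete, here is a minimal sketch on the demo table. It is a variation on the code above, not the same call: it skips reset_index() so that the i labels stay in the index instead of becoming an extra first column of the array. The names pivot and matrix are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 5, 2, 1, 3],
                   "j": [3, 4, 3, 1, 2, 1],
                   "v": [0, 2, 0, 2, 0.5, 1]})

# One row per distinct i, one column per distinct j; missing pairs become 0.
pivot = pd.pivot_table(df, values="v", index="i", columns="j",
                       aggfunc="mean").fillna(0)

# The array covers only the labels that occur in df (i in 1,2,3,5; j in 1..4),
# so its positions are not the raw i/j values.
matrix = pivot.to_numpy()
print(pivot)
```

Because the pivot only contains labels that actually appear, the result is a compact 4 x 4 array here rather than an i_num x j_num matrix; pivot.index and pivot.columns give the mapping back to the original indices.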
CodePudding user response:
There is no need to use apply; you can use the i and j columns to index into A_matrix, then assign the values from the v column to the corresponding index positions:
A_matrix = np.zeros((i_num, j_num))
A_matrix[df.i, df.j] = df.v
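As a self-contained sketch of the indexing approach on the demo table: apply calls a Python function once per row, whereas the indexed assignment below is a single vectorized NumPy operation, which is where the speedup comes from. The sizes i_num and j_num here are an assumption (one more than the largest index, since the indices are used directly as array positions); in the real data they would be 100000 and 1000.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"i": [1, 2, 5, 2, 1, 3],
                   "j": [3, 4, 3, 1, 2, 1],
                   "v": [0, 2, 0, 2, 0.5, 1]})

# Assumed sizes for this demo: indices are used as positions, so the matrix
# must be at least max index + 1 along each axis.
i_num = int(df["i"].max()) + 1
j_num = int(df["j"].max()) + 1

A_matrix = np.zeros((i_num, j_num))

# Integer-array indexing: all (i, j) positions are written in one operation.
rows = df["i"].to_numpy()
cols = df["j"].to_numpy()
A_matrix[rows, cols] = df["v"].to_numpy()

print(A_matrix)
```

If the same (i, j) pair appears more than once, this assignment keeps the last value rather than combining them; np.add.at(A_matrix, (rows, cols), values) is one option if the weights should be summed instead.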