Assigning matrix elements by indices collected in a pandas DataFrame

I am trying to construct an affiliation matrix for a social network. I have a pandas DataFrame where column i holds the row index of an element, column j holds its column index, and column v holds the weight between the two nodes.

I made up the following table for demonstration; I'll call it df.

i	j	v
1	3	0
2	4	2
5	3	0
2	1	2
1	2	0.5
3	1	1

My idea was to first construct a matrix of zeros:

A_matrix = np.zeros((i_num, j_num))

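Then I call apply row by row

df.apply(set_to_matrix, axis=1)

where

def set_to_matrix(row):
    # each row arrives as a float Series (v is float), so cast the indices back to int
    A_matrix[int(row.i), int(row.j)] = row.v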
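For reference, here is a minimal self-contained sketch of this row-wise approach on the demo table. It assumes the values in i and j are already 0-based matrix positions, so the matrix is sized as the largest index plus one (an assumption; the question does not say how i_num and j_num are chosen). It passes axis=1 because apply's default axis=0 hands the function whole columns rather than rows.

```python
import numpy as np
import pandas as pd

# Demo data from the table above
df = pd.DataFrame({'i': [1, 2, 5, 2, 1, 3],
                   'j': [3, 4, 3, 1, 2, 1],
                   'v': [0, 2, 0, 2, 0.5, 1]})

# Assumption: indices are 0-based positions, so size the matrix to fit the largest one
i_num = int(df['i'].max()) + 1
j_num = int(df['j'].max()) + 1
A_matrix = np.zeros((i_num, j_num))

def set_to_matrix(row):
    # row may come through as floats because v is float, so cast the indices to int
    A_matrix[int(row['i']), int(row['j'])] = row['v']

# axis=1: call set_to_matrix once per row (the default axis=0 would pass columns)
df.apply(set_to_matrix, axis=1)
```

Every row costs one Python function call plus the construction of a row Series, which is why this gets slow with many rows.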

My question is: is it possible to get better performance?

I have i_num = 100000 and j_num = 1000, and the code above took 1 minute 53 seconds.

I tried using the swifter package to speed up apply, but that took 2 minutes 23 seconds, which is even slower.

If possible, please also explain why my version is slower and how another approach could speed up the process.

Answer:

Your code did not work for me, and I didn't spend time debugging it. The following code builds the matrix you need quickly. The only caveat is that rows i = 1 and 2 and columns j = 1 and 3, which each occur more than once in the data, are combined into a single row or column of the result (which makes sense to me).

import pandas as pd

df = pd.DataFrame({'i': [1,2,5,2,1,3],
                   'j': [3,4,3,1,2,1],
                   'v': [0,2,0,2,0.5,1]})

# One row per i value, one column per j value; pairs that never occur become 0
df1 = pd.pivot_table(df, values='v', index='i', columns='j', aggfunc='mean').fillna(0)

Final network matrix:

print(df1.to_numpy())
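The pivot table only has rows and columns for the i and j values that actually occur (here i = 1, 2, 3, 5 and j = 1, 2, 3, 4), so its positions do not line up with an i_num x j_num matrix. A minimal sketch, assuming the indices are 0-based positions and the full shape is "largest index plus one" (both assumptions), reindexes the pivot so that row i and column j of the array correspond to node i and node j:

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 2, 5, 2, 1, 3],
                   'j': [3, 4, 3, 1, 2, 1],
                   'v': [0, 2, 0, 2, 0.5, 1]})

# Assumed full shape: large enough for the largest index in each column
i_num = int(df['i'].max()) + 1
j_num = int(df['j'].max()) + 1

# Average any repeated (i, j) pairs, expand to every row/column position,
# then fill the positions that never occur with 0
full = (pd.pivot_table(df, values='v', index='i', columns='j', aggfunc='mean')
          .reindex(index=range(i_num), columns=range(j_num))
          .fillna(0))

A_matrix = full.to_numpy()
```

If the same (i, j) pair appears more than once, aggfunc='mean' averages those weights; 'sum' or 'max' are alternatives depending on what the weight means.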

Answer:

There is no need to use apply. You can pass the i and j columns directly as index arrays into A_matrix and assign the values from the v column to the corresponding positions in a single vectorized step:

A_matrix = np.zeros((i_num, j_num))
# i and j must be integer positions smaller than i_num and j_num
A_matrix[df.i, df.j] = df.v
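To see why this is so much faster than apply, the sketch below builds the same matrix both ways on made-up random data and checks that the results agree. apply calls a Python function once per row and builds a pandas Series for each row, whereas the fancy-indexed assignment hands whole index arrays to NumPy, which performs all the writes in compiled code. The sizes are illustrative assumptions, not the question's data. Duplicate (i, j) pairs are dropped first because NumPy does not guarantee which value ends up in a cell that is assigned more than once by a single fancy-indexed assignment.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
i_num, j_num, n_rows = 5_000, 500, 50_000  # illustrative sizes only

df = pd.DataFrame({'i': rng.integers(0, i_num, n_rows),
                   'j': rng.integers(0, j_num, n_rows),
                   'v': rng.random(n_rows)})
# Keep one value per (i, j) so both methods write each cell exactly once
df = df.drop_duplicates(subset=['i', 'j'])

def with_apply(df):
    A = np.zeros((i_num, j_num))
    def set_to_matrix(row):
        A[int(row['i']), int(row['j'])] = row['v']
    df.apply(set_to_matrix, axis=1)  # one Python call per row
    return A

def vectorized(df):
    A = np.zeros((i_num, j_num))
    # one fancy-indexed assignment covering every row at once
    A[df['i'].to_numpy(), df['j'].to_numpy()] = df['v'].to_numpy()
    return A

t0 = time.perf_counter()
A_slow = with_apply(df)
t1 = time.perf_counter()
A_fast = vectorized(df)
t2 = time.perf_counter()

print(f"apply:      {t1 - t0:.3f} s")
print(f"vectorized: {t2 - t1:.3f} s")
```

The exact timings depend on the machine, but the gap grows with the number of rows because the apply version pays Python overhead per row.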