Assigning matrix elements by indices collected in a pandas DataFrame

Time:06-18

I am trying to construct an affiliation matrix for a social network. I have a pandas DataFrame in which column i holds the row index of an element, column j holds its column index, and column v holds the weight between the two nodes.

I made up the following table for demonstration; I'll just call it df.

i j v
1 3 0
2 4 2
5 3 0
2 1 2
1 2 0.5
3 1 1

My idea was to first construct a matrix

A_matrix = np.zeros((i_num, j_num)) 

Then I use the apply function

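df.apply(set_to_matrix, axis=1)

where

def set_to_matrix(row):
    # axis=1 passes each row as a Series; the row is upcast to float when v is
    # a float column, so cast the indices back to int before indexing.
    A_matrix[int(row.i), int(row.j)] = row.v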

My question: is it possible to get better performance?

I have i_num = 100000 and j_num = 1000; with the code above it took me 1 minute 53 sec.

I tried using the swifter package to speed up the apply function, but that took 2 minutes 23 sec, which is even longer.

If possible, please also explain why my approach is slow and how other approaches could speed up the process.

CodePudding user response:

I could not get your code to run, and I did not spend time debugging it. The following code will give you the matrix you require pretty quickly. The only caveat is that repeated row indices (1 and 2) and repeated column indices (1 and 3) are each merged into a single row or column, which makes sense to me.

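import pandas as pd

df = pd.DataFrame({'i': [1, 2, 5, 2, 1, 3],
                   'j': [3, 4, 3, 1, 2, 1],
                   'v': [0, 2, 0, 2, 0.5, 1]})

# Rows are the distinct i values and columns the distinct j values; missing pairs become 0.
# reset_index() is left out so that the i labels do not end up as a data column.
df1 = pd.pivot_table(df, values='v', index='i', columns='j', aggfunc='mean').fillna(0)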

Final network matrix:

print(df1.to_numpy())
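
Note that the pivot only contains the i and j labels that actually occur in df, so the resulting array is compact rather than i_num by j_num. If the positions in the array need to match the node indices, one possible way is to reindex the pivot before converting it. This is only a sketch: the names compact and full, and the small sizes used for i_num and j_num, are assumptions made for the demo table and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'i': [1, 2, 5, 2, 1, 3],
                   'j': [3, 4, 3, 1, 2, 1],
                   'v': [0, 2, 0, 2, 0.5, 1]})

# Pivot on the labels that occur: rows are i in {1, 2, 3, 5}, columns are j in {1, 2, 3, 4}.
compact = df.pivot_table(values='v', index='i', columns='j', aggfunc='mean')

# Assumed sizes for illustration; the question uses i_num = 100000 and j_num = 1000.
i_num, j_num = 6, 5

# Reindex to every possible position so that array row r / column c corresponds
# to node r / node c, then fill the pairs that never occur with 0.
full = compact.reindex(index=range(i_num), columns=range(j_num)).fillna(0).to_numpy()

print(compact.shape, full.shape)
```

For matrices as large as the one in the question, going through a pivot is likely more work than writing into a preallocated NumPy array directly.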

CodePudding user response:

There is no need to use apply. You can use the i and j columns to index into A_matrix and assign the values from the v column to the corresponding positions:

A_matrix = np.zeros((i_num, j_num)) 
A_matrix[df.i, df.j] = df.v
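
As for why the apply version is slow: with axis=1, apply calls a Python function once per row and builds a Series object for each call, whereas the indexed assignment hands the whole index arrays to NumPy in a single operation. A self-contained sketch on the demo table follows. The sizes chosen for i_num and j_num, the names rows, cols, vals and A_summed, and the np.add.at variant for repeated pairs are assumptions added for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'i': [1, 2, 5, 2, 1, 3],
                   'j': [3, 4, 3, 1, 2, 1],
                   'v': [0, 2, 0, 2, 0.5, 1]})

# Assumed sizes for the demo table; they must exceed the largest index in each column.
i_num, j_num = 6, 5

# Pull plain NumPy arrays out of the DataFrame once.
rows = df['i'].to_numpy()
cols = df['j'].to_numpy()
vals = df['v'].to_numpy()

# One vectorized write instead of one Python call per row.
# If an (i, j) pair is repeated, only one of its values survives,
# and which one is not something to rely on.
A_matrix = np.zeros((i_num, j_num))
A_matrix[rows, cols] = vals

# Optional variant: accumulate the weights of repeated (i, j) pairs instead.
A_summed = np.zeros((i_num, j_num))
np.add.at(A_summed, (rows, cols), vals)

print(A_matrix)
```

The demo table has no repeated (i, j) pair, so both arrays come out identical here; they differ only when the same pair appears more than once.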