Adding values to an nxn numpy array from a pandas dataframe with the specific indices-CodePudding

The problem is as follows.

I have a pandas dataFrame looking something like

    row_idx clm_idx   value  
0         0       1      a1   
1         0       2      b1     
2         1       3      c1      
3         2       3      d1

This dataFrame can have up to m lines (probably quite a lot).

Secondly I have a nxn numpy array A where I need to add those values to depending on the columns 'row_idx' and 'clm_idx' which describe the row and column index in the numpy array. The mathematical function can even be more complex than simply adding. But I set that aside for now.

A = [[a   b   c  d],
     [e   f   g  h],
     [i   k   l  m],
     [n   o   p  q]]

Here as an example its a 4x4 matrix. So I would like to get the following in the end:

A_new = [[a   b a1   c b1  d],
         [e   f      g     h c1],
         [i   k      l     m d1],
         [n   o      p     q]]

I assume I can iterate over all rows of the dataframe somehow extract the indices and then add the values at the corresponding index to the nxn array. But that seems somewhat inefficient.

I tried as well:

df.apply(lambda x: A[x['row_idx'], x['clm_idx']]  = x['value'])

But this throws a SyntaxError that one cannot contain an assignment in this context.

Is there an efficient way to solve the problem, assuming that there m and n are quite big?

As a basic code block this should do. The problem is the last line:

import numpy as np
import pandas as pd

data = {'row_idx': [0, 0, 1, 2], 'clm_idx': [1,2,3,3], 'value': [1,2,3,4]}

df = pd.DataFrame(data)

A = np.zeros((4,4))

df.apply(lambda x: A[x['row_idx'], x['clm_idx']]  = x['value'], axis=1)

CodePudding user response：

Numpy is able to do indexing directly, you don't need any pandas pre-processing:

A = np.zeros((4, 4))

A[df['row_idx'], df['clm_idx']]  = df['value']

Output:

array([[0., 1., 2., 0.],
       [0., 0., 0., 3.],
       [0., 0., 0., 4.],
       [0., 0., 0., 0.]])

Other example:

A = np.ones((4, 4)) * 100

A[df['row_idx'], df['clm_idx']]  = df['value']

array([[100., 101., 102., 100.],
       [100., 100., 100., 103.],
       [100., 100., 100., 104.],
       [100., 100., 100., 100.]])

CodePudding user response：

example:

data = [[0, 1, 100], [0, 2, 200], [1, 3, 300], [2, 3, 400]]
df = pd.DataFrame(data, columns=['row_idx', 'clm_idx', 'value'])

df

    row_idx clm_idx value
0   0       1       100
1   0       2       200
2   1       3       300
3   2       3       400


a1 = np.arange(0, 16).reshape(4,4)

a1

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

use apply with axis=1

def func(x):
    a1[x['row_idx'], x['clm_idx']]  = x['value']
df.apply(func , axis=1)

output(a1):

array([[  0, 101, 202,   3],
       [  4,   5,   6, 307],
       [  8,   9,  10, 411],
       [ 12,  13,  14,  15]])

CodePudding user response：

First for performant solution never use apply, it is really slow.

Use DataFrame.pivot with DataFrame.reindex for DataFrame with same shape like A and then add values:

data = {'row_idx': [0, 0, 1, 2], 'clm_idx': [1,2,3,3], 'value': [1,2,3,4]}

df = pd.DataFrame(data)

A = np.zeros((4,4))

df = (df.pivot('row_idx','clm_idx','value')
        .reindex(index=range(A.shape[0]), columns=range(A.shape[0])).fillna(0))

#if duplicated  row_idx, clm_idx pairs
#df = (df.pivot_table(index='row_idx',columns='clm_idx',values='value', aggfunc='sum')
#        .reindex(index=range(A.shape[0]), columns=range(A.shape[0])).fillna(0))
print (df)
clm_idx    0    1    2    3
row_idx                    
0        0.0  1.0  2.0  0.0
1        0.0  0.0  0.0  3.0
2        0.0  0.0  0.0  4.0
3        0.0  0.0  0.0  0.0

A = A   df.to_numpy()
print (A)
[[0. 1. 2. 0.]
 [0. 0. 0. 3.]
 [0. 0. 0. 4.]
 [0. 0. 0. 0.]]