Home > Software engineering >  Replace values with zeros when row and column name equal to each other
Replace values with zeros when row and column name equal to each other

Time:12-15

I have the following data set:

import pandas as pd
import numpy as np

d = {'column1': ['a', 'b', 'c'], 'va1': [10, 8, 6],  'va2': [1, 2, 3], 'vb1': [4, 2, 6], 'vb2': [1, 4, 8], 'vc1': [2, 6, 8], 'vc2': [2, 1, 8] }

data_frame  = pd.DataFrame(data=d)

what I want to do is to replace the values with 0 where column1 and other column values coincide. The desired dataset is following:

d1 = {'column1': ['a', 'b', 'c'], 'va1': [0, 8, 6],  'va2': [0, 2, 3], 'vb1': [4, 0, 6], 'vb2': [1, 0, 8], 'vc1': [2, 6, 0], 'vc2': [2, 1, 0] }

data_frame1  = pd.DataFrame(data=d1)

Because my original dataset is big, I would like to avoid groupby and melt commands. One suggestion would be for instance: to make column1 as index, rename all columns and replace ij element with 0 where the column and index match. Here are the starting lines of what I mean:

data_frame.set_index('column1', inplace=True)
data_frame.columns=data_frame.columns.str[1:2] # Now column and index has the same strings
# Replace ij elements with 0 where index and column matches. 

Any suggestion?

CodePudding user response:

Use numpy broadcasting for compare second value of columns with index values and set 0 in DataFrame.mask:

data_frame.set_index('column1', inplace=True)
cols=data_frame.columns.str[1:2]

data_frame = data_frame.mask(data_frame.index.to_numpy()[:, None] == cols.to_numpy(), 0)
print (data_frame)
         va1  va2  vb1  vb2  vc1  vc2
column1                              
a          0    0    4    1    2    2
b          8    2    0    0    6    1
c          6    3    6    8    0    0
  • Related