I have the following data set:
import pandas as pd
import numpy as np
d = {'column1': ['a', 'b', 'c'], 'va1': [10, 8, 6], 'va2': [1, 2, 3], 'vb1': [4, 2, 6], 'vb2': [1, 4, 8], 'vc1': [2, 6, 8], 'vc2': [2, 1, 8] }
data_frame = pd.DataFrame(data=d)
what I want to do is to replace the values with 0 where column1 and other column values coincide. The desired dataset is following:
d1 = {'column1': ['a', 'b', 'c'], 'va1': [0, 8, 6], 'va2': [0, 2, 3], 'vb1': [4, 0, 6], 'vb2': [1, 0, 8], 'vc1': [2, 6, 0], 'vc2': [2, 1, 0] }
data_frame1 = pd.DataFrame(data=d1)
Because my original dataset is big, I would like to avoid groupby and melt commands. One suggestion would be for instance: to make column1 as index, rename all columns and replace ij element with 0 where the column and index match. Here are the starting lines of what I mean:
data_frame.set_index('column1', inplace=True)
data_frame.columns=data_frame.columns.str[1:2] # Now column and index has the same strings
# Replace ij elements with 0 where index and column matches.
Any suggestion?
CodePudding user response:
Use numpy broadcasting for compare second value of columns with index values and set 0
in DataFrame.mask
:
data_frame.set_index('column1', inplace=True)
cols=data_frame.columns.str[1:2]
data_frame = data_frame.mask(data_frame.index.to_numpy()[:, None] == cols.to_numpy(), 0)
print (data_frame)
va1 va2 vb1 vb2 vc1 vc2
column1
a 0 0 4 1 2 2
b 8 2 0 0 6 1
c 6 3 6 8 0 0