Conditional replacement of all values in selected columns in dataframe (pandas)-CodePudding

I've got a dataframe with hundreds of columns and millions of rows. I need to conditionally replace the values of selected columns by another value. what is the most efficient way to do this, if I know the index or names of the columns that need to be changed?

example below:

df = pd.DataFrame({'ID1':[0,1,2,3,4,5,6], 'ID2': [0,1,2,0,4,0,5], 'Value1':[0,1,6,0,4,7,0], 'Value2':[1,0,2,3,0,4,5] })

    ID1 ID2 Value1  Value2
0   0    0    0       1
1   1    1    1       0
2   2    2    6       2
3   3    0    0       3
4   4    4    4       0
5   5    0    7       4
6   6    5    0       5

I want the values of Value1,Value2,..., ValueN which are larger than 0 to be replaced by 1. Note that ID1, ID2, ..., IDN should be excluded.

Desired Output:

   ID1  ID2 Value1  Value2
0   0    0    0       1
1   1    1    1       0
2   2    2    1       1      
3   3    0    0       1
4   4    4    1       0
5   5    0    1       1
6   6    5    0       1

dataframe has hundreds of columns and millions of rows.... so I'd like to do this as computationally efficient as possible.

CodePudding user response：

Depending on how many ValueN columns you have, you can first build a list of them:

cols = [x for x in df.columns if 'Value' in x]

An efficient way is using mask:

df[cols] = df[cols].mask(df[cols] > 0, 1)

Alternatively, you can try np.where:

df[cols] = np.where(df[cols] > 1, 0, df[cols])

CodePudding user response：

Or you can try this:

df[ df.iloc[0: ,2:n] >0 ] = 1

n is the maximum value of your column index number 1.

df[ df >0 ] = 1 can check if value in df >0, replace it with 1.

But you want first two column(ID1, ID2) remain the same, so you can use df.iloc[0: ,2:n] extract all row and column[2] to [n],

Reference:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

How to set value of first row of pandas dataframe meeting condition?