Home > Enterprise >  Conditional replacement of all values in selected columns in dataframe (pandas)
Conditional replacement of all values in selected columns in dataframe (pandas)

Time:03-31

I've got a dataframe with hundreds of columns and millions of rows. I need to conditionally replace the values of selected columns by another value. what is the most efficient way to do this, if I know the index or names of the columns that need to be changed?

example below:

df = pd.DataFrame({'ID1':[0,1,2,3,4,5,6], 'ID2': [0,1,2,0,4,0,5], 'Value1':[0,1,6,0,4,7,0], 'Value2':[1,0,2,3,0,4,5] })

    ID1 ID2 Value1  Value2
0   0    0    0       1
1   1    1    1       0
2   2    2    6       2
3   3    0    0       3
4   4    4    4       0
5   5    0    7       4
6   6    5    0       5

I want the values of Value1,Value2,..., ValueN which are larger than 0 to be replaced by 1. Note that ID1, ID2, ..., IDN should be excluded.

Desired Output:

   ID1  ID2 Value1  Value2
0   0    0    0       1
1   1    1    1       0
2   2    2    1       1      
3   3    0    0       1
4   4    4    1       0
5   5    0    1       1
6   6    5    0       1

dataframe has hundreds of columns and millions of rows.... so I'd like to do this as computationally efficient as possible.

CodePudding user response:

Depending on how many ValueN columns you have, you can first build a list of them:

cols = [x for x in df.columns if 'Value' in x]

An efficient way is using mask:

df[cols] = df[cols].mask(df[cols] > 0, 1)

Alternatively, you can try np.where:

df[cols] = np.where(df[cols] > 1, 0, df[cols])

CodePudding user response:

Or you can try this:

df[ df.iloc[0: ,2:n] >0 ] = 1

n is the maximum value of your column index number 1.

df[ df >0 ] = 1 can check if value in df >0, replace it with 1.

But you want first two column(ID1, ID2) remain the same, so you can use df.iloc[0: ,2:n] extract all row and column[2] to [n],

Reference:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html

How to set value of first row of pandas dataframe meeting condition?

  • Related