How to use 'mask' in Pandas for multiple columns?

Time:10-09

For instance, my dataset has 20 columns, but negative values need to be replaced in only two of them: ArrDelay and DepDelay. How do I do that?

Dataset:

FlightNum ArrDelay DepDelay TailNum Month Dest
3232 3 -6 432G 1 ORX
4342 -2 4 476N 2 TOL
6344 -1 -4 643G 3 JFK
7564 5 13 653A 4 CVO

The result should be:

FlightNum ArrDelay DepDelay TailNum Month Dest
3232 3 0 432G 1 ORX
4342 0 4 476N 2 TOL
6344 0 0 643G 3 JFK
7564 5 13 653A 4 CVO

I can't apply 'mask' to the whole DataFrame because some columns hold string values, which raises errors. Applying 'mask' to ArrDelay or DepDelay alone works fine, but I need to replace the negative values in both columns at once.

My code:

data = data.mask(data.ArrDelay.lt(0),0) #how to add 'DepDelay'?
data

CodePudding user response:

Use .assign and np.where

import numpy as np

# Replace negative values with 0 in each of the two delay columns
data = data.assign(
    ArrDelay=np.where(data["ArrDelay"].lt(0), 0, data["ArrDelay"]),
    DepDelay=np.where(data["DepDelay"].lt(0), 0, data["DepDelay"])
)
print(data)

   FlightNum  ArrDelay  DepDelay TailNum  Month Dest
0       3232         3         0    432G      1  ORX
1       4342         0         4    476N      2  TOL
2       6344         0         0    643G      3  JFK
3       7564         5        13    653A      4  CVO
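
If more than two columns are involved, the same np.where idea can be driven from a list of column names. The following is only a sketch: it rebuilds the sample data from the question by hand, and the `cols` list is an illustrative name, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "FlightNum": [3232, 4342, 6344, 7564],
    "ArrDelay": [3, -2, -1, 5],
    "DepDelay": [-6, 4, -4, 13],
    "TailNum": ["432G", "476N", "643G", "653A"],
    "Month": [1, 2, 3, 4],
    "Dest": ["ORX", "TOL", "JFK", "CVO"],
})

# Columns whose negative values should become 0 (illustrative name)
cols = ["ArrDelay", "DepDelay"]

# Build one keyword argument per column and pass them all to assign
data = data.assign(**{c: np.where(data[c].lt(0), 0, data[c]) for c in cols})
print(data)
```

The string columns are never touched, so the errors mentioned in the question should not come up.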

CodePudding user response:

You can select the columns to update, clip the values lower than 0, and update the DataFrame in place (df here is the DataFrame called data in the question).

cols = ['ArrDelay', 'DepDelay']
df.update(df[cols].clip(lower=0))

Alternative if you prefer a mask:

df.update(df[cols].mask(df[cols].lt(0), 0))

Output:

   FlightNum  ArrDelay  DepDelay TailNum  Month Dest
0       3232         3         0    432G      1  ORX
1       4342         0         4    476N      2  TOL
2       6344         0         0    643G      3  JFK
3       7564         5        13    653A      4  CVO
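
As a self-contained check, the mask variant can also be written as a plain assignment to the column subset instead of going through update. This is a sketch under the assumption that the frame looks like the sample in the question; the intermediate names `masked` and `clipped` are only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "FlightNum": [3232, 4342, 6344, 7564],
    "ArrDelay": [3, -2, -1, 5],
    "DepDelay": [-6, 4, -4, 13],
    "TailNum": ["432G", "476N", "643G", "653A"],
    "Month": [1, 2, 3, 4],
    "Dest": ["ORX", "TOL", "JFK", "CVO"],
})

cols = ["ArrDelay", "DepDelay"]

# mask is applied to the numeric subset only, so string columns are not involved
masked = df[cols].mask(df[cols].lt(0), 0)

# clip(lower=0) computed on the same subset, for comparison
clipped = df[cols].clip(lower=0)

# Write the masked values back into the original frame
df[cols] = masked
print(df)
```

For this particular replacement (negative values to 0) mask and clip give the same values; mask is the more general tool when the condition is not a simple lower bound.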