For instance, I have 20 columns in my dataset, but negative values need to be replaced in only two of them, for example ArrDelay and DepDelay. How can I do this?
Dataset:
FlightNum ArrDelay DepDelay TailNum Month Dest
3232 3 -6 432G 1 ORX
4342 -2 4 476N 2 TOL
6344 -1 -4 643G 3 JFK
7564 5 13 653A 4 CVO
The result should be:
FlightNum ArrDelay DepDelay TailNum Month Dest
3232 3 0 432G 1 ORX
4342 0 4 476N 2 TOL
6344 0 0 643G 3 JFK
7564 5 13 653A 4 CVO
I can't apply a general 'mask' to all values because some columns contain string values, which raises errors. If I use 'mask' for only ArrDelay or only DepDelay, everything works. But I need to replace the negative values in both columns at the same time.
My code:
data = data.mask(data.ArrDelay.lt(0),0) #how to add 'DepDelay'?
data
CodePudding user response:
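import numpy as np

# Replace negative values with 0 in each column; the other columns are left untouched
data = data.assign(
    ArrDelay=np.where(data["ArrDelay"].lt(0), 0, data["ArrDelay"]),
    DepDelay=np.where(data["DepDelay"].lt(0), 0, data["DepDelay"]),
)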
print(data)
FlightNum ArrDelay DepDelay TailNum Month Dest
0 3232 3 0 432G 1 ORX
1 4342 0 4 476N 2 TOL
2 6344 0 0 643G 3 JFK
3 7564 5 13 653A 4 CVO
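If more than two columns need the same treatment, the np.where call above could be generated per column instead of written out by hand. The following is only a sketch: the helper name zero_negatives is hypothetical, and the DataFrame is rebuilt from the sample rows in the question.

```python
import numpy as np
import pandas as pd


def zero_negatives(frame, cols):
    """Return a copy of `frame` with negative values in `cols` replaced by 0.

    Hypothetical helper, not part of pandas.
    """
    # Build one np.where replacement per column and pass them all to assign()
    return frame.assign(**{c: np.where(frame[c].lt(0), 0, frame[c]) for c in cols})


# Sample rows copied from the question
data = pd.DataFrame({
    "FlightNum": [3232, 4342, 6344, 7564],
    "ArrDelay": [3, -2, -1, 5],
    "DepDelay": [-6, 4, -4, 13],
    "TailNum": ["432G", "476N", "643G", "653A"],
    "Month": [1, 2, 3, 4],
    "Dest": ["ORX", "TOL", "JFK", "CVO"],
})

result = zero_negatives(data, ["ArrDelay", "DepDelay"])
print(result)
```

Because assign returns a new DataFrame, data itself is not modified here; reassign the result to data if that is what you want.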
CodePudding user response:
You can select the columns to update, clip values below 0 up to 0, and update the DataFrame in place (df here is the DataFrame called data in the question).
cols = ['ArrDelay', 'DepDelay']
df.update(df[cols].clip(lower=0))
Alternative if you prefer a mask:
df.update(df[cols].mask(df[cols].lt(0), 0))
Output:
FlightNum ArrDelay DepDelay TailNum Month Dest
0 3232 3 0 432G 1 ORX
1 4342 0 4 476N 2 TOL
2 6344 0 0 643G 3 JFK
3 7564 5 13 653A 4 CVO
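For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the DataFrame is named data as in the question, rebuilds it from the sample rows, and assigns the clipped columns back directly rather than calling update; that swap is my own choice and should give the same result for this data.

```python
import pandas as pd

# Sample rows copied from the question
data = pd.DataFrame({
    "FlightNum": [3232, 4342, 6344, 7564],
    "ArrDelay": [3, -2, -1, 5],
    "DepDelay": [-6, 4, -4, 13],
    "TailNum": ["432G", "476N", "643G", "653A"],
    "Month": [1, 2, 3, 4],
    "Dest": ["ORX", "TOL", "JFK", "CVO"],
})

# Only the numeric delay columns are selected, so string columns never reach clip()
cols = ["ArrDelay", "DepDelay"]
data[cols] = data[cols].clip(lower=0)

print(data)
```

Selecting the columns first is what avoids the errors from the string columns: clip only ever sees ArrDelay and DepDelay.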