Change Value of a Dataframe Column Based on a Filter with specific parameters

Time:11-23

I’m looking at this but I have no idea how to formulate it: Change Value of a Dataframe Column Based on a Filter

I need to cap the values in medianIncome: any value of 0.4999 or lower should become 0.4999, and any value of 15.0001 or higher should become 15.0001.

Here's sample data:

           id  longitude_x  latitude  ocean_proximity  longitude_y  state  medianHouseValue  housingMedianAge  totalBedrooms  totalRooms  households  population  medianIncome
     0      1      -122.23     37.88         NEAR BAY      -122.23     CA           452.603              45.0          131.0       884.0       130.0       323.0       83252.0
     1    396      -122.34     37.88         NEAR BAY      -122.23     CA           350.004              41.0          930.0      3063.0       926.0      2560.0       17375.0
     2    398      -122.29     37.88         NEAR BAY      -122.23     CA           216.703              54.0          263.0      1211.0       230.0       525.0       38672.0
     3    401      -122.28     37.88         NEAR BAY      -122.23     CA           261.303              55.0          333.0      1845.0       335.0       772.0       42614.0
     4    424      -122.26     37.88         NEAR BAY      -122.23     CA           391.803              53.0          418.0      2553.0       404.0       898.0       62425.0
   ...    ...          ...       ...              ...          ...    ...               ...               ...            ...         ...         ...         ...           ...
929044   9476      -123.38     39.37           INLAND      -121.24     CA           124.601              20.0          813.0      3947.0       732.0      1902.0       26424.0
929045   9494      -123.75     39.37           INLAND      -121.24     CA           151.403              20.0          299.0      1377.0       282.0       830.0       32500.0
929046  10065      -121.03     39.37           INLAND      -121.24     CA            85.000              15.0          327.0      1338.0       310.0      1174.0       26341.0
929047  10074      -120.10     39.37           INLAND      -121.24     CA           117.301              34.0          411.0      2328.0       373.0      1016.0       45208.0
929048  21558      -121.24     39.37           INLAND      -121.24     CA            89.401              18.0          616.0      2787.0       532.0      1387.0       23886.0

It shows:

np.where((df['x'] > 0) & (df['y'] < 10), 1, 0)

So I'm at:

np.where(housing['medianIncome'] > 15.0001

And I'm stuck on the rest. I can only use pandas and numpy, and I can't use lambda.

I'm expecting an outcome that won't give an error. As of yet, I don't have an outcome.

Answer:

Use Series.clip:

import numpy as np
import pandas as pd

housing = pd.DataFrame({'medianIncome': [20, 5, 0.07]})

housing['medianIncome'] = housing['medianIncome'].clip(upper=15.0001, lower=0.4999)

print(housing)
   medianIncome
0       15.0001
1        5.0000
2        0.4999
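
Since the question started from np.where, the same capping can also be written as a nested np.where call. This is only a sketch that reuses the three sample values from the clip example; for plain capping, clip is the simpler option.

```python
import numpy as np
import pandas as pd

# Same three sample values as in the clip example
housing = pd.DataFrame({'medianIncome': [20, 5, 0.07]})

income = housing['medianIncome']

# The outer where caps the high values, the inner where raises the low
# values, and everything in between is kept unchanged
housing['medianIncome'] = np.where(income >= 15.0001, 15.0001,
                                   np.where(income <= 0.4999, 0.4999, income))

print(housing)
```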

Alternatively, use numpy.select if you need to set different values depending on the conditions (the output below is for the original, unclipped sample values):

housing['medianIncome'] = np.select([housing['medianIncome'].lt(0.4999),
                                     housing['medianIncome'].gt(15.0001)],
                                    [0, 1],
                                    default=housing['medianIncome'])

print(housing)
   medianIncome
0           1.0
1           5.0
2           0.0
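
If the values should be overwritten in place rather than rebuilding the whole column, a boolean mask with .loc is another option that needs only pandas. This is a minimal sketch on the same three sample values and is not part of the original answer:

```python
import pandas as pd

# Same three sample values as in the examples above
housing = pd.DataFrame({'medianIncome': [20, 5, 0.07]})

# Overwrite only the rows that match each filter
housing.loc[housing['medianIncome'] <= 0.4999, 'medianIncome'] = 0.4999
housing.loc[housing['medianIncome'] >= 15.0001, 'medianIncome'] = 15.0001

print(housing)
```

Each line reads as "filter the rows, then assign to the column", which mirrors the wording of the question title.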