Home > Software design >  How to drop part of the values from one column by condition from another column in Python, Pandas?
How to drop part of the values from one column by condition from another column in Python, Pandas?

Time:04-01

I have real estate dataframe with many outliers and many observations. I have variables: total area, number of rooms (if rooms = 0, then it's studio appartment) and kitchen_area.

"Minimalized" extraction from my dataframe:

dic = [{'area': 40, 'kitchen_area': 10, 'rooms': 1, 'price': 50000 },
 {'area': 20, 'kitchen_area': 0, 'rooms': 0, 'price': 50000},
 {'area': 60, 'kitchen_area': 0, 'rooms': 2, 'price': 70000},
 {'area': 29, 'kitchen_area': 9, 'rooms': 1, 'price': 30000},
 {'area': 15, 'kitchen_area': 0, 'rooms': 0, 'price': 25000}]
df = pd.DataFrame(dic, index=['apt1', 'apt2','apt3','apt4', 'apt5'])

My target would be to eliminate apt3, because by law, kitchen area cannot be smaller than 5 squared meters in non-studio apartments. In other words, I would like to eliminate all rows from my dataframe containing the data about apartments which are non-studio (rooms>0), but have kitchen_area <5

I have tried code like this:

df1 = df.drop(df[(df.rooms > 0) & (df.kitchen_area < 5)].index)

But it just eliminated all data from both columns kitchen_area and rooms according to the multiple conditions I put.

CodePudding user response:

Clean

mask1 = df.rooms > 0
mask2 = df.kitchen_area < 5

df1 = df[~(mask1 & mask2)]
df1

      area  kitchen_area  rooms  price
apt1    40            10      1  50000
apt2    20             0      0  50000
apt4    29             9      1  30000
apt5    15             0      0  25000

pd.DataFRame.query

df1 = df.query('rooms == 0 | kitchen_area >= 5')
df1

      area  kitchen_area  rooms  price
apt1    40            10      1  50000
apt2    20             0      0  50000
apt4    29             9      1  30000
apt5    15             0      0  25000
  • Related