I am trying to change a value in my pandas DataFrame, but the value stubbornly refuses to change. I have used df.at, as suggested in another post, but it does not seem to modify the data in the DataFrame.
import os
import pandas as pd

HOUSING_PATH = "datasets/housing"
csv_path = os.path.join(HOUSING_PATH, "property6.csv")
housing = pd.read_csv(csv_path)
headers = ['Sold Price', 'Longitude', 'Latitude', 'Land Size', 'Total Bedrooms', 'Total Bathrooms', 'Parking Spaces']
# housing.at[114, headers[6]] == 405, and I want to change this to empty, 0, or None, since 405 parking spaces does not make sense.
for index in housing.index:
    # Total parking spaces in this cell
    row = housing.at[index, headers[6]]
    # if total parking spaces is greater than 20
    if row > 20:
        # change to nothing
        row = ''
print(housing.at[114, headers[6]])
# however, this is still 405
Why is this happening? Why can't I replace the value in the DataFrame? I have checked that the values are <class 'numpy.float64'>, so the if statement should work, and it does. Only the assignment has no effect.
CodePudding user response:
You cannot do it like this. When you read housing.at[index, headers[6]] into row, you create a new variable that holds a copy of that value. Assigning to row then changes the new variable, not the original data. Assign through housing.at instead:
for index in housing.index:
    # if total parking spaces is greater than 20
    if housing.at[index, headers[6]] > 20:
        # Set the value of original data to empty string
        housing.at[index, headers[6]] = ''
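To see the difference between rebinding a local name and writing into the DataFrame, here is a minimal sketch. The three-row frame is made-up toy data standing in for the CSV, and the names value, unchanged, and changed are only for illustration.

```python
import pandas as pd

# Toy data standing in for the housing CSV (values are made up).
housing = pd.DataFrame({"Parking Spaces": [2.0, 405.0, 1.0]})

# Reading a cell with .at returns a plain scalar, not a reference to the cell.
value = housing.at[1, "Parking Spaces"]
value = 0.0  # rebinds the local name only; the DataFrame is untouched
unchanged = housing.at[1, "Parking Spaces"]

# Assigning through .at writes into the DataFrame itself.
housing.at[1, "Parking Spaces"] = 0.0
changed = housing.at[1, "Parking Spaces"]

print(unchanged, changed)
```

The same rule applies to any Python variable: assigning to a name never modifies the object it was read from.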
CodePudding user response:
This can be done easily without a for loop. Use DataFrame.loc with a boolean condition to select the matching rows and assign the new value:
import pandas as pd
import os
HOUSING_PATH = "datasets/housing"
csv_path = os.path.join(HOUSING_PATH, "property6.csv")
housing = pd.read_csv(csv_path)
housing.loc[housing["Parking Spaces"] > 20, "Parking Spaces"] = ""
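One caveat: writing an empty string into a float column forces the column to object dtype, and recent pandas versions may warn about the implicit upcast. If a missing value is acceptable (the question allows "empty or 0 or None"), a sketch using NaN keeps the column numeric. The toy frame below is made-up data, not the real CSV.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the housing CSV (values are made up).
housing = pd.DataFrame({"Parking Spaces": [2.0, 405.0, 1.0, 30.0]})

# Select rows where the count is implausible and mark them as missing.
too_many = housing["Parking Spaces"] > 20
housing.loc[too_many, "Parking Spaces"] = np.nan

print(housing)
print(housing["Parking Spaces"].dtype)
```

With NaN in place, methods such as mean() skip the bad rows by default, and dropna() or fillna(0) can be applied later if needed.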
CodePudding user response:
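There are several built-in methods for this kind of task (where, mask, replace, etc.).
# Series.where(cond, other=nan, inplace=False, axis=None, level=None, errors='raise', ...)
# Column 6 is assumed to be 'Parking Spaces', as in the question's headers list.
data2 = housing.iloc[:, 6]
# where() keeps values that satisfy the condition and replaces the rest;
# assign the result back, since calling it in place on a selection may not update housing.
housing[housing.columns[6]] = data2.where(data2 <= 20, '')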
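For comparison, Series.mask is the inverse of where: it replaces values where the condition is True. A minimal sketch on made-up toy data, using the default replacement (NaN) rather than an empty string so the column stays numeric:

```python
import pandas as pd

# Toy data standing in for the housing CSV (values are made up).
housing = pd.DataFrame({"Parking Spaces": [2.0, 405.0, 1.0]})

parking = housing["Parking Spaces"]
# mask() replaces entries where the condition holds; other defaults to NaN.
housing["Parking Spaces"] = parking.mask(parking > 20)

print(housing)
```

Here parking.mask(parking > 20) gives the same result as parking.where(parking <= 20) as long as the column has no missing values to begin with.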