Python dataframe loop row by row would not change value no matter what

Time:07-18

I am trying to change a value in my pandas DataFrame, but it stubbornly refuses to change. I used df.at, as suggested in another post, but it does not seem to modify the data in the DataFrame.

import os
import pandas as pd

HOUSING_PATH = "datasets/housing"
csv_path = os.path.join(HOUSING_PATH, "property6.csv")
housing = pd.read_csv(csv_path)

headers = ['Sold Price', 'Longitude', 'Latitude', 'Land Size', 'Total Bedrooms', 'Total Bathrooms', 'Parking Spaces']
# housing.at[114, headers[6]] is 405, and I want to change it to empty, 0, or None, because 405 parking spaces does not make sense.

for index in housing.index:
    # Total parking spaces in this cell
    row = housing.at[index, headers[6]]
    # If total parking spaces is greater than 20
    if row > 20:
        # Change to nothing
        row = ''

print(housing.at[114, headers[6]])
# However, this is still 405

Why is this happening? Why can't I replace the value in the DataFrame? I have checked that the values are <class 'numpy.float64'>, so the if statement should work, and it does. Only the assignment has no effect.

CodePudding user response:

You cannot do it like this. When you read housing.at[index, headers[6]] into row, you create a new variable that holds a copy of that value. Assigning to row afterwards only rebinds that variable; it does not touch the original data. Assign through housing.at instead:

for index in housing.index:
    # If total parking spaces is greater than 20
    if housing.at[index, headers[6]] > 20:
        # Set the value in the original data to an empty string
        housing.at[index, headers[6]] = ''
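
To see the difference on something small, here is a minimal sketch. The three-row frame is a made-up stand-in for property6.csv, and it writes np.nan rather than '' (my choice, not part of the answer above), because an empty string would turn a numeric column into object dtype and newer pandas versions may complain about writing a string into a float column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for property6.csv; the values are made up.
housing = pd.DataFrame({"Parking Spaces": [2.0, 405.0, 1.0]})

# Reading the cell gives a plain scalar. Rebinding the name `row`
# afterwards has no effect on the frame.
row = housing.at[1, "Parking Spaces"]
row = np.nan
unchanged = housing.at[1, "Parking Spaces"]  # the frame still holds the old value

# Assigning through .at writes into the frame itself.
for index in housing.index:
    if housing.at[index, "Parking Spaces"] > 20:
        housing.at[index, "Parking Spaces"] = np.nan
```

After the loop, the outlier cell is NaN and the column is still float, so later numeric operations on it keep working.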

CodePudding user response:

This can be done easily without a for loop. Use DataFrame.loc with a boolean condition to select the matching rows, and assign the new value to them in one step.

CODE

import pandas as pd
import os

HOUSING_PATH = "datasets/housing"
csv_path = os.path.join(HOUSING_PATH, "property6.csv")
housing = pd.read_csv(csv_path)

housing.loc[housing["Parking Spaces"] > 20, "Parking Spaces"] = ""
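
As a rough illustration of what the .loc line does, the sketch below splits it into two steps on a made-up frame (the column values are invented, and np.nan is used as the replacement instead of "" as an assumption on my part, so the column stays numeric).

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for property6.csv.
housing = pd.DataFrame({
    "Sold Price": [500000.0, 650000.0, 720000.0],
    "Parking Spaces": [2.0, 405.0, 1.0],
})

# Step 1: the comparison yields a boolean Series, one entry per row.
too_many = housing["Parking Spaces"] > 20

# Step 2: .loc[rows, column] assigns only where the mask is True.
housing.loc[too_many, "Parking Spaces"] = np.nan
```

Only the flagged cell in "Parking Spaces" changes; the other rows and the other columns are left alone.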

CodePudding user response:

There are several built-in methods for tasks like this (where, mask, replace, etc.).

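# Series.where(cond, other=nan, inplace=False, axis=None, level=None, ...)
# where() keeps the values where cond is True and replaces the rest with `other`.
data2 = housing.iloc[:, 6]  # assumes column 6 of the CSV is 'Parking Spaces'
# Assign the result back; an in-place change on a slice may not reach the original frame.
housing[data2.name] = data2.where(data2 <= 20, '')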
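
where and mask are mirror images of each other, which the following sketch tries to show on invented data. It relies on the default replacement value (NaN) rather than '', which is an assumption made here to keep the column numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the real data lives in property6.csv.
housing = pd.DataFrame({"Parking Spaces": [2.0, 405.0, 1.0]})
col = housing["Parking Spaces"]

# where(): keep entries where the condition is True, NaN elsewhere.
kept = col.where(col <= 20)

# mask(): replace entries where the condition is True with NaN.
masked = col.mask(col > 20)

# Both return new Series, so the result has to be assigned back.
housing["Parking Spaces"] = kept
```

Which one to use is mostly a matter of whether the condition is easier to state as "keep these" or "drop these".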