Updates to Python pandas dataframe rows do not update the dataframe?

I just discovered that iterating the rows of a pandas dataframe, and making updates to each row, does not update the dataframe! Is this expected behaviour, or does one need to do something to the row first so the update reflects in the parent dataframe?

(I know one could update the dataframe directly in the loop. My question is about the fact that iterrows() seems to provide copies of the rows rather than references to the actual rows in the dataframe, which seems an odd way to do this.)

import pandas as pd

fruit = {
    "Fruit": ['Apple', 'Avacado', 'Banana', 'Strawberry', 'Grape'],
    "Color": ['Red', 'Green', 'Yellow', 'Pink', 'Green'],
    "Price": [45, 90, 60, 37, 49],
}

df = pd.DataFrame(fruit)

for index, row in df.iterrows():
  row['Price'] = row['Price'] * 2
  print(row['Price']) # the price is doubled here as expected

print(df['Price']) # the original values of price in the dataframe are unchanged

CodePudding user response:

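You are assigning to row['Price'], but never writing the result back to the dataframe df. The row that iterrows() yields is a separate Series object, not the dataframe itself. You can confirm that they are distinct objects with:

id(row) == id(df)

This returns False. Note that a Series and a DataFrame are always different objects, so treat this as a sanity check rather than proof that the row is a copy. Also, for better efficiency you shouldn't loop at all. Replace the for loop with a single column assignment: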

df['New Price'] = df['Price'] * 2

CodePudding user response:

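You are running into the subtleties of copies versus original objects. Depending on the column dtypes, iterrows() yields a copy of each row rather than a view, so what you update in the loop is that copy, not the data stored in the DataFrame.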
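As a rough illustration of that copy behaviour, the sketch below pulls a single row out of iterrows() and modifies it. It assumes a small two-column frame with mixed string and integer columns, like the one in the question. The names row_price and df_price are only illustrative.

```python
import pandas as pd

# Mixed string and integer columns, as in the question's data.
df = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Price": [45, 60],
})

# Take just the first (index, row) pair that iterrows() yields.
index, row = next(df.iterrows())

# The row is a freshly built Series, separate from the DataFrame's storage.
print(type(row).__name__, row.dtype)

row["Price"] = row["Price"] * 2      # changes only the temporary Series

row_price = row["Price"]             # doubled value held by the copy
df_price = df.loc[index, "Price"]    # value still stored in the DataFrame
print(row_price, df_price)
```

With mixed column types the row has to be assembled into a new Series, so the write goes to the copy and the DataFrame keeps its original value.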

To change the DataFrame, assign through it directly:

for index, row in df.iterrows():
  df.loc[index, 'Price'] = row['Price'] * 2

But the idiomatic way to perform such operations is a vectorized one:

df['Price'] = df['Price'].mul(2)

Or:

df['Price'] *= 2

Output:

        Fruit   Color  Price
0       Apple     Red     90
1     Avacado   Green    180
2      Banana  Yellow    120
3  Strawberry    Pink     74
4       Grape   Green     98
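To check that the df.loc loop and the vectorized form give the same result, here is a minimal sketch using the question's data. The names looped, vectorized, and same_result are illustrative only.

```python
import pandas as pd

fruit = {
    "Fruit": ["Apple", "Avacado", "Banana", "Strawberry", "Grape"],
    "Color": ["Red", "Green", "Yellow", "Pink", "Green"],
    "Price": [45, 90, 60, 37, 49],
}

# Approach 1: loop over the rows, writing each value back through df.loc.
looped = pd.DataFrame(fruit)
for index, row in looped.iterrows():
    looped.loc[index, "Price"] = row["Price"] * 2

# Approach 2: vectorized, one operation on the whole column.
vectorized = pd.DataFrame(fruit)
vectorized["Price"] *= 2

# Both approaches should leave the same doubled prices in the column.
same_result = looped["Price"].tolist() == vectorized["Price"].tolist()
print(vectorized)
print(same_result)
```

Both versions double the prices, but the vectorized one does it in a single column operation without a Python-level loop, which is why it is usually preferred.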