I'm new to working with Pandas and I'm trying to do a very simple thing with it. Using the flights.csv
file, I'm defining a new column called underperforming:
if the number of passengers is below average, the value is 1; otherwise it is 0.
My problem is that there might be something wrong with the logic, since it's not updating the values. Here is an example:
import pandas as pd

df = pd.read_csv('flights.csv')
passengers_mean = df['passengers'].mean()
df['underperforming'] = 0
for idx, row in df.iterrows():
    if (row['passengers'] < passengers_mean):
        row['underperforming'] = 1
print(df)
print(passengers_mean)
Any clue?
CodePudding user response:
According to the docs:
You should never modify something you are iterating over. This is not guaranteed to work in all cases.
What you can do instead is:
df["underperforming"] = (df["passengers"] < df["passengers"].mean()).astype("int")
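To see what the column comparison does, here is a minimal sketch you can run without the CSV. The small DataFrame is a made-up stand-in for flights.csv (the passenger counts are invented for illustration), so treat it as a demonstration of the idea rather than output from your real data:

```python
import pandas as pd

# Hypothetical stand-in for flights.csv; these values are invented.
df = pd.DataFrame({"passengers": [100, 200, 300, 400]})
passengers_mean = df["passengers"].mean()

# Compare the whole column to the mean at once. The result is a boolean
# Series, and astype("int") turns True/False into 1/0.
df["underperforming"] = (df["passengers"] < passengers_mean).astype("int")
print(df)
```

Because the assignment targets the DataFrame itself and not a row produced by iterrows(), the new values actually end up in df.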
CodePudding user response:
Quoting the documentation:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
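Instead of writing to the row inside the loop, assign to the DataFrame itself: a vectorized comparison on the whole column is the simplest option, and apply() also works, although it calls a Python function per element and is not truly vectorized.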
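As a rough sketch of those alternatives, the snippet below fills the flag three ways: a boolean mask with .loc, apply() on the column, and an explicit loop that writes through df.at. The DataFrame and the extra column names via_apply and via_loop are hypothetical, added only so the three results can be compared side by side:

```python
import pandas as pd

# Hypothetical stand-in for flights.csv; these values are invented.
df = pd.DataFrame({"passengers": [100, 200, 300, 400]})
passengers_mean = df["passengers"].mean()

# Option 1: start from 0 and set 1 where the boolean mask is True.
df["underperforming"] = 0
df.loc[df["passengers"] < passengers_mean, "underperforming"] = 1

# Option 2: apply() calls the function once per value (slower on large data).
df["via_apply"] = df["passengers"].apply(lambda p: 1 if p < passengers_mean else 0)

# Option 3: keep a loop, but write to the DataFrame with df.at,
# not to a row object handed out by the iterator.
df["via_loop"] = 0
for idx, p in df["passengers"].items():
    if p < passengers_mean:
        df.at[idx, "via_loop"] = 1

print(df)
```

All three columns should come out identical. The mask version is usually the one to prefer; the loop is shown only because it is closest to the code in the question.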