Pandas DataFrame filling in a column incorrectly -- Strange behavior

Time:05-24

I get different results from print and from df.at inside the same for loop. Can this be explained?

import pandas as pd

data = [['A', []], ['B', []], ['C', []], ['D', []]]
df = pd.DataFrame(data, columns=['Act', 'PreviousActs'])

actssofar = []

for i, row in df.iterrows():
    actssofar.append(row['Act'])
    print(i, actssofar)
    df.at[i, 'PreviousActs'] = actssofar

Now, the output of the print function in the for loop is this:

0 ['A']
1 ['A', 'B']
2 ['A', 'B', 'C']
3 ['A', 'B', 'C', 'D']

But the output of the dataframe is this:

Act  PreviousActs
A    A, B, C, D
B    A, B, C, D
C    A, B, C, D
D    A, B, C, D

Logically, shouldn't it show the same step-by-step appending behavior as the print function, since we are filling the dataframe with the same value?

CodePudding user response:

If I understand correctly, the problem is that when the loop finishes, your dataframe contains ['A', 'B', 'C', 'D'] in every row. This happens because you are assigning the list by reference, so all rows store the same list object. Add a list() call to create a new list every time you assign it to the dataframe.

import pandas as pd

data = [['A', []], ['B', []], ['C', []], ['D', []]]
df = pd.DataFrame(data, columns=['Act', 'PreviousActs'])

actssofar = []

for i, row in df.iterrows():
    actssofar.append(row['Act'])
    print(i, actssofar)
    # list() makes a new list, so each row keeps its own snapshot
    df.at[i, 'PreviousActs'] = list(actssofar)

CodePudding user response:

You need to copy the list before putting it in the DataFrame. It's a mutable object, and what you are currently storing in the DataFrame is a reference to the original list, not a copy of it. Every element in the PreviousActs column is the same list.
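The aliasing itself comes from Python rather than from pandas, so it can be reproduced without a DataFrame. The sketch below uses plain lists as a stand-in for the PreviousActs column. The names shared_column, copied_column and all_same_object are made up for illustration and do not appear in the question.

```python
acts = ["A", "B", "C", "D"]

# Buggy pattern: every slot receives a reference to the same list object.
acts_so_far = []
shared_column = []
for act in acts:
    acts_so_far.append(act)
    shared_column.append(acts_so_far)

# Fixed pattern: every slot receives its own snapshot of the list.
acts_so_far = []
copied_column = []
for act in acts:
    acts_so_far.append(act)
    copied_column.append(acts_so_far.copy())  # list(acts_so_far) works too

# In the buggy column, all slots are one and the same object.
all_same_object = all(cell is shared_column[0] for cell in shared_column)

print(shared_column)    # four references to the final, fully grown list
print(copied_column)    # one growing snapshot per step
print(all_same_object)  # True
```

This also explains why print looked correct inside the loop: it showed the list as it was at that moment, while the stored references kept pointing at the list as it continued to grow.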
