I get different results from print and df.at inside a for loop. Can this be explained?
import pandas as pd
data = [['A', []], ['B', []], ['C', []], ['D', []]]
df = pd.DataFrame(data, columns=['Act', 'PreviousActs'])
actssofar = []
for i, row in df.iterrows():
    actssofar.append(row['Act'])
    print(i, actssofar)
    df.at[i, 'PreviousActs'] = actssofar
Now, the output of the print function in the for loop is this:
0 ['A']
1 ['A', 'B']
2 ['A', 'B', 'C']
3 ['A', 'B', 'C', 'D']
But the output of the dataframe is this:
Act | PreviousActs |
---|---|
A | A, B, C, D |
B | A, B, C, D |
C | A, B, C, D |
D | A, B, C, D |
Logically, shouldn't it show the same step-by-step appending behavior as the print function, since we are filling the dataframe with the same value?
CodePudding user response:
If I understand correctly, the problem is that, when the loop finishes, your dataframe contains ['A', 'B', 'C', 'D'] for all rows. This happens because you are assigning the list by reference, which means all rows store the same list object, and every later append shows up in each of them. You should add a list() call to create a new list every time you assign it to the dataframe.
import pandas as pd
data = [['A', []], ['B', []], ['C', []], ['D', []]]
df = pd.DataFrame(data, columns=['Act', 'PreviousActs'])
actssofar = []
for i, row in df.iterrows():
    actssofar.append(row['Act'])
    print(i, actssofar)
    # list() builds a new list, so each row keeps its own snapshot
    df.at[i, 'PreviousActs'] = list(actssofar)
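As a side note, list(actssofar) is only one way to take a shallow copy; actssofar.copy() and actssofar[:] should behave the same here. A minimal sketch in plain Python, without pandas, where the snapshots list is a made-up stand-in for the PreviousActs column:

```python
actssofar = []
snapshots = []  # hypothetical stand-in for the PreviousActs column

for act in ['A', 'B', 'C', 'D']:
    actssofar.append(act)
    # list(actssofar), actssofar.copy() and actssofar[:] each create
    # a new list object holding the same items at this point in time
    snapshots.append(actssofar.copy())

print(snapshots)
# [['A'], ['A', 'B'], ['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
```

A shallow copy is enough in this case because the items are strings, which cannot be changed in place.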
CodePudding user response:
You need to copy the list before putting it in the DataFrame. A list is a mutable object, and what you are currently storing in the DataFrame is a reference to the original list, not a copy of it. Every element in the PreviousActs column is the same list.
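The aliasing described here can be reproduced without pandas. This is only a sketch: the cells list is a hypothetical stand-in for the PreviousActs column, and the identity check with `is` is one way to confirm that every entry is the same object:

```python
acts = ['A', 'B', 'C', 'D']
actssofar = []
cells = []  # hypothetical stand-in for the PreviousActs column

for act in acts:
    actssofar.append(act)
    cells.append(actssofar)  # stores a reference to the list, not a copy

# Every entry is the very same list object as actssofar
print(all(cell is actssofar for cell in cells))  # True
# So the first entry also reflects all later appends
print(cells[0])  # ['A', 'B', 'C', 'D']
```

The print inside the original loop looked correct because it showed the list's contents at that moment, while the stored references kept pointing at the list as it continued to grow.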