I have the following dataframe:
import pandas as pd
d = {'col1': [1, 2], 'col2': [{"a": 5}, {"a": 8}]}
df = pd.DataFrame(data=d)
df
Output:
   col1      col2
0     1  {'a': 5}
1     2  {'a': 8}
And I have the following piece of code (from a larger program):
for i, r in df.iterrows():
    n_df = (r.to_frame().T).copy()
    n_df['col2'][i]["a"] = n_df['col2'][i]["a"] - 1
    print(n_df)
df
Output:
   col1      col2
0     1  {'a': 4}
   col1      col2
1     2  {'a': 7}
   col1      col2
0     1  {'a': 4}
1     2  {'a': 7}
I expected df to remain unchanged at the end, because I call .copy() when creating n_df.
Where am I making a mistake here? What should the code above look like so that manipulating n_df does not change df?
CodePudding user response:
According to the documentation for DataFrame.copy:
When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.
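To make that note concrete, here is a minimal sketch (the names copied and same_object are mine, not from the question) that checks whether the dict inside a cell of the copy is the very same Python object as the one in the original. It assumes the same two-row frame as in the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [{"a": 5}, {"a": 8}]})
copied = df.copy()  # deep=True is the default

# The DataFrame is a new object, but each cell in col2 still
# refers to the same dict as the corresponding cell in df.
same_object = copied.at[0, 'col2'] is df.at[0, 'col2']
print(same_object)  # True

# Mutating the dict through the copy is therefore visible in df.
copied.at[0, 'col2']["a"] = 99
print(df.at[0, 'col2'])  # {'a': 99}
```

So .copy() duplicates the array of references, not the dicts those references point to, which is why the loop in the question ends up changing df.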
CodePudding user response:
You can't simply use DataFrame.copy here. See the note at the bottom of its documentation page:
Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.
A possible (ugly?) solution:
import pandas as pd
import copy
d = {'col1': [1, 2], 'col2': [{"a": 5}, {"a": 8}]}
df = pd.DataFrame(data=d)
for i, r in df.iterrows():
    n_df = pd.DataFrame([copy.deepcopy(r.to_dict())], index=[i])
    n_df['col2'][i]["a"] = n_df['col2'][i]["a"] - 1
Output:
>>> df
   col1      col2
0     1  {'a': 5}
1     2  {'a': 8}
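If you would rather copy the whole frame once instead of rebuilding it row by row, another option is to copy the frame and then deep-copy only the cells that hold Python objects. This is just a sketch: deep_copy_frame is a hypothetical helper name, and it assumes you know which columns contain nested objects (here only col2).

```python
import copy
import pandas as pd

def deep_copy_frame(frame, object_cols):
    """Hypothetical helper: copy the frame, then deep-copy every cell
    in the listed columns so nested objects are no longer shared."""
    out = frame.copy()
    for col in object_cols:
        out[col] = out[col].map(copy.deepcopy)
    return out

df = pd.DataFrame({'col1': [1, 2], 'col2': [{"a": 5}, {"a": 8}]})
n_df = deep_copy_frame(df, ['col2'])

# Modify the nested dicts in the copy only.
for i in n_df.index:
    n_df.at[i, 'col2']["a"] -= 1

print(n_df['col2'].tolist())  # [{'a': 4}, {'a': 7}]
print(df['col2'].tolist())    # [{'a': 5}, {'a': 8}]
```

The trade-off is that copy.deepcopy runs once per cell, so this can be slow on large frames; for a handful of rows it should not matter.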