Column name change of a deep copy of a pandas dataframe changes the column name of the original data

Time:08-31

My understanding is that if something is changed in a deep copy of a pandas dataframe, that shouldn't affect the original dataframe from which the copy was made.

When I run the code below, however, the column name in df1 changes as well. I want only the column name in df2 to change.

import pandas as pd

df1 = pd.DataFrame({'col1':[1,2,3,4],'col2':[2,3,4,3],'col3':[4,3,1,5]})
df2 = df1.copy()
# Change the column name using column index rather than the rename function
df2.columns.values[1] = 'new_col_name'

I know we can use the rename function, but I have a requirement to rename using the column index. Can you explain what is happening above and how I can avoid this issue?

CodePudding user response:

See the final section of the DataFrame.copy() docs:

Note that when copying an object containing Python objects, a deep copy will copy the data, but will not do so recursively. Updating a nested data object will be reflected in the deep copy.

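The array underlying the columns Index of the DataFrame is not duplicated by a deep copy. An Index is meant to be immutable, so df1.columns and df2.columns can share the same values array. Writing into df2.columns.values in place bypasses that immutability, which is why the label seen through df1 changes too.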

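One workaround is to use copy.deepcopy to copy the columns and the data separately and build a new DataFrame from them (this rebuilds the frame from its raw values, so a non-default row index would not be carried over):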

import copy
df2 = pd.DataFrame(columns=copy.deepcopy(df1.columns), data=copy.deepcopy(df1.values))
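
Another option, offered only as a minimal sketch, is to avoid writing into columns.values at all: copy the labels into a plain Python list, change the entry at the wanted position, and assign the list back to df2.columns. This still renames by column index. The name new_cols is just an illustrative variable, and the data is the example from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [2, 3, 4, 3], 'col3': [4, 3, 1, 5]})
df2 = df1.copy()

# Copy the labels into a new list so nothing is shared with df1
new_cols = df2.columns.tolist()

# Rename by position (index 1 is the second column)
new_cols[1] = 'new_col_name'

# Assigning a list creates a fresh Index for df2 only
df2.columns = new_cols

print(list(df1.columns))  # ['col1', 'col2', 'col3']
print(list(df2.columns))  # ['col1', 'new_col_name', 'col3']
```

Because the assignment replaces the whole columns object instead of mutating a shared array, it should not depend on how copy() treats the Index internally.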