Does the copy dataframe equate with source dataframe?

Time:11-06

  1. create a dataframe
  2. duplicate the dataframe
  3. modify an element of the source dataframe
  4. modify an element of the copied dataframe
  5. check the two dataframes and their id() values

The two dataframes have independent id() values, so why were they both changed at the same time?

import pandas as pd  

gl_stock_001 = pd.DataFrame({'prc': [3], 'sum': [2], 'dif': [1]})
stock_copy = pd.DataFrame(gl_stock_001)
display(gl_stock_001)
display(stock_copy)
print(id(gl_stock_001), id(stock_copy))

gl_stock_001.loc[0,'sum'] = 777
stock_copy.loc[0,'dif'] = 333
display(gl_stock_001)
display(stock_copy)
print(id(gl_stock_001), id(stock_copy))


CodePudding user response:

This is a common source of confusion with pandas: sometimes it copies data, sometimes it doesn't, and which operations copy can change between releases. The documentation doesn't explain very well when a copy happens and when it doesn't. Note that id() only tells you the two DataFrame objects are distinct Python objects; it says nothing about whether they share the same underlying data. pandas wraps NumPy, so consider the case where you have an existing NumPy array and all you want is the extra functionality of pandas. You'd want the dataframe to reference the same array that you passed in rather than duplicate it.
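One way to look at the NumPy case directly is np.shares_memory, which reports whether two arrays overlap in memory. The following is only a sketch: the array values and variable names are made up for illustration, and the constructor's copy argument is set explicitly because the default behaviour has differed between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the question.
arr = np.array([[3.0, 2.0, 1.0]])
cols = ['prc', 'sum', 'dif']

# copy=False asks pandas to wrap the array without duplicating it if it can.
wrapped = pd.DataFrame(arr, columns=cols, copy=False)
# copy=True forces pandas to allocate its own data.
copied = pd.DataFrame(arr, columns=cols, copy=True)

# Compare the original array against the data each dataframe holds.
wrapped_shares = bool(np.shares_memory(arr, wrapped.to_numpy()))
copied_shares = bool(np.shares_memory(arr, copied.to_numpy()))
print('copy=False shares memory with arr:', wrapped_shares)
print('copy=True shares memory with arr:', copied_shares)
```

The copy=True dataframe should never report shared memory; what the copy=False one reports is worth checking on your own pandas version rather than assuming.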

pandas.DataFrame() includes a copy parameter, and you can use that to force a copy when the dataframe is created.

import pandas as pd
display = print  # stand-in for IPython's display() when running outside a notebook
gl_stock_001 = pd.DataFrame({'prc': [3], 'sum': [2], 'dif': [1]})
stock_copy = pd.DataFrame(gl_stock_001, copy=True)
display(gl_stock_001)
display(stock_copy)
print(id(gl_stock_001), id(stock_copy))

gl_stock_001.loc[0,'sum'] = 777
stock_copy.loc[0,'dif'] = 333
display(gl_stock_001)
display(stock_copy)
print(id(gl_stock_001), id(stock_copy))

Output showing that the two dataframes are now independent (each change appears only in the dataframe it was made on):

   prc  sum  dif
0    3    2    1
   prc  sum  dif
0    3    2    1
139627957952208 139627957947744
   prc  sum  dif
0    3  777    1
   prc  sum  dif
0    3    2  333
139627957952208 139627957947744
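Another option, not mentioned above, is DataFrame.copy(), which makes a deep copy of the data by default. A minimal sketch reusing the names from the question:

```python
import pandas as pd

gl_stock_001 = pd.DataFrame({'prc': [3], 'sum': [2], 'dif': [1]})
# copy() defaults to deep=True, so the new dataframe gets its own data.
stock_copy = gl_stock_001.copy()

# Each assignment touches only the dataframe it is applied to.
gl_stock_001.loc[0, 'sum'] = 777
stock_copy.loc[0, 'dif'] = 333

print(gl_stock_001)
print(stock_copy)
```

This reads a little more explicitly than passing a dataframe back through the constructor, since the intent to copy is in the method name.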