I have a DataFrame df:
s = pd.Series([3, 4, 5], ['earth', 'mars', 'jupiter'])
k = pd.Series([1, 2, 3], ['earth', 'mars', 'jupiter'])
df = pd.DataFrame({'mass': s, 'diameter': k})
         mass  diameter
earth       3         1
mars        4         2
jupiter     5         3
I copy df to a new variable df2 using:
df2 = df
The following change applies to both df and df2. This is understandable.
df["mass"]["earth"] = 100
But this next change applies to only one DataFrame:
df = df.iloc[:2,:]
Can someone explain this behavior? Thanks a lot.
CodePudding user response:
When you do:
df2 = df
both df2 and df point to the same object.
If you were to modify one in place (df.loc["earth", "mass"] = 100), the "other one" (actually the same object) would be modified too.
However, when you do:
df = df.iloc[:2,:]
df2 still points to the original DataFrame, while df now points to a new object (the df.iloc[:2,:] slice).
demo:
df2 = df
id(df) == id(df2)
# True # same object
df = df.iloc[:2,:]
id(df) == id(df2)
# False # no longer the same object
df.iloc[:2,:] is df.iloc[:2,:]
# False # each call creates a new object (comparing id() of two temporaries is unreliable, since a freed object's id can be reused)
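For a version that runs on its own, here is a minimal sketch that rebuilds the DataFrame from the question and checks identity with `is` rather than `id()`. The helper names `same_before`, `seen_via_df2` and `same_after` are only there for illustration.

```python
import pandas as pd

s = pd.Series([3, 4, 5], ['earth', 'mars', 'jupiter'])
k = pd.Series([1, 2, 3], ['earth', 'mars', 'jupiter'])
df = pd.DataFrame({'mass': s, 'diameter': k})

df2 = df                          # a second name for the same object, not a copy
same_before = df is df2

df.loc['earth', 'mass'] = 100     # in-place change, visible through both names
seen_via_df2 = df2.loc['earth', 'mass']

df = df.iloc[:2, :]               # rebinds the name df to a new object; df2 is untouched
same_after = df is df2

print(same_before, seen_via_df2, same_after)
print(df.shape, df2.shape)
```

The in-place assignment shows up through df2 because there is only one DataFrame at that point. The iloc line does not change that DataFrame at all; it only makes the name df refer to something else, so df2 keeps all three rows.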
CodePudding user response:
df2 = df.copy()
Use this to make a separate copy; otherwise both names point to the same object in memory.
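As a rough illustration, assuming the same data as in the question (built here directly from lists for brevity), a change made to df after copying should not show up in df2:

```python
import pandas as pd

df = pd.DataFrame(
    {'mass': [3, 4, 5], 'diameter': [1, 2, 3]},
    index=['earth', 'mars', 'jupiter'],
)

df2 = df.copy()                   # independent copy (deep=True is the default)
df.loc['earth', 'mass'] = 100     # changes df only

mass_in_df = df.loc['earth', 'mass']
mass_in_df2 = df2.loc['earth', 'mass']
print(df is df2, mass_in_df, mass_in_df2)
```

Here df2 keeps the original mass for earth, because copy() gave it its own data instead of a second name for the same DataFrame.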