Cutting a DataFrame with pandas.iloc doesn't change its copied variable? (Copied with "=")

I have a DataFrame df :

import pandas as pd

s = pd.Series([3, 4, 5], ['earth', 'mars', 'jupiter'])
k = pd.Series([1, 2, 3], ['earth', 'mars', 'jupiter'])
df = pd.DataFrame({'mass': s, 'diameter': k})

         mass  diameter
earth       3         1
mars        4         2
jupiter     5         3

I copy df to a new variable df2 using:

df2 = df

The following change will apply to both df and df2. This is understandable.

df["mass"]["earth"] = 100

But this next change will only apply to one DataFrame:

df = df.iloc[:2,:]

Can someone explain this behavior? Thanks a lot.

CodePudding user response：

When you do:

df2 = df

both df2 and df point to the same object.

If you were to modify one in place (df.loc["earth", "mass"] = 100), the "other one" (actually the same object) would be modified too.

However when you do:

df = df.iloc[:2,:]

df2 still points to the original dataframe, while df now points to a new object (the df.iloc[:2,:] slice).

demo:

df2 = df
id(df) == id(df2)
# True               # same object

df = df.iloc[:2,:]
id(df) == id(df2)
# False              # no longer the same object

df.iloc[:2,:] is df.iloc[:2,:]
# False              # each call returns a new object
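
To put the two cases side by side, here is a minimal self-contained sketch using the data from the question. It checks identity with `is`, which compares the objects directly. The helper names (same_object, mass_seen_via_df2, still_same, shapes) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"mass": [3, 4, 5], "diameter": [1, 2, 3]},
    index=["earth", "mars", "jupiter"],
)
df2 = df  # a second name for the same object, nothing is copied

# In-place modification: visible through both names.
df.loc["earth", "mass"] = 100
same_object = df2 is df
mass_seen_via_df2 = int(df2.loc["earth", "mass"])

# Rebinding: df now names a new object, df2 still names the original.
df = df.iloc[:2, :]
still_same = df2 is df
shapes = (df.shape, df2.shape)

print(same_object, mass_seen_via_df2)  # True 100
print(still_same, shapes)              # False ((2, 2), (3, 2))
```

The assignment `df = ...` never touches the object that df used to name; it only makes the name df refer to something else.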

CodePudding user response：

df2 = df.copy()

Use this to make a separate copy; otherwise both names point to the same data in memory.
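
As a rough illustration with the data from the question, the sketch below shows that a change made to df after calling copy() does not reach df2. It relies on the default deep=True behavior of DataFrame.copy().

```python
import pandas as pd

df = pd.DataFrame(
    {"mass": [3, 4, 5], "diameter": [1, 2, 3]},
    index=["earth", "mars", "jupiter"],
)
df2 = df.copy()  # independent copy of data and index (deep=True by default)

df.loc["earth", "mass"] = 100  # modify the original only

print(df2 is df)                 # False: two distinct objects
print(df.loc["earth", "mass"])   # 100
print(df2.loc["earth", "mass"])  # 3: the copy is unaffected
```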