python pandas function df pointer doesn't change values

I'm trying to pass a function a reference to an existing df and copy values from one df to another, but after the function finishes, the values are not assigned to the original object.

How to reproduce:

import pandas as pd


def copy(df, new_df):
    new_df = df.copy()
    df[0] = "test"


if __name__ == '__main__':

    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    df = pd.DataFrame(mat)
    new_df = pd.DataFrame()

    copy(df, new_df)

    print(new_df)

Notice that I am assigning "test" to the first column. That assignment does change the original object that was passed in, but new_df does not get the new values.

Is this a bug in pandas, or am I doing something wrong?

Edit:

Assigning values to df[0] is just an example of how the values do change on the original df. My question is: how do I assign the values from the original df to a new df (it could also be concat, not only copy) without having to return the df and create a new variable that receives the value returned from the function?

CodePudding user response：

Usually these questions are the other way around ("Why did X change if I modified Y?").

Inside your copy function, new_df = df.copy() rebinds the local name new_df to a new object. The name no longer refers to the DataFrame that was passed in, so the caller's new_df is left untouched.
df.copy() creates a deep copy by default (i.e. a totally new DataFrame object is created).
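
To make the difference between rebinding a name and mutating an object concrete, here is a minimal sketch. The function names rebind and mutate and the column "a" are made up for illustration and are not part of the question's code.

```python
import pandas as pd


def rebind(target):
    # Rebinding: the local name `target` now points at a brand-new object.
    # The caller's variable still refers to the original, untouched object.
    target = pd.DataFrame({"a": [1, 2]})


def mutate(target):
    # Mutation: this changes the very object the caller passed in.
    target["a"] = [1, 2]


rebound = pd.DataFrame()
rebind(rebound)

mutated = pd.DataFrame()
mutate(mutated)

print(rebound.empty)  # True: the caller's object was never touched
print(mutated.empty)  # False: the caller's object gained a column
```

The first function behaves like the copy function in the question: the assignment only changes what the local name refers to.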

If you want changes to df to be reflected in new_df as well, then assign it directly, i.e. new_df = df. But why would you want that behavior? Why would you even need new_df if it is just a reference to df?

Pay close attention to these examples:

import pandas as pd


def reference(df):
    new_df = df
    df[0] = "test"
    return new_df

if __name__ == '__main__':

    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    df = pd.DataFrame(mat)

    new_df = reference(df)

    print(new_df)

outputs

      0  1  2
0  test  2  3
1  test  5  6
2  test  8  9

and:

import pandas as pd


def deep_copy(df):
    new_df = df.copy()
    df[0] = "test"
    return new_df


if __name__ == '__main__':

    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    df = pd.DataFrame(mat)

    new_df = deep_copy(df)

    print(new_df)

outputs

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9

CodePudding user response：

In your case,

The copied df is not returned and is scoped only to the function, so the new_df outside the function is never assigned the new values.
"test" was assigned to df, not to new_df, after the copy was made. That's why the change would not show up when you print new_df, even if the function did return the copy.

Try this out.

import pandas as pd

def copy(df, new_df):
    new_df = df.copy()
    new_df[0] = "test"
    return new_df

if __name__ == "__main__":
  mat = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
  ]
  df = pd.DataFrame(mat)
  new_df = pd.DataFrame()
  new_df = copy(df, new_df)
  print(new_df)

output

      0  1  2
0  test  2  3
1  test  5  6
2  test  8  9

Edit

You can also do it as follows, without defining the copy function.

import pandas as pd

if __name__ == "__main__":
  mat = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
  ]
  df = pd.DataFrame(mat)
  new_df = df.copy()
  new_df[0] = "test"
  print(new_df)
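
If returning a value really is not an option, as the question's edit asks, one possible approach is to fill the DataFrame that was passed in instead of rebinding the name. The sketch below assumes new_df starts out empty; copy_into is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def copy_into(src, dest):
    # Hypothetical helper: fills `dest` in place, column by column,
    # instead of rebinding the local name `dest`.
    for col in src.columns:
        dest[col] = src[col].copy()


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
new_df = pd.DataFrame()

copy_into(df, new_df)
df[0] = "test"  # a later change to df does not leak into new_df

print(new_df)
```

Because each column is assigned onto the existing object, the caller's new_df sees the data without any return value. If new_df already held columns that are not in df, they would stay in place, so returning the copy is usually the simpler design.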