I'm trying to pass a function a reference to an existing DataFrame and copy values from one DataFrame to another, but after the function finishes, the values have not been assigned to the original object.
How to reproduce:
import pandas as pd

def copy(df, new_df):
    new_df = df.copy()
    df[0] = "test"

if __name__ == '__main__':
    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(mat)
    new_df = pd.DataFrame()
    copy(df, new_df)
    print(new_df)
Note that in this example I assign "test" to the first column of df inside the function, and that change does show up on the original object that was passed in, but new_df does not get the new values.
Is this a bug in pandas, or am I doing something wrong?
Edit:
The assignment to df[0] is just an example of how values do change on the original df.
My question is: how do I assign the values from the original df to a new df (it could also be concat, not only copy) without having to return the df and create a new variable that receives the value returned from the function?
CodePudding user response:
Usually these questions are the other way around ("Why did X change if I modified Y?").
- Inside your copy function, the assignment to new_df rebinds the local name to a new object, which then has nothing to do with the new_df argument that was passed in.
- You are calling df.copy(). It creates a deep copy (i.e. a totally new DataFrame object is created).
- If you want changes on df to be reflected through new_df as well, assign it directly, i.e. new_df = df. But why would you want that behavior? Why would you even need new_df if it is just a reference to df?
Pay close attention to these examples:
import pandas as pd

def reference(df):
    new_df = df
    df[0] = "test"
    return new_df

if __name__ == '__main__':
    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(mat)
    new_df = reference(df)
    print(new_df)
outputs
      0  1  2
0  test  2  3
1  test  5  6
2  test  8  9
and:
import pandas as pd

def deep_copy(df):
    new_df = df.copy()
    df[0] = "test"
    return new_df

if __name__ == '__main__':
    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(mat)
    new_df = deep_copy(df)
    print(new_df)
outputs
   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9
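The local-variable point in this answer can be reproduced without pandas at all, which suggests the behavior comes from how Python binds names rather than from a pandas bug. A minimal sketch using plain lists; the function names rebind and mutate are hypothetical and not from the question:

```python
def rebind(target):
    # Assignment points the local name at a brand-new list;
    # the object the caller passed in is left untouched.
    target = [1, 2, 3]


def mutate(target):
    # A method call operates on the object the caller passed in,
    # so the caller sees the change.
    target.extend([1, 2, 3])


a = []
rebind(a)
b = []
mutate(b)
print(a)  # []
print(b)  # [1, 2, 3]
```

In the question, new_df = df.copy() is the rebind case, while df[0] = "test" is the mutate case, which is why only the second one is visible outside the function.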
CodePudding user response:
In your case,
- the copied df is not returned and is scoped only to the function, so the new_df outside the function is never assigned the new values.
- "test" was assigned to df rather than new_df after the copy was made, which is why the change would not show up when you print new_df even if the function were otherwise correct.
Try this out:
import pandas as pd

def copy(df, new_df):
    new_df = df.copy()
    new_df[0] = "test"
    return new_df

if __name__ == "__main__":
    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(mat)
    new_df = pd.DataFrame()
    new_df = copy(df, new_df)
    print(new_df)
output
      0  1  2
0  test  2  3
1  test  5  6
2  test  8  9
Edit
You can also do it as follows, without defining the copy function at all.
import pandas as pd

if __name__ == "__main__":
    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(mat)
    new_df = df.copy()
    new_df[0] = "test"
    print(new_df)
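For the no-return variant asked about in the question's edit, one option is to modify the DataFrame that was passed in rather than assigning a new object to the name. This is only a sketch, assuming new_df starts out empty as in the question; the function name copy_into is hypothetical:

```python
import pandas as pd


def copy_into(df, new_df):
    # Column assignment changes the object the caller passed in,
    # unlike `new_df = df.copy()`, which only rebinds the local name.
    for col in df.columns:
        new_df[col] = df[col]


if __name__ == "__main__":
    mat = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]
    df = pd.DataFrame(mat)
    new_df = pd.DataFrame()
    copy_into(df, new_df)  # nothing is returned
    print(new_df)
```

If new_df already has rows, column assignment aligns on the index, so the result may differ from a plain copy. Returning the new DataFrame, as in the examples above, is usually the simpler design.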