So I have written a short function that basically moves a single row, selected by its index label, from one dataframe to another, while preserving the index.
If I have this test dataframe and an empty one:
import pandas as pd

df = pd.DataFrame({'lower': ['a', 'b', 'c'],
                   'upper': ['A', 'B', 'C'],
                   'number': [1, 2, 3]},
                  index=['first', 'second', 'third'])
print(df, '\n\n')
empty = pd.DataFrame(columns=['lower', 'upper', 'number'])
print(empty, '\n\n')
and I just use the instructions:
line = 'second'
empty = empty.append(df.loc[line])
df = df.drop(index=line)
it works.
But if I try to write an impure function that does the same thing, it only modifies the dataframes inside the function, and outside it they remain unchanged!?
Here is my entire code:
import pandas as pd

def move_line(ind, source, destination):
    row = source.loc[ind]
    destination = destination.append(row)
    source = source.drop(index=ind)
    print('source inside function\n', source, '\n\n')
    print('destination inside function\n', destination, '\n\n')

def main():
    df = pd.DataFrame({'lower': ['a', 'b', 'c'],
                       'upper': ['A', 'B', 'C'],
                       'number': [1, 2, 3]},
                      index=['first', 'second', 'third'])
    #print(df, '\n\n')
    empty = pd.DataFrame(columns=['lower', 'upper', 'number'])
    #print(empty, '\n\n')
    move_line('second', df, empty)
    print('source outside function\n', df, '\n\n')
    print('destination outside function\n', empty)

if __name__ == '__main__':
    main()
Answer:
"it only modifies the dataframes inside the function, and outside it they remain unchanged!?"
That is because DataFrame.append doesn't mutate the original DataFrame; it creates a new DataFrame with the new row, and the original object is left unchanged. By default, DataFrame.drop doesn't change the original object either, unless you pass inplace=True.
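One way to check this for drop is to compare object identity. This is only a minimal sketch on a small throwaway frame; the names frame, dropped and result are made up for illustration:

```python
import pandas as pd

frame = pd.DataFrame({'number': [1, 2, 3]},
                     index=['first', 'second', 'third'])

# Default behaviour: drop returns a new DataFrame and leaves frame alone.
dropped = frame.drop(index='second')
print(dropped is frame)   # False: a different object came back
print(len(frame))         # 3: frame still has all of its rows

# With inplace=True the object itself is modified and None is returned.
result = frame.drop(index='second', inplace=True)
print(result)             # None
print(list(frame.index))  # ['first', 'third']
```

This is also why writing source = source.drop(index=ind, inplace=True) would be a mistake: the name would end up bound to None.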
destination = destination.append(row)
source = source.drop(index=ind)
Here you are only rebinding the local names destination and source to the new objects returned by append and drop. They no longer refer to the objects that were passed in. The original objects, which df and empty still refer to in the caller, remain unchanged.
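The same thing happens with any Python object, not just DataFrames. Here is a rough sketch with plain lists; rebind and mutate are hypothetical helper names. Note that list.extend, unlike DataFrame.append, does change the object in place:

```python
def rebind(items):
    # Assignment points the local name at a brand-new list;
    # the caller's list is not touched.
    items = items + ['new']

def mutate(items):
    # extend changes the very list object the caller also refers to.
    items.extend(['new'])

data = ['old']

rebind(data)
after_rebind = list(data)   # snapshot taken after the rebinding call

mutate(data)
after_mutate = list(data)   # snapshot taken after the mutating call

print(after_rebind)   # ['old']
print(after_mutate)   # ['old', 'new']
```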
To mutate the original objects you can do the following:
import pandas as pd

def move_line(ind, source, destination):
    row = source.loc[ind]
    # Assigning to a new label adds the row to the existing object.
    destination.loc[ind] = row
    # inplace=True modifies source itself instead of returning a copy.
    source.drop(index=ind, inplace=True)
    print('source inside function\n', source, '\n\n')
    print('destination inside function\n', destination, '\n\n')

df = pd.DataFrame({'lower': ['a', 'b', 'c'],
                   'upper': ['A', 'B', 'C'],
                   'number': [1, 2, 3]},
                  index=['first', 'second', 'third'])
#print(df, '\n\n')
empty = pd.DataFrame(columns=['lower', 'upper', 'number'])
#print(empty, '\n\n')
move_line('second', df, empty)
print('source outside function\n', df, '\n\n')
print('destination outside function\n', empty)
Output:
source inside function
      lower upper  number
first     a     A       1
third     c     C       3


destination inside function
       lower upper number
second     b     B      2


source outside function
      lower upper  number
first     a     A       1
third     c     C       3


destination outside function
       lower upper number
second     b     B      2
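
If you would rather not mutate the arguments, another option is to return the new frames and rebind them in the caller. The following is only a sketch, not part of the original answer: move_line_pure is a hypothetical name, and it uses pd.concat in place of DataFrame.append, since append was deprecated in pandas 1.4 and removed in pandas 2.0. Depending on your pandas version, concatenating with an empty frame may emit a FutureWarning about dtype handling.

```python
import pandas as pd

def move_line_pure(ind, source, destination):
    """Return new (source, destination) frames; the inputs are left untouched."""
    row = source.loc[[ind]]  # double brackets keep a one-row DataFrame
    new_destination = pd.concat([destination, row])
    new_source = source.drop(index=ind)
    return new_source, new_destination

df = pd.DataFrame({'lower': ['a', 'b', 'c'],
                   'upper': ['A', 'B', 'C'],
                   'number': [1, 2, 3]},
                  index=['first', 'second', 'third'])
empty = pd.DataFrame(columns=['lower', 'upper', 'number'])

# The caller decides what to do with the returned frames.
new_df, new_empty = move_line_pure('second', df, empty)

print(list(new_df.index))     # ['first', 'third']
print(list(new_empty.index))  # ['second']
print(list(df.index))         # ['first', 'second', 'third'], input unchanged
```

Writing df, empty = move_line_pure('second', df, empty) instead would rebind the caller's own names, which gives the same end result as the in-place version.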