I want to move some rows from df1 to df2 when calories in df1 and df2 are the same. The two dfs have the same columns.
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame(data = {
"calories": [420, 80, 90, 10],
"duration": [50, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
"calories": [420, 380, 390],
"duration": [60, 40, 45]
})
print(df1)
print(df2)
calories duration
0 420 50
1 80 4
2 90 5
3 10 2
calories duration
0 420 60
1 380 40
2 390 45
rows = df1.loc[df1.calories == df2.calories, :]
df2 = df2.append(rows, ignore_index=True)
df1.drop(rows.index, inplace=True)
print('df1:')
print(df1)
print('df2:')
print(df2)
Then it reports this error:
raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
EDIT: Solution
def move_rows(df1, df2):
for index, row in df1.iterrows():
if row['calories'] in df2.values:
df2 = df2.append(row, ignore_index=True)
df1.drop(index, inplace=True)
return df1, df2
CodePudding user response:
Since your dataframes are not the same length, you need to use merge
to find rows with common calories
values. You need to merge on the index
and calories
values; that can most easily be achieved by using reset_index
to temporarily add an index
column to merge on:
dftemp = df1.reset_index().merge(df2.reset_index(), on=['index', 'calories'], suffixes=['', '_y'])
Output:
index calories duration duration_y
0 0 420 50 60
You can now concat
the calories
and duration
values from dftemp
to df2
(using reset_index
again to reset the index):
df2 = pd.concat([df2, dftemp[['calories', 'duration']]]).reset_index(drop=True)
Output (for your sample data):
calories duration
0 420 60
1 380 40
2 390 45
3 420 50
To remove the rows that were copied to df2
from df1
, we merge just on index, then filter out rows where the two calories
values are different:
dftemp = df1.merge(df2, left_index=True, right_index=True, suffixes=['', '_y']).query('calories != calories_y')
df1 = dftemp[['calories', 'duration']].reset_index(drop=True)
Output (for your sample data):
calories duration
0 80 4
1 90 5
2 10 3