Is this the right way to move rows from one dataframe to another with a condition?

Time:06-04

I want to move some rows from df1 to df2 when calories in df1 and df2 are the same. The two dfs have the same columns.

import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame(data = {
  "calories": [420, 80, 90, 10],
  "duration": [50, 4, 5, 3]
})
df2 = pd.DataFrame(data = {
  "calories": [420, 380, 390],
  "duration": [60, 40, 45]
})

print(df1)
print(df2)



   calories  duration
0       420        50
1        80         4
2        90         5
3        10         3
   calories  duration
0       420        60
1       380        40
2       390        45

rows = df1.loc[df1.calories == df2.calories, :]
df2 = df2.append(rows, ignore_index=True)
df1.drop(rows.index, inplace=True)

print('df1:')
print(df1)
print('df2:')
print(df2)

Then it reports this error:

raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects

EDIT: Solution

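def move_rows(df1, df2):
  # Move every df1 row whose calories value also appears in df2's calories column
  for index, row in df1.iterrows():
    # Check the calories column only; df2.values would also match duration values
    if row['calories'] in df2['calories'].values:
      # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
      df2 = pd.concat([df2, row.to_frame().T], ignore_index=True)
      df1.drop(index, inplace=True)
  return df1, df2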
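
A vectorized sketch of the same idea, assuming the intent is to move every df1 row whose calories value occurs anywhere in df2's calories column (value-based matching, regardless of row position). It uses `Series.isin` and `pd.concat` instead of a row-by-row loop, and the sample frames are the ones from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

# True for each df1 row whose calories value occurs in df2["calories"]
mask = df1["calories"].isin(df2["calories"])

# Append the matching rows to df2, then keep only the non-matching rows in df1
df2 = pd.concat([df2, df1[mask]], ignore_index=True)
df1 = df1[~mask].reset_index(drop=True)  # renumbering df1 is optional

print('df1:')
print(df1)
print('df2:')
print(df2)
```

Unlike the loop, this builds new frames rather than modifying df1 in place, so the caller has to rebind both names.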

CodePudding user response:

Your dataframes are not the same length, so df1.calories and df2.calories carry different index labels and cannot be compared with ==; that is what raises the ValueError. Instead, use merge to find rows with common calories values. You need to merge on both the index and the calories values; the easiest way to do that is to use reset_index to temporarily add an index column to merge on:

dftemp = df1.reset_index().merge(df2.reset_index(), on=['index', 'calories'], suffixes=['', '_y'])

Output:

   index  calories  duration  duration_y
0      0       420        50          60

You can now concat the calories and duration values from dftemp to df2 (using reset_index again to reset the index):

df2 = pd.concat([df2, dftemp[['calories', 'duration']]]).reset_index(drop=True)

Output (for your sample data):

   calories  duration
0       420        60
1       380        40
2       390        45
3       420        50

To remove the rows that were copied to df2 from df1, we merge just on index, then filter out rows where the two calories values are different:

dftemp = df1.merge(df2, left_index=True, right_index=True, suffixes=['', '_y']).query('calories != calories_y')
df1 = dftemp[['calories', 'duration']].reset_index(drop=True)

Output (for your sample data):

   calories  duration
0        80         4
1        90         5
2        10         3
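
One caveat with the removal step: merge defaults to an inner join, so merging on index would also discard any df1 row whose index label does not exist in df2. That does not happen with the sample data, but it can when df1 is noticeably longer than df2. A minimal sketch that avoids the second merge by dropping the matched labels directly is shown below; `move_matching_rows` is a hypothetical helper name, and it assumes both frames have a default unnamed index and no column called "index":

```python
import pandas as pd

def move_matching_rows(df1, df2):
    """Move df1 rows whose index label and calories both match a df2 row."""
    # Same merge as above: one row per (index, calories) pair present in both frames
    matched = df1.reset_index().merge(
        df2.reset_index(), on=["index", "calories"], suffixes=["", "_y"]
    )
    # Append df1's version of the matched rows to df2
    new_df2 = pd.concat([df2, matched[["calories", "duration"]]]).reset_index(drop=True)
    # Drop the matched rows from df1 by their original index labels
    new_df1 = df1.drop(index=matched["index"]).reset_index(drop=True)
    return new_df1, new_df2

df1 = pd.DataFrame({"calories": [420, 80, 90, 10], "duration": [50, 4, 5, 3]})
df2 = pd.DataFrame({"calories": [420, 380, 390], "duration": [60, 40, 45]})

df1, df2 = move_matching_rows(df1, df2)
print(df1)
print(df2)
```

For the sample data this gives the same df1 and df2 as the outputs above.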