Home > front end >  How to compare two columns in different pandas dataframes, store the differences in a 3rd dataframe
How to compare two columns in different pandas dataframes, store the differences in a 3rd dataframe

Time:11-10

I need to compare two df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate data frame, and then add that to df1 while assigning function 6 and sector 20 for the employees that were not present in df1 (blue)

example

I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.

CodePudding user response:

Can try this:

  1. Get a list with the data os orange u want to keep
  2. Filter df2 with that list
  3. Append

df1 --> blue, df2 --> orange

import pandas as pd

df2['Function'] = 6
df2['Sector'] = 20

ids_df2_keep = [e for e in df2['ID'] if e not in list(df1['ID'])]
df2 = df2[df2['ID'].isin(ids_df2_keep)
df1 = df1.append(df2)

CodePudding user response:

This has been answered in pandas get rows which are NOT in other dataframe

Store it as a merge and simply select the rows that do not share common values. ~ negates the expression, select all that are NOT IN instead of IN.

common = df1.merge(df2,on=['ID','Name'])
df = df1[(~df1['ID'].isin(common['ID']))&(~df1['Name'].isin(common['Name']))]
  • Related