I need to compare two df1 (blue) and df2 (orange), store only the rows of df2 (orange) that are not in df1 in a separate data frame, and then add that to df1 while assigning function 6 and sector 20 for the employees that were not present in df1 (blue)
I know how to find the differences between the data frames and store that in a third data frame, but I'm stuck trying to figure out how to store only the rows of df2 that are not in df1.
CodePudding user response:
Can try this:
- Get a list with the data os orange u want to keep
- Filter df2 with that list
- Append
df1 --> blue, df2 --> orange
import pandas as pd
df2['Function'] = 6
df2['Sector'] = 20
ids_df2_keep = [e for e in df2['ID'] if e not in list(df1['ID'])]
df2 = df2[df2['ID'].isin(ids_df2_keep)
df1 = df1.append(df2)
CodePudding user response:
This has been answered in pandas get rows which are NOT in other dataframe
Store it as a merge and simply select the rows that do not share common values.
~
negates the expression, select all that are NOT IN instead of IN.
common = df1.merge(df2,on=['ID','Name'])
df = df1[(~df1['ID'].isin(common['ID']))&(~df1['Name'].isin(common['Name']))]