Home > Software engineering >  How to do an inverse operation of a merge between two pandas dataframes by id column?
How to do an inverse operation of a merge between two pandas dataframes by id column?

Time:02-10

I am working on a project for university in which I received two dataframes:

      import pandas as pd

      df1 = pd.DataFrame({'id': ['0','1','2','3'],
                          'name': ['sensor', 'actuador', 'sensor', 'sensor'],
                          'type':['analog', 'analog', 'digital', 'analog']})

      print(df1)

      id    name        type
       0    sensor      analog
       1    actuador    analog
       2    sensor      digital
       3    sensor      analog

      df2 = pd.DataFrame({'inst': ['0','10','2','143'],
                          'number': [100, 200, 300, 400]})

     print(df2)

     inst   number
      0      100
     10      200
      2      300
    143      400

I would like to check if the 'id' of df1 is different from the 'inst' of df2. I would only need to return lines from df1 where the 'id' is not contained in the 'inst' of df2.

I thought of doing a merge between the dataframes using the 'id' and 'inst' columns as follows:

      merged_df = df1.merge(df2, left_on=['id'], right_on=['inst'])

The output is:

     print(merged_df)

     id     name      type    inst    number
     0     sensor   analog     0      100
     2     sensor   digital    2      300

However, the desired output is the opposite of this. I would like it to return the df1 with only the lines where the 'id' is not in 'inst'. Also, I would like it to return only the df1 columns.

The desired output is:

       print(df_result)

     id     name    type
      1   actuador  analog
      3   sensor    analog

CodePudding user response:

If there is only one column use boolean indexing with inverted mask by Series.isin:

df_result = df1[~df1['id'].isin(df2['inst'])]
print (df_result)
  id      name    type
1  1  actuador  analog
3  3    sensor  analog

If possible check per multiple columns is possible use alternative solution with indicator parameter and filtering left_only rows:

df_result = (df1.merge(df2, left_on=['id'], right_on=['inst'], how='left', indicator=True)
                .query("_merge == 'left_only'")
                .reindex(df1.columns, axis=1))

print (df_result)
  id      name    type
1  1  actuador  analog
3  3    sensor  analog
  • Related