Pandas python vlookup to filter data from two dataframes and use this filter to replace data from sp-CodePudding

I have two dataframes like below.

d = {'serial#': ['AAA111', 'BBB222', 'CCC333', 'DDD444', 'EEE555'], 
'Status': ['Compatible', 'Compatible', 'Not compatible', 'Not compatible', 'Not compatible']}
df1 = pd.DataFrame(data=d)

d2 = {'serial#': ['DDD444', 'EEE555'], 'Model': ['printer', 'printer']}
df2 = pd.DataFrame(data=d2)

print(df1)
print(df2)

  serial#          Status
0  AAA111      Compatible
1  BBB222      Compatible
2  CCC333  Not compatible
3  DDD444  Not compatible
4  EEE555  Not compatible

  serial#    Model
0  DDD444  printer
1  EEE555  printer

I need to modify the column 'Status' from df1. For this I need to make a filter using the 'serial#' columns of each of the dataframes. The lines under the 'Status on df1' column will be modified if ' there are items that match on df2 'serial#' column.

Suppose I need to replace data in the filtered rows by the string "reporting", the df1 desired is as below.

  serial#          Status
0  AAA111      Compatible
1  BBB222      Compatible
2  CCC333  Not compatible
3  DDD444       Reporting
4  EEE555       Reporting

How can I do that? Any help is welcomed.

CodePudding user response：

Use isin to check serial# from df2 then use it as boolean mask to set Status to 'Reporting':

df1.loc[df1['serial#'].isin(df2['serial#']), 'Status'] = 'Reporting'
print(df1)

# Output
  serial#          Status
0  AAA111      Compatible
1  BBB222      Compatible
2  CCC333  Not compatible
3  DDD444       Reporting
4  EEE555       Reporting

CodePudding user response：

You can perform a left join between both and then a conditional term on Status column:

import pandas as pd
import numpy as np

d = {'serial#': ['AAA111', 'BBB222', 'CCC333', 'DDD444', 'EEE555'], 'Status': ['Compatible', 'Compatible', 'Not compatible', 'Not compatible', 'Not compatible']}
df1 = pd.DataFrame(data=d)

d2 = {'serial#': ['DDD444', 'EEE555'], 'Model': ['printer', 'printer']}
df2 = pd.DataFrame(data=d2)

df3 = df1.merge(df2, on='serial#', how='left')
df3['Status'] = np.where(df3['Model'].notnull(), 'Reporting', df3['Status'])
df3 = df3.drop('Model', axis=1)

df3

df3