Home > database >  Pandas python vlookup to filter data from two dataframes and use this filter to replace data from sp
Pandas python vlookup to filter data from two dataframes and use this filter to replace data from sp

Time:07-06

I have two dataframes like below.

d = {'serial#': ['AAA111', 'BBB222', 'CCC333', 'DDD444', 'EEE555'], 
'Status': ['Compatible', 'Compatible', 'Not compatible', 'Not compatible', 'Not compatible']}
df1 = pd.DataFrame(data=d)

d2 = {'serial#': ['DDD444', 'EEE555'], 'Model': ['printer', 'printer']}
df2 = pd.DataFrame(data=d2)

print(df1)
print(df2)

  serial#          Status
0  AAA111      Compatible
1  BBB222      Compatible
2  CCC333  Not compatible
3  DDD444  Not compatible
4  EEE555  Not compatible

  serial#    Model
0  DDD444  printer
1  EEE555  printer

I need to modify the column 'Status' from df1. For this I need to make a filter using the 'serial#' columns of each of the dataframes. The lines under the 'Status on df1' column will be modified if ' there are items that match on df2 'serial#' column.

Suppose I need to replace data in the filtered rows by the string "reporting", the df1 desired is as below.

  serial#          Status
0  AAA111      Compatible
1  BBB222      Compatible
2  CCC333  Not compatible
3  DDD444       Reporting
4  EEE555       Reporting

How can I do that? Any help is welcomed.

CodePudding user response:

Use isin to check serial# from df2 then use it as boolean mask to set Status to 'Reporting':

df1.loc[df1['serial#'].isin(df2['serial#']), 'Status'] = 'Reporting'
print(df1)

# Output
  serial#          Status
0  AAA111      Compatible
1  BBB222      Compatible
2  CCC333  Not compatible
3  DDD444       Reporting
4  EEE555       Reporting

CodePudding user response:

You can perform a left join between both and then a conditional term on Status column:

import pandas as pd
import numpy as np

d = {'serial#': ['AAA111', 'BBB222', 'CCC333', 'DDD444', 'EEE555'], 'Status': ['Compatible', 'Compatible', 'Not compatible', 'Not compatible', 'Not compatible']}
df1 = pd.DataFrame(data=d)

d2 = {'serial#': ['DDD444', 'EEE555'], 'Model': ['printer', 'printer']}
df2 = pd.DataFrame(data=d2)

df3 = df1.merge(df2, on='serial#', how='left')
df3['Status'] = np.where(df3['Model'].notnull(), 'Reporting', df3['Status'])
df3 = df3.drop('Model', axis=1)

df3

df3

  • Related