Populating Pandas column with the use of reference values in Python-CodePudding

What's the easiest way in Pandas to check throughout whole 'Product' column if entries include 'Product ID', then use 'Arch ID' to populate 'Product Arch' column?

df1:

  Product Product Arch
0   P 1K2         None
1   PC200         None
2   Modem         None
3  SSD5AZ         None

df2:

  Product ID     Arch ID
0        1K2    Computer
1        D5A  Enterprise

Expected output:

  Product Product Arch
0   P 1K2         Computer
1   PC200         None
2   Modem         None
3  SSD5AZ         Enterprise

Trying different results for a few hours now, with no success. Thanks!

CodePudding user response：

Use a regex crafted from df2 values as key to merge:

import re

df1['Product Arch'] = df1.merge(df2, left_on=df1['Product'].str.extract(f"({'|'.join(df2['Product ID'].map(re.escape))})", expand=False),
                                right_on='Product ID', how='left')['Arch ID']

Output:

  Product Product Arch
0   P 1K2     Computer
1   PC200          NaN
2   Modem          NaN
3  SSD5AZ   Enterprise

CodePudding user response：

One way to accomplish this task in Pandas is to use the .apply() method and a custom function on 'Product' column of df1 to check if it includes 'Product ID' and use the corresponding 'Arch ID' from df2 to populate 'Product Arch' column in df1.

import pandas as pd

df1 = pd.DataFrame({'Product': ['P 1K2', 'PC200', 'Modem', 'SSD5AZ'], 'Product Arch': [None, None, None, None]})
df2 = pd.DataFrame({'Product ID': ['1K2', 'D5A'], 'Arch ID': ['Computer', 'Enterprise']})

def get_arch(product):
    for idx, row in df2.iterrows():
        if row['Product ID'] in product:
            return row['Arch ID']
    return None

df1['Product Arch'] = df1['Product'].apply(get_arch)

Output

  Product
0   P 1K2
1   PC200
2   Modem
3  SSD5AZ