What's the easiest way in Pandas to check throughout whole 'Product' column if entries include 'Product ID', then use 'Arch ID' to populate 'Product Arch' column?
df1:
Product Product Arch
0 P 1K2 None
1 PC200 None
2 Modem None
3 SSD5AZ None
df2:
Product ID Arch ID
0 1K2 Computer
1 D5A Enterprise
Expected output:
Product Product Arch
0 P 1K2 Computer
1 PC200 None
2 Modem None
3 SSD5AZ Enterprise
Trying different results for a few hours now, with no success. Thanks!
CodePudding user response:
Use a regex crafted from df2
values as key to merge
:
import re
df1['Product Arch'] = df1.merge(df2, left_on=df1['Product'].str.extract(f"({'|'.join(df2['Product ID'].map(re.escape))})", expand=False),
right_on='Product ID', how='left')['Arch ID']
Output:
Product Product Arch
0 P 1K2 Computer
1 PC200 NaN
2 Modem NaN
3 SSD5AZ Enterprise
CodePudding user response:
One way to accomplish this task in Pandas is to use the .apply()
method and a custom function on 'Product' column of df1 to check if it includes 'Product ID' and use the corresponding 'Arch ID' from df2 to populate 'Product Arch' column in df1.
import pandas as pd
df1 = pd.DataFrame({'Product': ['P 1K2', 'PC200', 'Modem', 'SSD5AZ'], 'Product Arch': [None, None, None, None]})
df2 = pd.DataFrame({'Product ID': ['1K2', 'D5A'], 'Arch ID': ['Computer', 'Enterprise']})
def get_arch(product):
for idx, row in df2.iterrows():
if row['Product ID'] in product:
return row['Arch ID']
return None
df1['Product Arch'] = df1['Product'].apply(get_arch)
Output
Product
0 P 1K2
1 PC200
2 Modem
3 SSD5AZ