Let's say I have the two data frames below:
import pandas as pd

data = {
    'Part': ['part1', 'part2', 'part3', 'part4', 'part5'],
    'Number': ['123', '234', '345', '456', '567'],
    'Code': ['R2', 'R2', 'R4', 'R5', 'R5']
}
df = pd.DataFrame(data, dtype=object)

data2 = {
    'Part': ['part1', 'part2', 'part6', 'part4'],
    'Number': ['123', '234', '345', '456'],
    'Code': ['M2', 'R2', 'R4', 'M5']
}
df2 = pd.DataFrame(data2, dtype=object)
My goal is to create a new column in df called Old_Code that lists the value of Code from df2 if the Part and Number in df and df2 match,
i.e., Old_Code would have the following values: ['M2', 'R2', NaN, 'M5', NaN]
I've tried:
def add_code(df):
    df.loc[(df['Part'] == df2['Part']) & (df['Number'] == df2['Number']), 'Old_Code'] = df2['Code']

add_code(df)
but I keep getting an error due to the shape of the dataframes not matching. Is there a way to get around this issue?
I've also tried:
def add_code1(df):
    if (df['Part'] == df2['Part']) & (df['Number'] == df2['Number']):
        return df2['Code']

df['Old_Code'] = df.apply(add_code1, axis=1)
However, I just get errors.
CodePudding user response:
Here are two ways to do what you've asked:
# First way
df = df.set_index(['Part','Number']).assign(Old_code=df2.set_index(['Part','Number']).Code).reset_index()
# Second way
df = df.merge(df2.rename(columns={'Code':'Old_code'}), how='left', on=['Part','Number'])
Output:
    Part Number Code Old_code
0  part1    123   R2       M2
1  part2    234   R2       R2
2  part3    345   R4      NaN
3  part4    456   R5       M5
4  part5    567   R5      NaN
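For reference, here is a self-contained sketch of the merge approach that can be run as-is. It names the new column Old_Code (as in the question) instead of Old_code. The `lookup` and `result` names and the `validate='many_to_one'` argument are my own additions, on the assumption that each (Part, Number) pair appears at most once in df2.

```python
import pandas as pd

df = pd.DataFrame({
    'Part': ['part1', 'part2', 'part3', 'part4', 'part5'],
    'Number': ['123', '234', '345', '456', '567'],
    'Code': ['R2', 'R2', 'R4', 'R5', 'R5'],
}, dtype=object)

df2 = pd.DataFrame({
    'Part': ['part1', 'part2', 'part6', 'part4'],
    'Number': ['123', '234', '345', '456'],
    'Code': ['M2', 'R2', 'R4', 'M5'],
}, dtype=object)

# Keep only the key columns plus Code from df2, renamed so it does not
# collide with df's own Code column.
lookup = df2[['Part', 'Number', 'Code']].rename(columns={'Code': 'Old_Code'})

# A left merge keeps every row of df; rows with no (Part, Number) match
# in df2 get NaN in Old_Code.
# validate='many_to_one' (an added safety check, not part of the answer above)
# raises an error if df2 contains duplicate (Part, Number) pairs, which would
# otherwise silently duplicate rows of df.
result = df.merge(lookup, how='left', on=['Part', 'Number'], validate='many_to_one')

print(result)
```

The original attempts fail because `df['Part'] == df2['Part']` compares two Series of different lengths element by element. A merge matches rows by key values, so the two frames do not need to have the same shape.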