How to create a new column in a dataframe based on the values of multiple columns in a different dat

Time:11-29

Let's say I have the two data frames below:


import pandas as pd

data = {
  'Part' : ['part1', 'part2', 'part3', 'part4', 'part5'],
  'Number' : ['123', '234', '345', '456', '567'],
  'Code' : ['R2', 'R2', 'R4', 'R5', 'R5']
}

df = pd.DataFrame(data, dtype = object)
data2 = {
  'Part' : ['part1', 'part2', 'part6', 'part4'],
  'Number' : ['123', '234', '345', '456'],
  'Code' : ['M2', 'R2', 'R4', 'M5']
}

df2 = pd.DataFrame(data2, dtype = object)

My goal is to create a new column in df called Old_Code that holds the value of Code from df2 wherever Part and Number match between df and df2, i.e., Old_Code would have the following values: ['M2', 'R2', NaN, 'M5', NaN]

I've tried:

def add_code(df):
    df.loc[(df['Part'] == df2['Part']) & (df['Number'] == df2['Number']), 'Old_Code'] = df2['Code']
add_code(df)

but I keep getting an error due to the shape of the dataframes not matching. Is there a way to get around this issue?

I've also tried:

def add_code1(df):    
    if (df['Part'] == df2['Part']) & (df['Number'] == df2['Number']):
        return df2['Code']
df['Old_Code'] = df.apply(add_code1, axis = 1)

However, I just get errors.

CodePudding user response:

Here are two ways to do what you've asked:

# First way
df = df.set_index(['Part','Number']).assign(Old_code=df2.set_index(['Part','Number']).Code).reset_index()

# Second way
df = df.merge(df2.rename(columns={'Code':'Old_code'}), how='left', on=['Part','Number'])

Output:

    Part Number Code Old_code
0  part1    123   R2       M2
1  part2    234   R2       R2
2  part3    345   R4      NaN
3  part4    456   R5       M5
4  part5    567   R5      NaN
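
For completeness, here is a self-contained sketch of the merge approach that can be run as-is. It rebuilds the two frames from the question and names the new column Old_Code, as the question asks, rather than Old_code. The `lookup` and `result` names and the `validate='many_to_one'` check are additions for illustration, on the assumption that each (Part, Number) pair should appear at most once in df2.

```python
import pandas as pd

df = pd.DataFrame({
    'Part': ['part1', 'part2', 'part3', 'part4', 'part5'],
    'Number': ['123', '234', '345', '456', '567'],
    'Code': ['R2', 'R2', 'R4', 'R5', 'R5'],
}, dtype=object)

df2 = pd.DataFrame({
    'Part': ['part1', 'part2', 'part6', 'part4'],
    'Number': ['123', '234', '345', '456'],
    'Code': ['M2', 'R2', 'R4', 'M5'],
}, dtype=object)

keys = ['Part', 'Number']

# Keep only the key columns plus the code to bring over, renamed so it
# does not clash with df's own 'Code' column.
lookup = df2[keys + ['Code']].rename(columns={'Code': 'Old_Code'})

# A left merge keeps every row of df; rows with no match in df2 get NaN.
# validate='many_to_one' (an assumption about the data) raises a MergeError
# if df2 contains duplicate (Part, Number) pairs, which would otherwise
# silently duplicate rows of df.
result = df.merge(lookup, how='left', on=keys, validate='many_to_one')

print(result)
```

As for why the two attempts in the question fail: `df['Part'] == df2['Part']` compares two Series with different indexes position by position, and pandas refuses to compare Series that are not identically labeled. In the `apply` version, each row value is compared against the whole df2 column, which yields a boolean Series, and `if` cannot reduce a Series to a single True or False. A merge sidesteps both problems because it matches rows by key values rather than by position.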