Home > database >  Create column in DataFrame1 based on values from DataFrame2
Create column in DataFrame1 based on values from DataFrame2

Time:01-03

I have two Dataframes, and would like to create a new column in DataFrame 1 based on DataFrame 2 values.

But I dont want to join the two dataframes per say and make one big dataframe, but rather use the second Dataframe simply as a look-up.

#Main Dataframe:
df1 = pd.DataFrame({'Size':["Big", "Medium", "Small"], 'Sold_Quantity':[10, 6, 40]})

#Lookup Dataframe
df2 = pd.DataFrame({'Size':["Big", "Medium", "Small"], 'Sold_Quantiy_Score_Mean':[10, 20, 30]})

#Create column in Dataframe 1 based on lookup dataframe values:
df1['New_Column'] = when df1['Size'] = df2['Size'] and df1['Sold_Quantity'] < df2['Sold_Quantiy_Score_Mean'] then 'Below Average Sales' else 'Above Average Sales!' end

CodePudding user response:

One approach, is to use np.where:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantity': [10, 6, 40]})
df2 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantiy_Score_Mean': [10, 20, 30]})

condition = (df1['Size'] == df2['Size']) & (df1['Sold_Quantity'] < df2['Sold_Quantiy_Score_Mean'])

df1['New_Column'] = np.where(condition, 'Below Average Sales', 'Above Average Sales!')

print(df1)

Output

     Size  Sold_Quantity            New_Column
0     Big             10  Above Average Sales!
1  Medium              6   Below Average Sales
2   Small             40  Above Average Sales!

CodePudding user response:

Given that df2 is sort of like a lookup based on Size, it would make sense if your Size column was its index:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantity': [10, 6, 40]})
df2 = pd.DataFrame({'Size': ["Big", "Medium", "Small"], 'Sold_Quantiy_Score_Mean': [10, 20, 30]})

lookup = df2.set_index("Size")

You can then map the Sizes in df1 to their mean and compare each with the sold quantity:

is_below_mean = df1["Sold_Quantity"] < df1["Size"].map(lookup["Sold_Quantiy_Score_Mean"])

and finally map the boolean values to the respective strings using np.where

df1["New_Column"] = np.where(is_below_mean, 'Below Average Sales', 'Above Average Sales!')

df1:

     Size  Sold_Quantity            New_Column
0     Big             10  Above Average Sales!
1  Medium              6   Below Average Sales
2   Small             40  Above Average Sales!
  • Related