I have 3 pandas DataFrames that describe different aspects of a transaction. I am merging these to "complete the transaction". In essence, I need to add the correct SKU pricing and customer/sku discounts to the correct transaction in df_sales_volume in order to add the financial information to the transaction.
How do I tell my script to Return None
, instead of returning a KeyError
, when a combination of Customer and SKU exists in df_customer_discounts that does not exist in df_sales_volume?
DataFrames:
df_sales_volume
contains 3 columns: Customer
, SKU
and Units Purchased
. This details how many Units of a given SKU and given Customer purchased.
df_sku_prices
contains 3 columns: SKU
, List Price
and Markedup Price
. and the SKU pricing for that month that I need to merge into df_sales_volume.
df_customer_discounts
contains 3 columns: Customer
, SKU
and Discount
. The dataset contains all discounts the business has in place with every customer, although not all Customer/Discount combinations will be present in df_sales_volume
.
Here is the code to create a sample dataset:
import pandas as pd
df_sales_volume = pd.DataFrame({
"Customer": ["John's Fruit Shop"]*2 ["Adam's Grocery's"]*3 ["Lucy's Fresh Food"]*2,
"SKU": ["Bannanas"] ["Apples"] ["Avocados"] ["Purple Grapes"] ["Dragon Fruit"]*2 ["Mangos"],
"Units Purchased": [4] [2] [13] [5] [70] [34] [8],
})
df_sku_prices = pd.DataFrame({
"SKU": ["Avocados"] ["Dragon Fruit"] ["Grapes"] ["Bannanas"] ["Apples"],
"List Price": [103.21] [4.55] [42.01] [7.00] [3.35],
"Markedup Price": [109.34] [7.20] [59.00] [13.78] [4.10]
}).set_index(["SKU"])
df_customer_discounts = pd.DataFrame({
"Customer": ["John's Fruit Shop"]*4 ["Adam's Grocery's"]*3 ["Lucy's Fresh Food"]*3,
"SKU": ["Apples"] ["Bannanas"] ["Purple Grapes"] ["Mandarins"] ["Avocados"] ["Purple Grapes"] ["Dragon Fruit"] ["Avocados"] ["Dragon Fruit"] ["Mangos"],
"Discount": [0.05] [0.35] [0.22] [0.15] [0.50] [0.40] [0.10] [0.75] [0.01] [0.24]
}).set_index(["SKU", "Customer"])
This is what I have tried:
# Create copy of original volume file to work with
df_monthly_sales_report = df_sales_volume.copy()
# Lookup list price by sku
df_monthly_sales_report["SKU List Price"] = df_monthly_sales_report.apply(
lambda row: df_sku_prices.loc[row["SKU"], "List Price"],
axis=1,
)
# Lookup marked up price by sku
df_monthly_sales_report["SKU Markedup Price"] = df_monthly_sales_report.apply(
lambda row: df_sku_prices.loc[row["SKU"], "Markedup Price"],
axis=1,
)
# Lookup discounts by customer and sku
df_monthly_sales_report["Customer Discount"] = df_monthly_sales_report.apply(
lambda row: df_customer_discounts.loc[(row['SKU'], row['Customer']), "Discount"],
axis=1,
)
However, I get returned KeyError: 'Purple Grapes'
The desired result is a DataFrame with:
- A new SKU List Price column the assigns the correct SKU list price to the transaction
- A new SKU Markedup Price column the assigns the correct SKU markedup price to the transaction
- A new Discount column that assigns the correct discount to the Customer/SKU combination
Notes about dataset:
- The real life datasets are far larger
- The
df_customer_discounts
will often contain combinations that are not present in thedf_sales_volume
dataset. In other words, there are Discounts in place on some Customers that are not activated because that Customer did not purchase any of that product. I believe this is what is causing the key error.
I am open to approaches that do not involve lambda, however I am new to Python so my knowledge isn't expansive. This is a script that I will be sharing with colleagues eventually and will be re-run a lot.
CodePudding user response:
I think you're looking for pandas.merge
.
Instead of applying a lookup function to each row, you'd simply do a SQL-style left join:
merged = (
df_sales_volume.merge(
df_sku_prices,
left_on="SKU",
right_index=True,
how="left"
).merge(
df_customer_discounts,
left_on=["SKU", "Customer"],
right_index=True,
how="left"
)
)
Result:
Customer SKU Units Purchased List Price Markedup Price Discount
0 John's Fruit Shop Bannanas 4 7.00 13.78 0.35
1 John's Fruit Shop Apples 2 3.35 4.10 0.05
2 Adam's Grocery's Avocados 13 103.21 109.34 0.50
3 Adam's Grocery's Purple Grapes 5 NaN NaN 0.40
4 Adam's Grocery's Dragon Fruit 70 4.55 7.20 0.10
5 Lucy's Fresh Food Dragon Fruit 34 4.55 7.20 0.01
6 Lucy's Fresh Food Mangos 8 NaN NaN 0.24
CodePudding user response:
It's better to write the code manually..
Try:
#The Code where you think the error occurs.
Except Exception as e:
print(e)
return None