How to tell my lambda function in Python to Return None instead of throwing up Key Value error-CodePudding

I have 3 pandas DataFrames that describe different aspects of a transaction. I am merging these to "complete the transaction". In essence, I need to add the correct SKU pricing and customer/sku discounts to the correct transaction in df_sales_volume in order to add the financial information to the transaction.

How do I tell my script to Return None, instead of returning a KeyError, when a combination of Customer and SKU exists in df_customer_discounts that does not exist in df_sales_volume?

DataFrames:

df_sales_volume contains 3 columns: Customer, SKU and Units Purchased. This details how many Units of a given SKU and given Customer purchased.

df_sku_prices contains 3 columns: SKU, List Price and Markedup Price. and the SKU pricing for that month that I need to merge into df_sales_volume.

df_customer_discounts contains 3 columns: Customer, SKU and Discount. The dataset contains all discounts the business has in place with every customer, although not all Customer/Discount combinations will be present in df_sales_volume.

Here is the code to create a sample dataset:

import pandas as pd

df_sales_volume = pd.DataFrame({ 
    "Customer": ["John's Fruit Shop"]*2   ["Adam's Grocery's"]*3   ["Lucy's Fresh Food"]*2, 
    "SKU": ["Bannanas"]   ["Apples"]   ["Avocados"]   ["Purple Grapes"]   ["Dragon Fruit"]*2                              ["Mangos"],
    "Units Purchased": [4]   [2]   [13]   [5]   [70]   [34]   [8],
    })

df_sku_prices = pd.DataFrame({
    "SKU": ["Avocados"]   ["Dragon Fruit"]   ["Grapes"]   ["Bannanas"]   ["Apples"],
    "List Price": [103.21]   [4.55]   [42.01]   [7.00]   [3.35], 
    "Markedup Price": [109.34]   [7.20]   [59.00]   [13.78]   [4.10]
}).set_index(["SKU"])


df_customer_discounts = pd.DataFrame({
    "Customer": ["John's Fruit Shop"]*4   ["Adam's Grocery's"]*3   ["Lucy's Fresh Food"]*3, 
    "SKU": ["Apples"]   ["Bannanas"]   ["Purple Grapes"]   ["Mandarins"]   ["Avocados"]   ["Purple Grapes"]   ["Dragon Fruit"]   ["Avocados"]   ["Dragon Fruit"]   ["Mangos"],
    "Discount": [0.05]   [0.35]   [0.22]   [0.15]   [0.50]   [0.40]   [0.10]   [0.75]   [0.01]   [0.24]
}).set_index(["SKU", "Customer"])

This is what I have tried:

# Create copy of original volume file to work with

df_monthly_sales_report = df_sales_volume.copy()

# Lookup list price by sku 

df_monthly_sales_report["SKU List Price"] = df_monthly_sales_report.apply(
lambda row: df_sku_prices.loc[row["SKU"], "List Price"], 
axis=1,
)

# Lookup marked up price by sku

df_monthly_sales_report["SKU Markedup Price"] = df_monthly_sales_report.apply(
lambda row: df_sku_prices.loc[row["SKU"], "Markedup Price"], 
axis=1, 
)

# Lookup discounts by customer and sku

df_monthly_sales_report["Customer Discount"] = df_monthly_sales_report.apply(
lambda row: df_customer_discounts.loc[(row['SKU'], row['Customer']), "Discount"], 
axis=1, 
)

However, I get returned KeyError: 'Purple Grapes'

The desired result is a DataFrame with:

A new SKU List Price column the assigns the correct SKU list price to the transaction
A new SKU Markedup Price column the assigns the correct SKU markedup price to the transaction
A new Discount column that assigns the correct discount to the Customer/SKU combination

Notes about dataset:

The real life datasets are far larger
The df_customer_discounts will often contain combinations that are not present in the df_sales_volume dataset. In other words, there are Discounts in place on some Customers that are not activated because that Customer did not purchase any of that product. I believe this is what is causing the key error.

I am open to approaches that do not involve lambda, however I am new to Python so my knowledge isn't expansive. This is a script that I will be sharing with colleagues eventually and will be re-run a lot.

CodePudding user response：

I think you're looking for pandas.merge.

Instead of applying a lookup function to each row, you'd simply do a SQL-style left join:

merged = (
    df_sales_volume.merge(
        df_sku_prices, 
        left_on="SKU", 
        right_index=True, 
        how="left"
    ).merge(
        df_customer_discounts, 
        left_on=["SKU", "Customer"], 
        right_index=True, 
        how="left"
    )
)

Result:

            Customer            SKU  Units Purchased  List Price  Markedup Price  Discount
0  John's Fruit Shop       Bannanas                4        7.00           13.78      0.35   
1  John's Fruit Shop         Apples                2        3.35            4.10      0.05   
2   Adam's Grocery's       Avocados               13      103.21          109.34      0.50   
3   Adam's Grocery's  Purple Grapes                5         NaN             NaN      0.40   
4   Adam's Grocery's   Dragon Fruit               70        4.55            7.20      0.10   
5  Lucy's Fresh Food   Dragon Fruit               34        4.55            7.20      0.01   
6  Lucy's Fresh Food         Mangos                8         NaN             NaN      0.24

CodePudding user response：

It's better to write the code manually..

Try:
    #The Code where you think the error occurs.
Except Exception as e:
    print(e)
    return None