Home > Net >  How to tell my lambda function in Python to Return None instead of throwing up Key Value error
How to tell my lambda function in Python to Return None instead of throwing up Key Value error

Time:10-03

I have 3 pandas DataFrames that describe different aspects of a transaction. I am merging these to "complete the transaction". In essence, I need to add the correct SKU pricing and customer/sku discounts to the correct transaction in df_sales_volume in order to add the financial information to the transaction.

How do I tell my script to Return None, instead of returning a KeyError, when a combination of Customer and SKU exists in df_customer_discounts that does not exist in df_sales_volume?

DataFrames:

df_sales_volume contains 3 columns: Customer, SKU and Units Purchased. This details how many Units of a given SKU and given Customer purchased.

df_sku_prices contains 3 columns: SKU, List Price and Markedup Price. and the SKU pricing for that month that I need to merge into df_sales_volume.

df_customer_discounts contains 3 columns: Customer, SKU and Discount. The dataset contains all discounts the business has in place with every customer, although not all Customer/Discount combinations will be present in df_sales_volume.

Here is the code to create a sample dataset:

import pandas as pd

df_sales_volume = pd.DataFrame({ 
    "Customer": ["John's Fruit Shop"]*2   ["Adam's Grocery's"]*3   ["Lucy's Fresh Food"]*2, 
    "SKU": ["Bannanas"]   ["Apples"]   ["Avocados"]   ["Purple Grapes"]   ["Dragon Fruit"]*2                              ["Mangos"],
    "Units Purchased": [4]   [2]   [13]   [5]   [70]   [34]   [8],
    })

df_sku_prices = pd.DataFrame({
    "SKU": ["Avocados"]   ["Dragon Fruit"]   ["Grapes"]   ["Bannanas"]   ["Apples"],
    "List Price": [103.21]   [4.55]   [42.01]   [7.00]   [3.35], 
    "Markedup Price": [109.34]   [7.20]   [59.00]   [13.78]   [4.10]
}).set_index(["SKU"])


df_customer_discounts = pd.DataFrame({
    "Customer": ["John's Fruit Shop"]*4   ["Adam's Grocery's"]*3   ["Lucy's Fresh Food"]*3, 
    "SKU": ["Apples"]   ["Bannanas"]   ["Purple Grapes"]   ["Mandarins"]   ["Avocados"]   ["Purple Grapes"]   ["Dragon Fruit"]   ["Avocados"]   ["Dragon Fruit"]   ["Mangos"],
    "Discount": [0.05]   [0.35]   [0.22]   [0.15]   [0.50]   [0.40]   [0.10]   [0.75]   [0.01]   [0.24]
}).set_index(["SKU", "Customer"])

This is what I have tried:

# Create copy of original volume file to work with

df_monthly_sales_report = df_sales_volume.copy()

# Lookup list price by sku 

df_monthly_sales_report["SKU List Price"] = df_monthly_sales_report.apply(
lambda row: df_sku_prices.loc[row["SKU"], "List Price"], 
axis=1,
)

# Lookup marked up price by sku

df_monthly_sales_report["SKU Markedup Price"] = df_monthly_sales_report.apply(
lambda row: df_sku_prices.loc[row["SKU"], "Markedup Price"], 
axis=1, 
)

# Lookup discounts by customer and sku

df_monthly_sales_report["Customer Discount"] = df_monthly_sales_report.apply(
lambda row: df_customer_discounts.loc[(row['SKU'], row['Customer']), "Discount"], 
axis=1, 
)

However, I get returned KeyError: 'Purple Grapes'

The desired result is a DataFrame with:

  • A new SKU List Price column the assigns the correct SKU list price to the transaction
  • A new SKU Markedup Price column the assigns the correct SKU markedup price to the transaction
  • A new Discount column that assigns the correct discount to the Customer/SKU combination

Notes about dataset:

  • The real life datasets are far larger
  • The df_customer_discounts will often contain combinations that are not present in the df_sales_volume dataset. In other words, there are Discounts in place on some Customers that are not activated because that Customer did not purchase any of that product. I believe this is what is causing the key error.

I am open to approaches that do not involve lambda, however I am new to Python so my knowledge isn't expansive. This is a script that I will be sharing with colleagues eventually and will be re-run a lot.

CodePudding user response:

I think you're looking for pandas.merge.

Instead of applying a lookup function to each row, you'd simply do a SQL-style left join:

merged = (
    df_sales_volume.merge(
        df_sku_prices, 
        left_on="SKU", 
        right_index=True, 
        how="left"
    ).merge(
        df_customer_discounts, 
        left_on=["SKU", "Customer"], 
        right_index=True, 
        how="left"
    )
)

Result:

            Customer            SKU  Units Purchased  List Price  Markedup Price  Discount
0  John's Fruit Shop       Bannanas                4        7.00           13.78      0.35   
1  John's Fruit Shop         Apples                2        3.35            4.10      0.05   
2   Adam's Grocery's       Avocados               13      103.21          109.34      0.50   
3   Adam's Grocery's  Purple Grapes                5         NaN             NaN      0.40   
4   Adam's Grocery's   Dragon Fruit               70        4.55            7.20      0.10   
5  Lucy's Fresh Food   Dragon Fruit               34        4.55            7.20      0.01   
6  Lucy's Fresh Food         Mangos                8         NaN             NaN      0.24  

CodePudding user response:

It's better to write the code manually..

Try:
    #The Code where you think the error occurs.
Except Exception as e:
    print(e)
    return None
  • Related