Background
I have two dataframes, one contains some unrelated columns and forecast sales volume (x1, x2, ...)
date ColA Colb x1 x2 x3 …
1/01/2022 Apple Cat 703 750 273 738
2/01/2022 Banana Dog 540 908 397 726
3/01/2022 Orange Pig 444 835 258 986
4/01/2022 Blueberry Rat 969 582 393 911
5/01/2022 Watermelon Bird 426 950 321 337
…
The other contains product prices and the day they started to be sold in store.
product start_date price
x1 2/01/2022 0.9
x2 4/01/2022 0.5
x3 4/01/2022 1.2
… 5/01/2022 1.0
What I'm trying to do
I am trying to:
- Zero out forecast sales volume, if it's before the Product Sales Start Date
- Multiply Forecast Sales volume by the product price to forecast revenue
Desired Result
date ColA Colb x1 x2 x3 …
1/01/2022 Apple Cat 0 0 0 0
2/01/2022 Banana Dog 486 0 0 0
3/01/2022 Orange Pig 399.6 0 0 0
4/01/2022 Blueberry Rat 872.1 291 471.6 0
5/01/2022 Watermelon Bird 383.4 475 385.2 337
…
Problem
The problem is the number of products varies. I am currently doing it manually editing my code each time to do this sum. Multiplying by the price is easy enough, but the Start_Date is tough, I don't know an easier way to do this dynamically. Below is an example of my current code.
Note: for the most part the two dataframes will be in the same order (i.e. x1, x2, x3, etc)
Code Example
import pandas as pd
# Dataframe one example
df1 = pd.DataFrame(data={
'Date': pd.Series(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
'ColA': ['Apple', 'Banana', 'Orange', 'Blueberry', 'Watermelon'],
'ColB': ['Cat', 'Dog', 'Pig', 'Rat', 'Bird'],
'x1': [703, 540, 444, 969, 426],
'x2': [750, 908, 835, 582, 950],
'x3': [273, 397, 258, 393, 321],
'x4': [738, 726, 986, 911, 337]
})
# Dataframe two example
df2 = pd.DataFrame(data={
'Product': ['x1', 'x2', 'x3', 'x4'],
'Start_Date': pd.Series(['2022-01-02', '2022-01-04', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
'Price': [0.9, 0.5, 1.2, 1.0]
})
df2.set_index('Product')
# Multipy by Price
df3 = df1.iloc[:, 3:7].multiply(df2[['Price']].values.T, axis='columns')
df3['sd1'] = df2['Start_Date'].iloc[0]
df3['sd2'] = df2['Start_Date'].iloc[1]
df3['sd3'] = df2['Start_Date'].iloc[2]
df3['sd4'] = df2['Start_Date'].iloc[3]
df3 = df1[['Date', 'ColA', 'ColB']].merge(df3, left_index=True, right_index=True)
print(df3)
CodePudding user response:
Below code snippet should solve your purpose :) :
from datetime import datetime
import pandas as pd
# Dataframe one example
df1 = pd.DataFrame(data={
'Date': pd.Series(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
'ColA': ['Apple', 'Banana', 'Orange', 'Blueberry', 'Watermelon'],
'ColB': ['Cat', 'Dog', 'Pig', 'Rat', 'Bird'],
'x1': [703, 540, 444, 969, 426],
'x2': [750, 908, 835, 582, 950],
'x3': [273, 397, 258, 393, 321],
'x4': [738, 726, 986, 911, 337]
})
# Dataframe two example
df2 = pd.DataFrame(data={
'Product': ['x1', 'x2', 'x3', 'x4'],
'Start_Date': pd.Series(['2022-01-02', '2022-01-04', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
'Price': [0.9, 0.5, 1.2, 1.0]
})
df2.set_index('Product', inplace=True)
# Multipy by Price and Generate final df
cols_to_work = [col for col in df1.columns if col.startswith('x')]
datetime_col = 'Date'
datetime_col_df2 = 'Start_Date'
for idx, rows in df1.iterrows():
for col in cols_to_work:
if rows[datetime_col] >= df2[datetime_col_df2].loc[col]:
# print(rows[datetime_col], df2['Start_Date'].loc[col], rows[datetime_col] >= df2['Start_Date'].loc[col])
# print(col, '-', df1[col].loc[idx], ' * ', df2['Price'].loc[col], ' = ', df1[col].loc[idx] * df2['Price'].loc[col])
df1[col].loc[idx] = df1[col].loc[idx] * df2['Price'].loc[col]
else:
df1[col].loc[idx] = 0
df1.head()
OUTPUT:
Date ColA ColB x1 x2 x3 x4
0 2022-01-01 Apple Cat 0.0 0.0 0.0 0.0
1 2022-01-02 Banana Dog 486.0 0.0 0.0 0.0
2 2022-01-03 Orange Pig 399.6 0.0 0.0 0.0
3 2022-01-04 Blueberry Rat 872.1 291.0 471.6 0.0
4 2022-01-05 Watermelon Bird 383.4 475.0 385.2 337.0