Home > Back-end >  Dynamically merge two dataframes
Dynamically merge two dataframes

Time:08-10

Background

I have two dataframes, one contains some unrelated columns and forecast sales volume (x1, x2, ...)

date        ColA        Colb    x1  x2  x3  …
1/01/2022   Apple       Cat     703 750 273 738
2/01/2022   Banana      Dog     540 908 397 726
3/01/2022   Orange      Pig     444 835 258 986
4/01/2022   Blueberry   Rat     969 582 393 911
5/01/2022   Watermelon  Bird    426 950 321 337
…       

The other contains product prices and the day they started to be sold in store.

product start_date  price
x1      2/01/2022   0.9
x2      4/01/2022   0.5
x3      4/01/2022   1.2
…       5/01/2022   1.0

What I'm trying to do

I am trying to:

  1. Zero out forecast sales volume, if it's before the Product Sales Start Date
  2. Multiply Forecast Sales volume by the product price to forecast revenue

Desired Result

date        ColA        Colb    x1      x2  x3      …
1/01/2022   Apple       Cat     0       0   0       0
2/01/2022   Banana      Dog     486     0   0       0
3/01/2022   Orange      Pig     399.6   0   0       0
4/01/2022   Blueberry   Rat     872.1   291 471.6   0
5/01/2022   Watermelon  Bird    383.4   475 385.2   337
…       
        

Problem

The problem is the number of products varies. I am currently doing it manually editing my code each time to do this sum. Multiplying by the price is easy enough, but the Start_Date is tough, I don't know an easier way to do this dynamically. Below is an example of my current code.

Note: for the most part the two dataframes will be in the same order (i.e. x1, x2, x3, etc)

Code Example

import pandas as pd

# Dataframe one example
df1 = pd.DataFrame(data={
    'Date': pd.Series(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
    'ColA': ['Apple', 'Banana', 'Orange', 'Blueberry', 'Watermelon'],
    'ColB': ['Cat', 'Dog', 'Pig', 'Rat', 'Bird'],
    'x1': [703, 540, 444, 969, 426],
    'x2': [750, 908, 835, 582, 950],
    'x3': [273, 397, 258, 393, 321],
    'x4': [738, 726, 986, 911, 337]
})

# Dataframe two example
df2 = pd.DataFrame(data={
    'Product': ['x1', 'x2', 'x3', 'x4'],
    'Start_Date': pd.Series(['2022-01-02', '2022-01-04', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
    'Price': [0.9, 0.5, 1.2, 1.0]
})
df2.set_index('Product')

# Multipy by Price
df3 = df1.iloc[:, 3:7].multiply(df2[['Price']].values.T, axis='columns')
df3['sd1'] = df2['Start_Date'].iloc[0]
df3['sd2'] = df2['Start_Date'].iloc[1]
df3['sd3'] = df2['Start_Date'].iloc[2]
df3['sd4'] = df2['Start_Date'].iloc[3]
df3 = df1[['Date', 'ColA', 'ColB']].merge(df3, left_index=True, right_index=True)
print(df3)

CodePudding user response:

Below code snippet should solve your purpose :) :

from datetime import datetime
import pandas as pd

# Dataframe one example
df1 = pd.DataFrame(data={
    'Date': pd.Series(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
    'ColA': ['Apple', 'Banana', 'Orange', 'Blueberry', 'Watermelon'],
    'ColB': ['Cat', 'Dog', 'Pig', 'Rat', 'Bird'],
    'x1': [703, 540, 444, 969, 426],
    'x2': [750, 908, 835, 582, 950],
    'x3': [273, 397, 258, 393, 321],
    'x4': [738, 726, 986, 911, 337]
})

# Dataframe two example
df2 = pd.DataFrame(data={
    'Product': ['x1', 'x2', 'x3', 'x4'],
    'Start_Date': pd.Series(['2022-01-02', '2022-01-04', '2022-01-04', '2022-01-05'], dtype='datetime64[ns]'),
    'Price': [0.9, 0.5, 1.2, 1.0]
})
df2.set_index('Product', inplace=True)

# Multipy by Price and Generate final df
cols_to_work = [col for col in df1.columns if col.startswith('x')]
datetime_col = 'Date'
datetime_col_df2 = 'Start_Date'

for idx, rows in df1.iterrows():
    for col in cols_to_work:
        if rows[datetime_col] >= df2[datetime_col_df2].loc[col]:
            # print(rows[datetime_col], df2['Start_Date'].loc[col], rows[datetime_col] >= df2['Start_Date'].loc[col])
            # print(col, '-', df1[col].loc[idx], ' * ', df2['Price'].loc[col], ' = ', df1[col].loc[idx] * df2['Price'].loc[col])
            df1[col].loc[idx] = df1[col].loc[idx] * df2['Price'].loc[col]
        else:
            df1[col].loc[idx] = 0

df1.head()

OUTPUT:

        Date        ColA  ColB     x1     x2     x3     x4
0 2022-01-01       Apple   Cat    0.0    0.0    0.0    0.0
1 2022-01-02      Banana   Dog  486.0    0.0    0.0    0.0
2 2022-01-03      Orange   Pig  399.6    0.0    0.0    0.0
3 2022-01-04   Blueberry   Rat  872.1  291.0  471.6    0.0
4 2022-01-05  Watermelon  Bird  383.4  475.0  385.2  337.0
  • Related