Home > Net >  Need to assign the correct value to each transaction based on a separate table
Need to assign the correct value to each transaction based on a separate table

Time:05-10

I need to assign the correct fiscal period to each transaction based on a separate table.

I have two datasets.

Dataset 1 has transaction information as below:

Transaction Number Transaction Date Customer Name Fiscal Period
123456 01-JAN-2022 ABC Company
123500 03-MAR-2022 Z Company
124600 01-APR-2022 X Company
124700 04-FEB-2022 XYZ Company

Dataset 2 is the table listing the fiscal periods for each start/end date.

Start Date End Date Fiscal Period
01-JAN-2022 04-FEB-2022 JAN-2022
05-FEB-2022 04-MAR-2022 FEB-2022
05-MAR-2022 01-APR-2022 MAR-2022

Expected result:

Transaction Number Transaction Date Customer Name Fiscal Period
123456 01-JAN-2022 ABC Company JAN-2022
123500 03-MAR-2022 Z Company FEB-2022
124600 01-APR-2022 X Company MAR-2022
124700 04-FEB-2022 XYZ Company JAN-2022

Could someone please help me with how I could accomplish this using Python/Pandas? The only way I know how to handle something similar is by using the pd.merge function. However, since the "key" value is a range (start/end dates), I have no clue on how to go about it. Thank you!

CodePudding user response:

This does the job,

df1["Transaction Date"] = pd.to_datetime(df1["Transaction Date"])
df2 = df2.apply(pd.to_datetime)

transaction_date = df1.loc[1, "Transaction Date"]
fiscal_periods_indices = []

for transaction_date in df1["Transaction Date"]:
  fiscal_period = df2[(transaction_date >= df2["Start Date"]) & (transaction_date <= df2["End Date"])].index[0]
  fiscal_periods_indices.append(fiscal_period)

df1["Fiscal Period"] = df2.loc[fiscal_periods_indices, "Fiscal Period"].reset_index(drop = True)

Output -

Transaction Number Transaction Date Customer Name Fiscal Period
0 123456 2022-01-01 00:00:00 ABC Company 2022-01-01
1 123500 2022-03-03 00:00:00 Z Company 2022-02-01
2 124600 2022-04-01 00:00:00 X Company 2022-03-01
3 124700 2022-02-04 00:00:00 XYZ Company 2022-01-01

CodePudding user response:

Since your intervals do not overlap, you can use pd.cut:

bins = pd.to_datetime(df2['Start Date'].to_list()   [df2['End Date'].iloc[-1]])

df1['Fiscal Period'] = pd.cut(pd.to_datetime(df1['Transaction Date']), bins=bins,
                              labels=df2['Fiscal Period'], include_lowest=True)

Output:

Transaction Number Transaction Date Customer Name Fiscal Period
123456 01-JAN-2022 ABC Company JAN-2022
123500 03-MAR-2022 Z Company FEB-2022
124600 01-APR-2022 X Company MAR-2022
124700 04-FEB-2022 XYZ Company JAN-2022
  • Related