Need to assign the correct value to each transaction based on a separate table-CodePudding

I need to assign the correct fiscal period to each transaction based on a separate table.

I have two datasets.

Dataset 1 has transaction information as below:

Transaction Number	Transaction Date	Customer Name
123456	01-JAN-2022	ABC Company
123500	03-MAR-2022	Z Company
124600	01-APR-2022	X Company
124700	04-FEB-2022	XYZ Company

Dataset 2 is the table listing the fiscal periods for each start/end date.

Start Date	End Date	Fiscal Period
01-JAN-2022	04-FEB-2022	JAN-2022
05-FEB-2022	04-MAR-2022	FEB-2022
05-MAR-2022	01-APR-2022	MAR-2022

Expected result:

Transaction Number	Transaction Date	Customer Name	Fiscal Period
123456	01-JAN-2022	ABC Company	JAN-2022
123500	03-MAR-2022	Z Company	FEB-2022
124600	01-APR-2022	X Company	MAR-2022
124700	04-FEB-2022	XYZ Company	JAN-2022

Could someone please help me with how I could accomplish this using Python/Pandas? The only way I know how to handle something similar is by using the pd.merge function. However, since the "key" value is a range (start/end dates), I have no clue on how to go about it. Thank you!

CodePudding user response：

This does the job,

df1["Transaction Date"] = pd.to_datetime(df1["Transaction Date"])
df2 = df2.apply(pd.to_datetime)

transaction_date = df1.loc[1, "Transaction Date"]
fiscal_periods_indices = []

for transaction_date in df1["Transaction Date"]:
  fiscal_period = df2[(transaction_date >= df2["Start Date"]) & (transaction_date <= df2["End Date"])].index[0]
  fiscal_periods_indices.append(fiscal_period)

df1["Fiscal Period"] = df2.loc[fiscal_periods_indices, "Fiscal Period"].reset_index(drop = True)

Output -

	Transaction Number	Transaction Date	Customer Name	Fiscal Period
0	123456	2022-01-01 00:00:00	ABC Company	2022-01-01
1	123500	2022-03-03 00:00:00	Z Company	2022-02-01
2	124600	2022-04-01 00:00:00	X Company	2022-03-01
3	124700	2022-02-04 00:00:00	XYZ Company	2022-01-01

CodePudding user response：

Since your intervals do not overlap, you can use pd.cut:

bins = pd.to_datetime(df2['Start Date'].to_list()   [df2['End Date'].iloc[-1]])

df1['Fiscal Period'] = pd.cut(pd.to_datetime(df1['Transaction Date']), bins=bins,
                              labels=df2['Fiscal Period'], include_lowest=True)

Output:

Transaction Number	Transaction Date	Customer Name	Fiscal Period
123456	01-JAN-2022	ABC Company	JAN-2022
123500	03-MAR-2022	Z Company	FEB-2022
124600	01-APR-2022	X Company	MAR-2022
124700	04-FEB-2022	XYZ Company	JAN-2022