I need to assign the correct fiscal period to each transaction based on a separate table.
I have two datasets.
Dataset 1 has transaction information as below:
Transaction Number | Transaction Date | Customer Name | Fiscal Period |
---|---|---|---|
123456 | 01-JAN-2022 | ABC Company | |
123500 | 03-MAR-2022 | Z Company | |
124600 | 01-APR-2022 | X Company | |
124700 | 04-FEB-2022 | XYZ Company |
Dataset 2 is the table listing the fiscal periods for each start/end date.
Start Date | End Date | Fiscal Period |
---|---|---|
01-JAN-2022 | 04-FEB-2022 | JAN-2022 |
05-FEB-2022 | 04-MAR-2022 | FEB-2022 |
05-MAR-2022 | 01-APR-2022 | MAR-2022 |
Expected result:
Transaction Number | Transaction Date | Customer Name | Fiscal Period |
---|---|---|---|
123456 | 01-JAN-2022 | ABC Company | JAN-2022 |
123500 | 03-MAR-2022 | Z Company | FEB-2022 |
124600 | 01-APR-2022 | X Company | MAR-2022 |
124700 | 04-FEB-2022 | XYZ Company | JAN-2022 |
Could someone please help me with how I could accomplish this using Python/Pandas? The only way I know how to handle something similar is by using the pd.merge function. However, since the "key" value is a range (start/end dates), I have no clue on how to go about it. Thank you!
CodePudding user response:
This does the job,
df1["Transaction Date"] = pd.to_datetime(df1["Transaction Date"])
df2 = df2.apply(pd.to_datetime)
transaction_date = df1.loc[1, "Transaction Date"]
fiscal_periods_indices = []
for transaction_date in df1["Transaction Date"]:
fiscal_period = df2[(transaction_date >= df2["Start Date"]) & (transaction_date <= df2["End Date"])].index[0]
fiscal_periods_indices.append(fiscal_period)
df1["Fiscal Period"] = df2.loc[fiscal_periods_indices, "Fiscal Period"].reset_index(drop = True)
Output -
Transaction Number | Transaction Date | Customer Name | Fiscal Period | |
---|---|---|---|---|
0 | 123456 | 2022-01-01 00:00:00 | ABC Company | 2022-01-01 |
1 | 123500 | 2022-03-03 00:00:00 | Z Company | 2022-02-01 |
2 | 124600 | 2022-04-01 00:00:00 | X Company | 2022-03-01 |
3 | 124700 | 2022-02-04 00:00:00 | XYZ Company | 2022-01-01 |
CodePudding user response:
Since your intervals do not overlap, you can use pd.cut
:
bins = pd.to_datetime(df2['Start Date'].to_list() [df2['End Date'].iloc[-1]])
df1['Fiscal Period'] = pd.cut(pd.to_datetime(df1['Transaction Date']), bins=bins,
labels=df2['Fiscal Period'], include_lowest=True)
Output:
Transaction Number | Transaction Date | Customer Name | Fiscal Period |
---|---|---|---|
123456 | 01-JAN-2022 | ABC Company | JAN-2022 |
123500 | 03-MAR-2022 | Z Company | FEB-2022 |
124600 | 01-APR-2022 | X Company | MAR-2022 |
124700 | 04-FEB-2022 | XYZ Company | JAN-2022 |