Home > Back-end >  Create new column with 7th business day of the month based on year and month columns
Create new column with 7th business day of the month based on year and month columns

Time:08-10

I have a dataframe with two columns, the year and month of another variable which I left out for simplicity. It looks like this:

      YearOfSRC MonthNumberOfSRC
0       2022          3
1       2022          4
2       2022          5
3       2022          6
4       2021          4
... ... ...
20528   2022          1
20529   2022          2
20530   2022          3
20531   2022          4
20532   2022          5

I'd like to create a new column called PredictionDate that contains the date of the 7th business day of that month in that year in the format of YYYY-MM-DD. It seems like I could do something like this, but I'm not sure how to implement in a dataframe with the months and years in separate columns. I am sure it will involve an apply and/or groupby but again, not sure how to approach. Would I need to first create a helper column with the month and year combined, then use that to generate the desired output column?

EDIT The solutions given do work. However, I realized that some months' 7th business days would be offset by holidays that land in the first week of the month, like New Year's and July 4th. Therefore, I have posted a new question seeking how to incorporate this fact into the solution here.

CodePudding user response:

IIUC, you can use pandas.to_datetime to convert to datetime, then pandas.offsets.BusinessDay to get your business day:

df['PredictionDate'] = (pd
 .to_datetime(df[['YearOfSRC', 'MonthNumberOfSRC']]
                .set_axis(['year' ,'month'], axis=1)
                .assign(day=1)
              )
 .sub(pd.offsets.BusinessDay(1))
 .add(pd.offsets.BusinessDay(7))
)

output:

       YearOfSRC  MonthNumberOfSRC PredictionDate
0           2022                 3     2022-03-09
1           2022                 4     2022-04-11
2           2022                 5     2022-05-10
3           2022                 6     2022-06-09
4           2021                 4     2021-04-09
20528       2022                 1     2022-01-11
20529       2022                 2     2022-02-09
20530       2022                 3     2022-03-09
20531       2022                 4     2022-04-11
20532       2022                 5     2022-05-10

CodePudding user response:

If I understand you correctly you can convert the two columns to datetime format and add pd.tseries.offsets.BusinessDay(n = 7):

df["PredictionDate"] = (
    pd.to_datetime(
        df["YearOfSRC"].astype(str)
          "-"
          df["MonthNumberOfSRC"].astype(str)
          "-01",
        dayfirst=False,
    )
      pd.tseries.offsets.BusinessDay(n=7)
)

print(df)

Prints:

   YearOfSRC  MonthNumberOfSRC PredictionDate
0       2022                 3     2022-03-10
1       2022                 4     2022-04-12
2       2022                 5     2022-05-10
3       2022                 6     2022-06-10
4       2021                 4     2021-04-12
  • Related