Home > Software engineering >  select records by date, where date is 15th ot the previous business day of the month
select records by date, where date is 15th ot the previous business day of the month

Time:08-10

I am trying to select the records that correspond to each 15th of the month, here is the examle of the data:

           date              open               date1          date2
0       2000-01-03 00:00:00  0               2000-01-17 00:00:00    2000-02-15 00:00:00
1       2000-01-07 00:00:00  6084704         2000-01-17 00:00:00    2000-02-15 00:00:00
3       2000-01-12 00:00:00  992482          2000-01-17 00:00:00    2000-02-15 00:00:00
4       2000-01-17 00:00:00  11721104        2000-02-15 00:00:00    2000-03-15 00:00:00
5       2000-01-18 00:00:00  4459200         2000-02-15 00:00:00    2000-03-15 00:00:00
6       2000-01-19 00:00:00 3652363          2000-02-15 00:00:00    2000-03-15 00:00:00
7       2000-01-27 00:00:00 17421705         2000-02-15 00:00:00    2000-03-15 00:00:00
...
43      2000-03-24 00:00:00 13594539         2000-04-17 00:00:00    2000-05-15 00:00:00
44      2000-03-27 00:00:00 5122526          2000-04-17 00:00:00    2000-05-15 00:00:00
45      2000-03-29 00:00:00 3041827          2000-04-17 00:00:00    2000-05-15 00:00:00
46      2000-03-30 00:00:00 4769960          2000-04-17 00:00:00    2000-05-15 00:00:00
48      2000-03-31 00:00:00 15982410         2000-04-17 00:00:00    2000-05-15 00:00:00
49      2000-04-04 00:00:00 3178232          2000-04-17 00:00:00    2000-05-15 00:00:00
51      2000-04-05 00:00:00 15571975         2000-04-17 00:00:00    2000-05-15 00:00:00
52      2000-04-06 00:00:00 2694165          2000-04-17 00:00:00    2000-05-15 00:00:00
53      2000-04-07 00:00:00 536426           2000-04-17 00:00:00    2000-05-15 00:00:00
55      2000-04-10 00:00:00 5523700          2000-04-17 00:00:00    2000-05-15 00:00:00
56      2000-04-11 00:00:00 11212425         2000-04-17 00:00:00    2000-05-15 00:00:00
58      2000-04-12 00:00:00 546223           2000-04-17 00:00:00    2000-05-15 00:00:00
61      2000-04-13 00:00:00 17913960         2000-05-15 00:00:00    2000-06-15 00:00:00
63      2000-04-17 00:00:00 3824185          2000-05-15 00:00:00    2000-06-15 00:00:00
...
1578    2006-01-03 00:00:00 4453005          2006-01-16 00:00:00    2006-02-15 00:00:00
1579    2006-01-04 00:00:00 6557373          2006-01-16 00:00:00    2006-02-15 00:00:00
1580    2006-01-05 00:00:00 2837987          2006-01-16 00:00:00    2006-02-15 00:00:00
...
1634    2006-03-14 00:00:00 31819879         2006-04-17 00:00:00    2006-05-15 00:00:00
1635    2006-03-20 00:00:00 1541321          2006-04-17 00:00:00    2006-05-15 00:00:00
1636    2006-03-21 00:00:00 47047150         2006-04-17 00:00:00    2006-05-15 00:00:00
1638    2006-03-22 00:00:00 6111712          2006-04-17 00:00:00    2006-05-15 00:00:00


Sometimes there is no 15th, since I have a range of business days. For such case I'd like to choose the record closes to 15th of a month. So, the expected outcome would be smth like:

3       2000-01-12 00:00:00  992482          2000-01-17 00:00:00    2000-02-15 00:00:00
...
61      2000-04-13 00:00:00 17913960         2000-05-15 00:00:00    2000-06-15 00:00:00
...
1634    2006-03-14 00:00:00 31819879         2006-04-17 00:00:00    2006-05-15 00:00:00

I figured out how to find every 15th of the month, that is not a very difficult:

mid_date_mask = df['date'].map(lambda x: x.day) == 15

df[mid_date_mask]

But I cannot understand how to incorporate a step back when there is no 15th in a month.

I would be very grateful for your help!

CodePudding user response:

You could try if the following works for you:

mid_date_idx = (
    (df["date"].dt.day - 15).abs()
    .groupby([df["date"].dt.year, df["date"].dt.month])
    .idxmin()
    .to_list()
)
mid_date_mask = df.index.isin(mid_date_idx)
  • df is your dataframe
  • First bulid a series from column date with the absolute difference of the day and 15.
  • Then group the result by year-month pairs, use .idxmin() to get the first index with the minimum per group, and collect them in a list mid_date_idx.
  • Build the mid_date_mask by checking if the index is in the list.
  • Related