I am trying to select the records that correspond to each 15th of the month, here is the examle of the data:
date open date1 date2
0 2000-01-03 00:00:00 0 2000-01-17 00:00:00 2000-02-15 00:00:00
1 2000-01-07 00:00:00 6084704 2000-01-17 00:00:00 2000-02-15 00:00:00
3 2000-01-12 00:00:00 992482 2000-01-17 00:00:00 2000-02-15 00:00:00
4 2000-01-17 00:00:00 11721104 2000-02-15 00:00:00 2000-03-15 00:00:00
5 2000-01-18 00:00:00 4459200 2000-02-15 00:00:00 2000-03-15 00:00:00
6 2000-01-19 00:00:00 3652363 2000-02-15 00:00:00 2000-03-15 00:00:00
7 2000-01-27 00:00:00 17421705 2000-02-15 00:00:00 2000-03-15 00:00:00
...
43 2000-03-24 00:00:00 13594539 2000-04-17 00:00:00 2000-05-15 00:00:00
44 2000-03-27 00:00:00 5122526 2000-04-17 00:00:00 2000-05-15 00:00:00
45 2000-03-29 00:00:00 3041827 2000-04-17 00:00:00 2000-05-15 00:00:00
46 2000-03-30 00:00:00 4769960 2000-04-17 00:00:00 2000-05-15 00:00:00
48 2000-03-31 00:00:00 15982410 2000-04-17 00:00:00 2000-05-15 00:00:00
49 2000-04-04 00:00:00 3178232 2000-04-17 00:00:00 2000-05-15 00:00:00
51 2000-04-05 00:00:00 15571975 2000-04-17 00:00:00 2000-05-15 00:00:00
52 2000-04-06 00:00:00 2694165 2000-04-17 00:00:00 2000-05-15 00:00:00
53 2000-04-07 00:00:00 536426 2000-04-17 00:00:00 2000-05-15 00:00:00
55 2000-04-10 00:00:00 5523700 2000-04-17 00:00:00 2000-05-15 00:00:00
56 2000-04-11 00:00:00 11212425 2000-04-17 00:00:00 2000-05-15 00:00:00
58 2000-04-12 00:00:00 546223 2000-04-17 00:00:00 2000-05-15 00:00:00
61 2000-04-13 00:00:00 17913960 2000-05-15 00:00:00 2000-06-15 00:00:00
63 2000-04-17 00:00:00 3824185 2000-05-15 00:00:00 2000-06-15 00:00:00
...
1578 2006-01-03 00:00:00 4453005 2006-01-16 00:00:00 2006-02-15 00:00:00
1579 2006-01-04 00:00:00 6557373 2006-01-16 00:00:00 2006-02-15 00:00:00
1580 2006-01-05 00:00:00 2837987 2006-01-16 00:00:00 2006-02-15 00:00:00
...
1634 2006-03-14 00:00:00 31819879 2006-04-17 00:00:00 2006-05-15 00:00:00
1635 2006-03-20 00:00:00 1541321 2006-04-17 00:00:00 2006-05-15 00:00:00
1636 2006-03-21 00:00:00 47047150 2006-04-17 00:00:00 2006-05-15 00:00:00
1638 2006-03-22 00:00:00 6111712 2006-04-17 00:00:00 2006-05-15 00:00:00
Sometimes there is no 15th, since I have a range of business days. For such case I'd like to choose the record closes to 15th of a month. So, the expected outcome would be smth like:
3 2000-01-12 00:00:00 992482 2000-01-17 00:00:00 2000-02-15 00:00:00
...
61 2000-04-13 00:00:00 17913960 2000-05-15 00:00:00 2000-06-15 00:00:00
...
1634 2006-03-14 00:00:00 31819879 2006-04-17 00:00:00 2006-05-15 00:00:00
I figured out how to find every 15th of the month, that is not a very difficult:
mid_date_mask = df['date'].map(lambda x: x.day) == 15
df[mid_date_mask]
But I cannot understand how to incorporate a step back when there is no 15th in a month.
I would be very grateful for your help!
CodePudding user response:
You could try if the following works for you:
mid_date_idx = (
(df["date"].dt.day - 15).abs()
.groupby([df["date"].dt.year, df["date"].dt.month])
.idxmin()
.to_list()
)
mid_date_mask = df.index.isin(mid_date_idx)
df
is your dataframe- First bulid a series from column
date
with the absolute difference of the day and15
. - Then group the result by year-month pairs, use
.idxmin()
to get the first index with the minimum per group, and collect them in a listmid_date_idx
. - Build the
mid_date_mask
by checking if the index is in the list.