I am trying to pull the second closest date for each as of date in a data set. Below is some example data
asOfDate maturityDate value
0 2022-09-01 2022-10-01 57.273
1 2022-09-01 2022-11-01 55.861
2 2022-09-01 2022-12-01 59.231
3 2022-09-01 2023-01-01 59.305
4 2022-09-01 2023-02-01 58.081
5 2022-09-01 2023-03-01 51.198
6 2022-09-01 2023-04-01 44.532
7 2022-09-01 2023-05-01 38.955
8 2022-09-01 2023-06-01 37.901
9 2022-09-01 2023-07-01 40.461
10 2022-09-01 2023-08-01 40.126
11 2022-09-01 2023-09-01 40.221
12 2022-09-01 2023-10-01 40.455
13 2022-09-01 2023-11-01 41.652
14 2022-09-01 2023-12-01 42.450
15 2022-09-01 2024-01-01 45.057
16 2022-09-01 2024-02-01 42.459
17 2022-09-01 2024-03-01 37.544
18 2022-09-01 2024-04-01 29.901
19 2022-09-01 2024-05-01 25.300
20 2022-09-01 2024-06-01 22.360
21 2022-09-01 2024-07-01 22.544
22 2022-09-01 2024-08-01 23.931
I have a list of asOfDates with several maturity dates going out. I currently am using this code to pull the maturityDate and value that is the smallest (or rolling front month) for each asOfDate.
df_final[df_final['maturityDate']==df_final.groupby(['asOfDate'])['maturityDate'].transform(min)]
The above works perfectly for my needs, but now I would like to, instead of getting the smallest date, receive the second smallest for each as of date. I have attempted to get this second smallest, but I received an error that states, "error: Can only compare identically-labeled Series objects". Below is what I have tried that does output the dates I want, but when I try to reindex the data frame I get the error.
df_final.groupby(['asOfDate'])['maturityDate'].nsmallest(2).groupby(['asOfDate']).last().reset_index()
df_final[df_final['maturityDate']==df_final.groupby(['asOfDate'])['maturityDate'].nsmallest(2).groupby(['asOfDate']).last().reset_index()]
The first one above outputs the below, which is the desired results along with the values any ideas here?
asOfDate maturityDate
0 2022-09-01 2022-11-01
1 2022-09-02 2022-11-01
2 2022-09-05 2022-11-01
3 2022-09-06 2022-11-01
4 2022-09-07 2022-11-01
5 2022-09-08 2022-11-01
6 2022-09-09 2022-11-01
7 2022-09-12 2022-11-01
8 2022-09-13 2022-11-01
9 2022-09-14 2022-11-01
10 2022-09-15 2022-11-01
11 2022-09-16 2022-12-01
12 2022-09-19 2022-12-01
13 2022-09-20 2022-12-01
14 2022-09-21 2022-12-01
15 2022-09-22 2022-12-01
16 2022-09-23 2022-12-01
17 2022-09-26 2022-12-01
18 2022-09-27 2022-12-01
19 2022-09-28 2022-12-01
20 2022-09-29 2022-12-01
21 2022-09-30 2022-12-01
22 2022-10-03 2022-12-01
23 2022-10-04 2022-12-01
24 2022-10-05 2022-12-01
25 2022-10-06 2022-12-01
26 2022-10-07 2022-12-01
27 2022-10-10 2022-12-01
28 2022-10-11 2022-12-01
29 2022-10-12 2022-12-01
CodePudding user response:
import pandas as pd
columns = ["asOfDate", "maturityDate", "value"]
data = [
["2022-09-01", "2022-10-01", 57.273]
,["2022-09-01", "2022-11-01", 55.861]
,["2022-09-01", "2022-12-01", 59.231]
,["2022-09-01", "2023-01-01", 59.305]
,["2022-09-01", "2023-02-01", 58.081]
,["2022-09-01", "2023-03-01", 51.198]
,["2022-09-01", "2023-04-01", 44.532]
,["2022-09-01", "2023-05-01", 38.955]
,["2022-09-01", "2023-06-01", 37.901]
,["2022-09-01", "2023-07-01", 40.461]
,["2022-09-01", "2023-08-01", 40.126]
,["2022-09-01", "2023-09-01", 40.221]
,["2022-09-01", "2023-10-01", 40.455]
,["2022-09-01", "2023-11-01", 41.652]
,["2022-09-01", "2023-12-01", 42.450]
,["2022-09-01", "2024-01-01", 45.057]
,["2022-09-01", "2024-02-01", 42.459]
,["2022-09-01", "2024-03-01", 37.544]
,["2022-09-01", "2024-04-01", 29.901]
,["2022-09-01", "2024-05-01", 25.300]
,["2022-09-01", "2024-06-01", 22.360]
,["2022-09-01", "2024-07-01", 22.544]
,["2022-09-01", "2024-08-01", 23.931]
]
df = pd.DataFrame.from_records(data=data, columns=columns)
for col in ["asOfDate", "maturityDate"]:
df[col] = pd.to_datetime(df[col])
df.sort_values(["asOfDate","maturityDate"]).groupby("asOfDate").agg({"maturityDate":lambda x: x.shift(-1).values[0]})