Home > front end >  Find Last Available Date in Pandas Data Frame
Find Last Available Date in Pandas Data Frame

Time:09-17

Suppose that I have a Pandas DataFrame as below:

 ------------ ------- 
|    Date    | Price |
 ------------ ------- 
| 01/01/2021 |    10 |
| 01/02/2021 |    20 |
| 01/03/2021 |    30 |
| 01/05/2021 |    40 |
| 01/08/2021 |    20 |
| 01/09/2021 |    10 |
 ------------ ------- 

The above data frame can be generated using code below:

df = pd.DataFrame({'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-05', '2021-01-08', '2021-01-09'],
                   'Price': [10, 20, 30, 40, 20, 10]})
df['Date'] = pd.to_datetime(df['Date'])

Now given a date stored in variable end_date. The first step is to find if the date exists in the data frame. It can be done using the below code:

if end_date in df.Date.values:
    pass
else:
    # find last available date.

What would be the most elegant way to find the last available date in data frame.

E.g. if end_date = '2021-01-10'. Since it does not exists in data frame I want end_date value to be set at 2021-01-09. Similarly, if end_date = 2021-01-07 I want end_date value to be set at 2021-01-05.

Alternatively if end_date = 2021-01-08 end_date won't be overwritten and would remain as is i.e. end_date = 2021-01-08.

CodePudding user response:

The other answers are assuming the dates are always in order in your dataframe.

Since your dates are sortable, you can just use comparison operators (note that this will work even if you keep them as strings, as the format you are using is lexicographically sortable).

To get the last available date, first filter out dates after end_date and then find the max:

end_date = df[df['Date'] <= end_date]['Date'].max()

CodePudding user response:

You could try with loc and iloc:

end_date = "2021-01-07"
print(df.loc[df['Date'] <= end_date, 'Date'].iloc[-1].date())

Output:

2021-01-05

Or:

end_date = "2021-01-07"
print(df.loc[df['Date'] <= end_date, 'Date'].max().date())
  • Related