I am trying to use the last() function on a DataFrame with a timestamp index:
import pandas as pd

df = pd.DataFrame({
    'timestamp': ['2022-06-14 16:01:00.292000+00:00', '2022-05-18 05:00:37.843000+00:00', '2022-06-06 00:00:56.134000+00:00'],
    'otherColumn': ['A', 'B', 'C'],
})
# The strings contain fractional seconds and a UTC offset, which '%Y-%m-%d %H:%M:%S'
# does not match, so let pandas infer the format.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index('timestamp')
print(df.last('1D'))
Here is what it returns:
2022-06-06 00:00:56.134000+00:00 C
I don't understand why it returns 2022-06-06. It should return 2022-06-14, since that is the most recent timestamp.
Answer:
The documentation of last mentions (although not very explicitly) that the index must be sorted:
"For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset."
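A quick way to check that precondition is the index's is_monotonic_increasing attribute. This minimal sketch rebuilds the question's data (assuming the timestamps carry a "+00:00" offset) and shows that the index is not sorted until sort_index() is applied.

```python
import pandas as pd

# Same rows as in the question, in the same (unsorted) order.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-06-14 16:01:00.292000+00:00",
        "2022-05-18 05:00:37.843000+00:00",
        "2022-06-06 00:00:56.134000+00:00",
    ]),
    "otherColumn": ["A", "B", "C"],
}).set_index("timestamp")

# last() assumes an ascending DatetimeIndex; this one is in insertion order.
print(df.index.is_monotonic_increasing)               # False
print(df.sort_index().index.is_monotonic_increasing)  # True
```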
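And indeed, if you look at the code of last, the window is measured back from the final row of the index (not from its maximum), and the key logic uses searchsorted, which assumes a sorted index:
start_date = self.index[-1] - offset
start = self.index.searchsorted(start_date, side="right")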
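To see why the unsorted index yields row C, here is a hand-written sketch of that logic (last_like is a hypothetical helper, not part of pandas). It assumes, as in pandas' implementation of last, that the window starts at index[-1] minus the offset, and it assumes a fixed-length offset such as '1D' so that pd.Timedelta can parse it.

```python
import pandas as pd

# Index in the question's row order ("+00:00" offsets assumed).
idx = pd.to_datetime([
    "2022-06-14 16:01:00.292000+00:00",
    "2022-05-18 05:00:37.843000+00:00",
    "2022-06-06 00:00:56.134000+00:00",
])
values = ["A", "B", "C"]

def last_like(index, values, offset):
    """Hypothetical re-creation of last(): anchor on the final row, then binary-search."""
    # The reference point is the *last row* of the index, not its maximum.
    start_date = index[-1] - pd.Timedelta(offset)
    # searchsorted is a binary search that silently assumes `index` is sorted.
    start = index.searchsorted(start_date, side="right")
    return values[start:]

# Unsorted: anchored on 2022-06-06, the binary search stops at position 2.
print(last_like(idx, values, "1D"))                 # ['C']

# Sorted: anchored on 2022-06-14, only the newest row falls inside the window.
order = idx.argsort()
sorted_idx = idx[order]
sorted_values = [values[i] for i in order]
print(last_like(sorted_idx, sorted_values, "1D"))   # ['A']
```

Nothing raises in the unsorted case; the binary search simply returns a position that is meaningless for unsorted data.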
Thus, sort the index first:
df.sort_index().last('1D')
Output:
                                 otherColumn
timestamp
2022-06-14 16:01:00.292000+00:00           A
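Note that DataFrame.last was deprecated in pandas 2.1, and the deprecation message recommends filtering with a boolean mask and .loc instead. A minimal mask-based sketch, assuming a fixed-length window of one day, anchors the cutoff on the latest timestamp, so it does not depend on row order at all:

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-06-14 16:01:00.292000+00:00",
        "2022-05-18 05:00:37.843000+00:00",
        "2022-06-06 00:00:56.134000+00:00",
    ]),
    "otherColumn": ["A", "B", "C"],
}).set_index("timestamp")

# Anchor on the maximum timestamp rather than on the final row.
cutoff = df.index.max() - pd.Timedelta(days=1)
# Strict ">" mirrors searchsorted(..., side="right"): rows exactly at the cutoff are excluded.
recent = df.loc[df.index > cutoff]
print(recent)
```

Because the mask keeps the original row order, add sort_index() afterwards if you also want the result ordered chronologically.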