I have a pandas df of daily stock prices that contains data from 2007 to the present. The df has a multi-index of Ticker and Date. I am trying to get a new df that is a subset of this data that only has data from 2021 onwards.
This code works, but is very slow.
# reduce the dataset to just data since start date
start_date = pd.to_datetime("2021-01-01")
new_df = df.iloc[df.index.get_level_values('Date') >= start_date]
I tried another way, but it only returns values for the exact date. How do I change it to select a range from 2021 onwards?
new_df = df.loc[(slice(None), '2021-01-01'), :]
Another solution would be to reset the index to get the date into a column, and then filter and reindex, but I thought there must be an easier way to do this.
CodePudding user response:
Use loc
with axis=0
, so possible specify values after 2021-01-01
:
mux = pd.MultiIndex.from_product([['A', 'B'],
pd.date_range('2020-01-01', freq='6MS', periods=5)])
df = pd.DataFrame({'a': range(10)}, index=mux)
print (df)
a
A 2020-01-01 0
2020-07-01 1
2021-01-01 2
2021-07-01 3
2022-01-01 4
B 2020-01-01 5
2020-07-01 6
2021-01-01 7
2021-07-01 8
2022-01-01 9
new_df = df.loc(axis=0)[:, '2021-01-01':]
print (new_df)
a
A 2021-01-01 2
2021-07-01 3
2022-01-01 4
B 2021-01-01 7
2021-07-01 8
2022-01-01 9
For improve performance should be used a bit changed your solution, because loc
solution is slow:
new_df = df[df.index.get_level_values('Date') >= "2021-01-01"]