Home > Software engineering >  Filter Pandas df with multi index that includes date to only rows after a given date
Filter Pandas df with multi index that includes date to only rows after a given date

Time:05-05

I have a pandas df of daily stock prices that contains data from 2007 to the present. The df has a multi-index of Ticker and Date. I am trying to get a new df that is a subset of this data that only has data from 2021 onwards.

enter image description here

This code works, but is very slow.

# reduce the dataset to just data since start date
start_date = pd.to_datetime("2021-01-01")
new_df = df.iloc[df.index.get_level_values('Date') >= start_date]

I tried another way, but it only returns values for the exact date. How do I change it to select a range from 2021 onwards?

new_df =  df.loc[(slice(None), '2021-01-01'), :]

Another solution would be to reset the index to get the date into a column, and then filter and reindex, but I thought there must be an easier way to do this.

CodePudding user response:

Use loc with axis=0, so possible specify values after 2021-01-01:

mux = pd.MultiIndex.from_product([['A', 'B'],
                                  pd.date_range('2020-01-01', freq='6MS', periods=5)])
df = pd.DataFrame({'a': range(10)}, index=mux)
print (df)
              a
A 2020-01-01  0
  2020-07-01  1
  2021-01-01  2
  2021-07-01  3
  2022-01-01  4
B 2020-01-01  5
  2020-07-01  6
  2021-01-01  7
  2021-07-01  8
  2022-01-01  9
  
new_df = df.loc(axis=0)[:, '2021-01-01':]
print (new_df)
              a
A 2021-01-01  2
  2021-07-01  3
  2022-01-01  4
B 2021-01-01  7
  2021-07-01  8
  2022-01-01  9

For improve performance should be used a bit changed your solution, because loc solution is slow:

new_df = df[df.index.get_level_values('Date') >= "2021-01-01"]
  • Related