I'm a beginner and I'm wondering about this.
For example I have this code:
df = example.get_data
All I know is that the header will be dates of type numpy.datetime64. How can I keep only the last 2 years of data without knowing anything more about it?
I tried something like this:
df.drop(df.columns.year >= date.today().year-2, axis=1, inplace=True)
But it's not working. Any suggestions?
Answer:
If your column names are dates, e.g. '12/02/2021', '14/01/2021', '19/08/2019',
you can select all columns from the last two years like this:
import pandas as pd
from pandas.tseries.offsets import DateOffset
# dayfirst=True because the example labels are DD/MM/YYYY
last_2_years = [c for c in df.columns if pd.to_datetime(c, dayfirst=True) > pd.Timestamp.today() - DateOffset(years=2)]
df = df[last_2_years]
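Since the question says the headers are already numpy.datetime64 values, the same selection can probably be done without a list comprehension, by building a boolean mask over the column labels. Here is a minimal sketch; the sample frame and the fixed reference date are made up so that the result is reproducible, and in real use the reference would be pd.Timestamp.today():

```python
import pandas as pd

# Hypothetical sample data: the column labels are dates, as in the question.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=pd.to_datetime(["2019-08-19", "2020-06-30", "2021-01-14", "2021-02-12"]),
)

# Fixed reference date (an assumption for reproducibility);
# replace with pd.Timestamp.today() for real data.
reference = pd.Timestamp("2021-12-31")
cutoff = reference - pd.DateOffset(years=2)

# Boolean mask over the column labels: True for columns newer than the cutoff.
mask = pd.to_datetime(df.columns) > cutoff

# Keep only the columns where the mask is True.
recent = df.loc[:, mask]
```

With this sample, only the 2019-08-19 column falls before the cutoff, so recent keeps the other three columns.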
It's usually easier to select the columns you want to keep than to drop the columns you don't need, but you can of course also do:
# <= makes this the exact complement of the selection above
cols_to_drop = [c for c in df.columns if pd.to_datetime(c, dayfirst=True) <= pd.Timestamp.today() - DateOffset(years=2)]
df = df.drop(cols_to_drop, axis=1)
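As for why the attempt in the question fails: df.drop expects column labels, not a boolean array, and the condition there marks the recent columns, which are the ones to keep. Below is a sketch of how that idea could be made to work, assuming the column labels really are datetimes. The sample frame and the fixed reference date are hypothetical:

```python
import pandas as pd

# Hypothetical sample data with datetime column labels.
df = pd.DataFrame(
    [[10, 20, 30]],
    columns=pd.to_datetime(["2018-03-01", "2020-05-15", "2021-09-30"]),
)

# Fixed reference date (an assumption for reproducibility);
# use pd.Timestamp.today() in practice.
cutoff = pd.Timestamp("2021-12-31") - pd.DateOffset(years=2)

# df.drop needs labels, so convert the boolean mask into the matching labels first.
old_cols = df.columns[df.columns <= cutoff]
trimmed = df.drop(columns=old_cols)
```

Returning a new frame instead of using inplace=True keeps the original df available for comparison.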