I have the following dataframe, df, with report_date
as the index:
report_date | sales |
---|---|
2021-06-30 | 130000 |
2021-06-30 | 140000 |
2021-07-31 | 125000 |
2021-07-31 | 110000 |
2021-08-31 | 110000 |
2021-08-31 | 110000 |
I want to extract only the rows for 2021-06 and 2021-08. How can I achieve this?
So far I can only get two separate dataframes, with df['2021-06']
and df['2021-08']
CodePudding user response:
To match the values, you can convert the DatetimeIndex
to monthly periods and test membership with Index.isin
:
#if necessary
#df.index = pd.to_datetime(df.index)
df3 = df[df.index.to_period('M').isin(pd.to_datetime(['2021-06','2021-08']).to_period('M'))]
print(df3)
sales
report_date
2021-06-30 130000
2021-06-30 140000
2021-08-31 110000
2021-08-31 110000
Or:
df3 = df[df.index.to_period('M').isin(pd.PeriodIndex(['2021-06','2021-08'], freq='M'))]
print(df3)
sales
report_date
2021-06-30 130000
2021-06-30 140000
2021-08-31 110000
2021-08-31 110000
Or convert the values to YYYY-MM strings
and test membership against a list of strings:
df3 = df[df.index.strftime('%Y-%m').isin(['2021-06','2021-08'])]
print (df3)
sales
report_date
2021-06-30 130000
2021-06-30 140000
2021-08-31 110000
2021-08-31 110000
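A minimal, self-contained sketch of the period and string approaches above, assuming the frame is rebuilt from the table in the question (the name `wanted` is only illustrative):

```python
import pandas as pd

# Rebuild the frame from the question; values are copied from its table.
df = pd.DataFrame(
    {"sales": [130000, 140000, 125000, 110000, 110000, 110000]},
    index=pd.to_datetime(
        ["2021-06-30", "2021-06-30", "2021-07-31",
         "2021-07-31", "2021-08-31", "2021-08-31"]
    ).rename("report_date"),
)

wanted = ["2021-06", "2021-08"]  # months to keep (illustrative name)

# Approach 1: compare monthly periods.
by_period = df[df.index.to_period("M").isin(pd.PeriodIndex(wanted, freq="M"))]

# Approach 2: compare 'YYYY-MM' strings.
by_string = df[df.index.strftime("%Y-%m").isin(wanted)]

print(by_period)
print(by_period.equals(by_string))
```

Both masks should select the same four rows; the period version keeps the comparison in a date-aware dtype, while the string version is shorter to read.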
CodePudding user response:
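Just do
df = df.reset_index()
# compare on 'YYYY-MM' strings; == "2021-06" alone would not match month-end dates
new_df = df[df["report_date"].dt.strftime("%Y-%m").isin(["2021-06", "2021-08"])]
or, using numpy on the original DatetimeIndex (before reset_index)
import numpy as np
months = df.index.strftime("%Y-%m")
new_df = df.iloc[np.where((months == "2021-06") | (months == "2021-08"))[0], :]
Output of the reset_index version -
 | report_date | sales |
---|---|---|
0 | 2021-06-30 | 130000 |
1 | 2021-06-30 | 140000 |
4 | 2021-08-31 | 110000 |
5 | 2021-08-31 | 110000 |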
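If report_date is kept as a regular column after reset_index, the months can also be matched through the .dt accessor. This is only a sketch, assuming the column holds real datetimes (not strings) and that the data from the question is used:

```python
import pandas as pd

# Frame from the question, with report_date as a plain datetime column.
df = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2021-06-30", "2021-06-30", "2021-07-31",
             "2021-07-31", "2021-08-31", "2021-08-31"]
        ),
        "sales": [130000, 140000, 125000, 110000, 110000, 110000],
    }
)

# Keep rows whose year is 2021 and whose month is June or August.
mask = (df["report_date"].dt.year == 2021) & df["report_date"].dt.month.isin([6, 8])
new_df = df[mask]
print(new_df)
```

The original row labels are preserved in new_df; call reset_index(drop=True) on it if a fresh 0..n-1 index is wanted.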