I have a dataframe with dates and prices (shown below).
import pandas as pd

df = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03',
                            '2016-01-01', '2016-01-02', '2016-01-03',
                            '2017-01-01', '2017-01-02', '2017-01-03',
                            '2018-01-01', '2018-01-02', '2018-01-03'],
                   'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21]})
# The strings are 'YYYY-MM-DD', so the format must include the dashes
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
select_dates = df.set_index('date')
I want to select several ranges of specific dates and put them into a new dataframe. For example, I would like to select prices for the first quarter of 2015 and the first quarter of 2016. The example data covers a shorter period, so in this case I would like to select the first 2 days of 2015 and the first 2 days of 2016.
I would like to end up with a dataframe like this (with date as the index).
date | price |
---|---|
2015-01-01 | 78 |
2015-01-02 | 87 |
2016-01-01 | 94 |
2016-01-02 | 55 |
I have been using this method to select a single date range, but I don't know how to select more than one range at a time:
select_dates2=select_dates.loc['2015-01-01':'2015-01-02']
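One way to extend the single `.loc` slice to several ranges is to take each slice separately and stack the results with `pd.concat`. This is a minimal sketch, assuming `date` has been parsed to datetime and set as a sorted index; the `ranges` list is a hypothetical name introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-02', '2015-01-03',
             '2016-01-01', '2016-01-02', '2016-01-03',
             '2017-01-01', '2017-01-02', '2017-01-03',
             '2018-01-01', '2018-01-02', '2018-01-03'],
    'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
select_dates = df.set_index('date')

# Hypothetical list of (start, end) ranges; label slices on a DatetimeIndex include both ends.
ranges = [('2015-01-01', '2015-01-02'), ('2016-01-01', '2016-01-02')]

# Slice each range from the index and stack the pieces into one frame.
select_dates2 = pd.concat([select_dates.loc[start:end] for start, end in ranges])
print(select_dates2)
```

For whole quarters, the endpoints can be partial date strings such as `('2015-01', '2015-03')`, since label slicing on a DatetimeIndex accepts partial dates.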
Answer:
Another way:
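df['date'] = pd.to_datetime(df['date'])
# keep days 1-2 of January in 2015 and 2016 (without the month check, days 1-2 of every month would match)
df[df.date.dt.year.isin([2015, 2016]) & df.date.dt.month.eq(1) & df.date.dt.day.lt(3)]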
date price
0 2015-01-01 78
1 2015-01-02 87
3 2016-01-01 94
4 2016-01-02 55
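Since the question's real goal is the first quarter rather than the first two days, the same boolean-mask idea can use the `dt.quarter` attribute instead of a day cutoff. This is a sketch, assuming `date` is already a datetime column; with the sample data every 2015/2016 row falls in January, so all three days of each year are kept.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-02', '2015-01-03',
             '2016-01-01', '2016-01-02', '2016-01-03',
             '2017-01-01', '2017-01-02', '2017-01-03',
             '2018-01-01', '2018-01-02', '2018-01-03'],
    'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Keep rows in the chosen years that fall in the first calendar quarter (January to March).
years = [2015, 2016]
q1 = df[df['date'].dt.year.isin(years) & df['date'].dt.quarter.eq(1)]

# Use the date as the index, matching the desired output in the question.
q1 = q1.set_index('date')
print(q1)
```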
Answer:
One option is to split the year from the month-day, then use `isin` to create a boolean mask to filter `df`. Note that I assumed that `date` is a column of strings.
tmp = df['date'].str.split('-', n=1)  # n is keyword-only in pandas 2.x
out = df[tmp.str[0].isin(['2015','2016']) & tmp.str[1].isin(['01-01','01-02'])]
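Because ISO dates have a fixed width, the split can also be replaced by plain string slicing. This is a sketch of that alternative, assuming every value is a zero-padded `'YYYY-MM-DD'` string (the variable names `year` and `month_day` are introduced here for illustration).

```python
import pandas as pd

# Dates are deliberately left as strings here.
df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-02', '2015-01-03',
             '2016-01-01', '2016-01-02', '2016-01-03',
             '2017-01-01', '2017-01-02', '2017-01-03',
             '2018-01-01', '2018-01-02', '2018-01-03'],
    'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21],
})

# Assumes zero-padded 'YYYY-MM-DD': characters 0-3 are the year, 5-9 the month-day.
year = df['date'].str[:4]
month_day = df['date'].str[5:]
out = df[year.isin(['2015', '2016']) & month_day.isin(['01-01', '01-02'])]
print(out)
```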
If it's a datetime column, then we could use the `dt` accessor to select certain years and month-days:
df['date'] = pd.to_datetime(df['date'])
out = df[df['date'].dt.year.isin([2015, 2016]) & df['date'].dt.strftime('%m-%d').isin(['01-01','01-02'])]
Output:
date price
0 2015-01-01 78
1 2015-01-02 87
3 2016-01-01 94
4 2016-01-02 55
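A variation on the `dt` approach compares the numeric month and day directly instead of formatting every date with `strftime`. This is a sketch, assuming `date` has already been converted to datetime; the shorthand `d` is introduced only for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-02', '2015-01-03',
             '2016-01-01', '2016-01-02', '2016-01-03',
             '2017-01-01', '2017-01-02', '2017-01-03',
             '2018-01-01', '2018-01-02', '2018-01-03'],
    'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

d = df['date'].dt
# Numeric comparisons on year, month and day; no per-row string formatting.
mask = d.year.isin([2015, 2016]) & d.month.eq(1) & d.day.isin([1, 2])
out = df[mask]
print(out)
```

This avoids building a string for every row, at the cost of listing month and day conditions separately.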
Answer:
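One option is to build a MultiIndex from the year and day of each date; this allows for relatively easy selection on multiple levels (in this case, year and day). Note that, like the `day.lt(3)` approach, this ignores the month, which is fine for the sample data: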
(df
 .assign(year=df.date.dt.year, day=df.date.dt.day)
 .set_index(['year', 'day'])
 .loc(axis=0)[2015:2016, :2]
)
date price
year day
2015 1 2015-01-01 78
2 2015-01-02 87
2016 1 2016-01-01 94
2 2016-01-02 55
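The same MultiIndex selection can be written with `pd.IndexSlice`, which some find easier to read than `.loc(axis=0)`. This sketch adds a defensive `sort_index()`, since range slicing on a MultiIndex expects a lexsorted index; with the already-sorted sample data it changes nothing.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-02', '2015-01-03',
             '2016-01-01', '2016-01-02', '2016-01-03',
             '2017-01-01', '2017-01-02', '2017-01-03',
             '2018-01-01', '2018-01-02', '2018-01-03'],
    'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

idx = pd.IndexSlice
out = (
    df.assign(year=df['date'].dt.year, day=df['date'].dt.day)
      .set_index(['year', 'day'])
      # Range slicing on a MultiIndex needs a sorted index; sort defensively.
      .sort_index()
      # Years 2015 to 2016 inclusive, days up to and including 2, all columns.
      .loc[idx[2015:2016, :2], :]
)
print(out)
```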