Selecting multiple ranges of dates from dataframe

Time:04-26

I have a dataframe with dates and prices (as below).

import pandas as pd

df=pd.DataFrame({'date':['2015-01-01','2015-01-02','2015-01-03',
                         '2016-01-01','2016-01-02','2016-01-03',
                         '2017-01-01','2017-01-02','2017-01-03',
                         '2018-01-01','2018-01-02','2018-01-03'],
                  'price':[78,87,52,94,55,45,68,76,65,75,78,21]
                })
# The strings are ISO formatted, so the format must include the dashes
df['date']=pd.to_datetime(df['date'], format='%Y-%m-%d')
select_dates = df.set_index(['date'])

I want to select a range of specific dates to add to a new dataframe. For example, I would like to select prices for the first quarter of 2015 and the first quarter of 2016. I have provided data for a shorter time period for the example, so in this case, I would like to select the first 2 days of 2015 and the first 2 days of 2016.

I would like to end up with a dataframe like this (with date as the index).

date price
2015-01-01 78
2015-01-02 87
2016-01-01 94
2016-01-02 55

I have been using this method to select dates, but I don't know how to select more than one range at a time:

select_dates2=select_dates.loc['2015-01-01':'2015-01-02']

CodePudding user response:

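Another way is to filter on the year and the day of the month with the dt accessor. Note that the day test matches days 1 and 2 of every month, which is fine for this sample data: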

df['date'] = pd.to_datetime(df['date'])
df[df.date.dt.year.isin([2015, 2016]) & df.date.dt.day.lt(3)]

        date  price
0 2015-01-01     78
1 2015-01-02     87
3 2016-01-01     94
4 2016-01-02     55
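
The question mentions first quarters, which the day-of-month test above does not cover once the data spans more months. A minimal sketch of the same masking idea using dt.quarter follows; the sample rows and the names years and first_quarter are made up for illustration and are not from the question.

```python
import pandas as pd

# Illustrative sample data (not the question's data): two rows fall outside Q1
# of the wanted years, so the filter has something to exclude.
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-04-01',
                            '2016-01-01', '2016-03-31', '2017-01-01']),
    'price': [78, 87, 52, 94, 55, 68],
})

years = [2015, 2016]  # hypothetical list of years to keep

# Keep rows whose year is in `years` and whose date falls in quarter 1 (Jan-Mar)
first_quarter = df[df['date'].dt.year.isin(years) & df['date'].dt.quarter.eq(1)]
print(first_quarter)
```

Swapping .eq(1) for .isin([1, 2]) should extend this to the first half of each year.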

CodePudding user response:

One option is to split years from the month-days, then use isin to create a boolean mask to filter df. Note that I assumed that date is a column of strings.

tmp = df['date'].str.split('-', n=1)
out = df[tmp.str[0].isin(['2015','2016']) & tmp.str[1].isin(['01-01','01-02'])]

If it's a datetime object column, then we could use the dt accessor to select certain years and month-days:

df['date'] = pd.to_datetime(df['date'])
out = df[df['date'].dt.year.isin([2015, 2016]) & df['date'].dt.strftime('%m-%d').isin(['01-01','01-02'])]

Output:

         date  price
0  2015-01-01     78
1  2015-01-02     87
3  2016-01-01     94
4  2016-01-02     55
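
If the ranges do not share the same month-days across years, the isin mask gets awkward. One possible variation, sketched here under the assumption that date is a datetime column, builds one boolean mask per range with Series.between and combines them with OR. The ranges list and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03',
                            '2016-01-01', '2016-01-02', '2016-01-03',
                            '2017-01-01', '2017-01-02', '2017-01-03',
                            '2018-01-01', '2018-01-02', '2018-01-03'],
                   'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21]})
df['date'] = pd.to_datetime(df['date'])

# Hypothetical list of (start, end) ranges; both ends are inclusive
ranges = [('2015-01-01', '2015-01-02'), ('2016-01-01', '2016-01-02')]

# One boolean mask per range
masks = [df['date'].between(pd.Timestamp(start), pd.Timestamp(end))
         for start, end in ranges]

# A row is kept if it falls inside any of the ranges
mask = np.logical_or.reduce(masks)
out = df[mask].set_index('date')
print(out)
```

Because the masks are combined with OR, overlapping ranges would not produce duplicate rows.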

CodePudding user response:

One option is to build a MultiIndex from the year and day of each date; this allows for a relatively easy selection on multiple levels (in this case, year and day). This assumes date is already a datetime column:

(df
 .assign(year=df.date.dt.year, day=df.date.dt.day)
 .set_index(['year', 'day'])
 .loc(axis=0)[2015:2016, :2]
)

Output:

               date  price
year day
2015 1   2015-01-01     78
     2   2015-01-02     87
2016 1   2016-01-01     94
     2   2016-01-02     55
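
To stay closer to the .loc slicing used in the question, another possibility is to slice the date index once per range and concatenate the pieces. This is only a sketch: the ranges list is a made-up helper, and it assumes the index is a sorted DatetimeIndex.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03',
                            '2016-01-01', '2016-01-02', '2016-01-03',
                            '2017-01-01', '2017-01-02', '2017-01-03',
                            '2018-01-01', '2018-01-02', '2018-01-03'],
                   'price': [78, 87, 52, 94, 55, 45, 68, 76, 65, 75, 78, 21]})
df['date'] = pd.to_datetime(df['date'])

# Label slicing on dates is most predictable with a sorted index
select_dates = df.set_index('date').sort_index()

# Hypothetical list of (start, end) ranges; .loc slicing includes both ends
ranges = [('2015-01-01', '2015-01-02'), ('2016-01-01', '2016-01-02')]

# Slice once per range, then stack the pieces in order
select_dates2 = pd.concat([select_dates.loc[start:end] for start, end in ranges])
print(select_dates2)
```

Unlike a combined boolean mask, overlapping ranges here would repeat the shared rows, so the ranges should be disjoint.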