Home > OS >  Pandas select dataframe rows between multiple date times
Pandas select dataframe rows between multiple date times

Time:10-29

Current df:

Date                 Power
2011-04-18 17:00:00  243.56
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:03  245.72
2011-04-18 17:00:04  244.71
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:07  244.72
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                     ...

I have the df with Date and a float number. Date is the index and is unique. I would like to create a new df based on the dates found in the next df.

     date start          date end
0    2011-04-18 17:00:01 2011-04-18 17:00:02
1    2011-04-18 17:00:05 2011-04-18 17:00:06
2    2011-04-18 17:00:08 2011-04-18 17:00:10
...                  ...                 ...

I expect to get:

Date                 Power
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02
...                     ...

In other word I want to filter the initial df and find all rows between all the dates found in the second df.

I thought of using pandas.DataFrame.between_time. But the issue is this works only for 1 given date start and date end. How can I do this with many different date periods?

CodePudding user response:

Use np.logical_or.reduce with list comprehension:

L = [df1['Date'].between(s, e) for s, e in df2[['date start','date end']].to_numpy()]

df = df1[np.logical_or.reduce(L)]
print (df)
                  Date   Power
1  2011-04-18 17:00:01  245.83
2  2011-04-18 17:00:02  246.02
5  2011-04-18 17:00:05  245.93
6  2011-04-18 17:00:06  243.12
8  2011-04-18 17:00:08  242.44
9  2011-04-18 17:00:09  246.42
10 2011-04-18 17:00:10  245.02

If DatetimeIndex is possible use:

L = [df1[s:e] for s, e in df2[['date start','date end']].to_numpy()]

df = pd.concat(L)
print (df)
                      Power
Date                       
2011-04-18 17:00:01  245.83
2011-04-18 17:00:02  246.02
2011-04-18 17:00:05  245.93
2011-04-18 17:00:06  243.12
2011-04-18 17:00:08  242.44
2011-04-18 17:00:09  246.42
2011-04-18 17:00:10  245.02



L = [(df1.index >= s) & (df1.index <= e) 
     for s, e in df2[['date start','date end']].to_numpy()]

df = df1[np.logical_or.reduce(L)]
  • Related