Current df:
Date Power
2011-04-18 17:00:00 243.56
2011-04-18 17:00:01 245.83
2011-04-18 17:00:02 246.02
2011-04-18 17:00:03 245.72
2011-04-18 17:00:04 244.71
2011-04-18 17:00:05 245.93
2011-04-18 17:00:06 243.12
2011-04-18 17:00:07 244.72
2011-04-18 17:00:08 242.44
2011-04-18 17:00:09 246.42
2011-04-18 17:00:10 245.02
... ...
I have a df with Date and a float column. Date is the index and is unique. I would like to create a new df based on the date ranges found in the following df.
date start date end
0 2011-04-18 17:00:01 2011-04-18 17:00:02
1 2011-04-18 17:00:05 2011-04-18 17:00:06
2 2011-04-18 17:00:08 2011-04-18 17:00:10
... ... ...
I expect to get:
Date Power
2011-04-18 17:00:01 245.83
2011-04-18 17:00:02 246.02
2011-04-18 17:00:05 245.93
2011-04-18 17:00:06 243.12
2011-04-18 17:00:08 242.44
2011-04-18 17:00:09 246.42
2011-04-18 17:00:10 245.02
... ...
In other words, I want to filter the initial df and keep all rows that fall between any of the date pairs found in the second df.
I thought of using pandas.DataFrame.between_time, but that only handles a single start and end (and it compares times of day, not full dates). How can I do this with many different date ranges?
CodePudding user response:
Use np.logical_or.reduce with a list comprehension: build one boolean mask per (start, end) pair, then OR all the masks together:
L = [df1['Date'].between(s, e) for s, e in df2[['date start','date end']].to_numpy()]
df = df1[np.logical_or.reduce(L)]
print(df)
Date Power
1 2011-04-18 17:00:01 245.83
2 2011-04-18 17:00:02 246.02
5 2011-04-18 17:00:05 245.93
6 2011-04-18 17:00:06 243.12
8 2011-04-18 17:00:08 242.44
9 2011-04-18 17:00:09 246.42
10 2011-04-18 17:00:10 245.02
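A minimal, self-contained sketch of this first approach, with the sample data from the question hard-coded so it can be run as-is (here Date is kept as an ordinary column, as the `df1['Date']` in the snippet assumes):

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Date as a regular column.
df1 = pd.DataFrame({
    "Date": pd.date_range("2011-04-18 17:00:00", periods=11, freq="s"),
    "Power": [243.56, 245.83, 246.02, 245.72, 244.71,
              245.93, 243.12, 244.72, 242.44, 246.42, 245.02],
})
df2 = pd.DataFrame({
    "date start": pd.to_datetime(["2011-04-18 17:00:01",
                                  "2011-04-18 17:00:05",
                                  "2011-04-18 17:00:08"]),
    "date end":   pd.to_datetime(["2011-04-18 17:00:02",
                                  "2011-04-18 17:00:06",
                                  "2011-04-18 17:00:10"]),
})

# One inclusive boolean mask per interval, then OR them all together.
L = [df1["Date"].between(s, e)
     for s, e in df2[["date start", "date end"]].to_numpy()]
out = df1[np.logical_or.reduce(L)]
print(out)
```

Note that `Series.between` is inclusive on both ends by default, which matches the expected output above.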
If Date is a DatetimeIndex, it is possible to slice each interval directly and concatenate the pieces:
L = [df1[s:e] for s, e in df2[['date start','date end']].to_numpy()]
df = pd.concat(L)
print(df)
Power
Date
2011-04-18 17:00:01 245.83
2011-04-18 17:00:02 246.02
2011-04-18 17:00:05 245.93
2011-04-18 17:00:06 243.12
2011-04-18 17:00:08 242.44
2011-04-18 17:00:09 246.42
2011-04-18 17:00:10 245.02
Or build boolean masks directly from the DatetimeIndex instead of slicing:

L = [(df1.index >= s) & (df1.index <= e)
     for s, e in df2[['date start','date end']].to_numpy()]
df = df1[np.logical_or.reduce(L)]
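A runnable sketch of both DatetimeIndex variants on the question's sample data, checking that slicing-plus-concat and the index mask select the same rows (they agree here because the intervals are sorted and non-overlapping):

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Date as the DatetimeIndex.
df1 = pd.DataFrame(
    {"Power": [243.56, 245.83, 246.02, 245.72, 244.71,
               245.93, 243.12, 244.72, 242.44, 246.42, 245.02]},
    index=pd.date_range("2011-04-18 17:00:00", periods=11, freq="s"),
)
df1.index.name = "Date"
df2 = pd.DataFrame({
    "date start": pd.to_datetime(["2011-04-18 17:00:01",
                                  "2011-04-18 17:00:05",
                                  "2011-04-18 17:00:08"]),
    "date end":   pd.to_datetime(["2011-04-18 17:00:02",
                                  "2011-04-18 17:00:06",
                                  "2011-04-18 17:00:10"]),
})

# Variant 1: label-based slicing per interval, then stack the pieces.
sliced = pd.concat([df1[s:e]
                    for s, e in df2[["date start", "date end"]].to_numpy()])

# Variant 2: one boolean mask over the index per interval, OR-ed together.
mask = np.logical_or.reduce(
    [(df1.index >= s) & (df1.index <= e)
     for s, e in df2[["date start", "date end"]].to_numpy()]
)
masked = df1[mask]

print(sliced.equals(masked))
```

If intervals could overlap, the concat version would duplicate rows while the mask version would not, so the mask is the safer default.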