I have a pandas data frame with the following structure:
timestamp,y
2019-08-01 00:00:00,872.0
2019-08-01 00:15:00,668.0
2019-08-01 00:30:00,604.0
2019-08-01 00:45:00,788.0
2019-08-01 01:00:00,608.0
2019-08-01 01:15:00,692.0
2019-08-01 01:30:00,716.0
2019-08-01 01:45:00,692.0
2019-08-01 02:00:00,672.0
2019-08-01 02:15:00,636.0
2019-08-01 02:30:00,596.0
2019-08-01 02:45:00,748.0
...
What I would like to do is for every time period in this dataframe which is between 6 PM and 5 AM, I would like to know the timestamp when the y
value drops below a certain threshold.
I was thinking of doing the following pseudocode:
timestamps = list()
for _, row in df.iterrows():
found = False
current = row['timestamp']
val = row['y']
if current is between 6 PM and 5 AM:
if not found and value < threshold:
found = True
timestamps.append(current)
But this seems quite ugly and prone to errors and I wonder if there is a more succint pandaish way to do this?
CodePudding user response:
Set up to use df.between_time
, by making it a DatetimeIndex and add whatever optional filters you'd like:
df.timestamp = pd.to_datetime(df.timestamp)
df = df.set_index('timestamp')
threshold = 700
out = df.between_time('01:00', '02:00')[lambda x: x.y < threshold]
print(out)
Output:
y
timestamp
2019-08-01 01:00:00 608.0
2019-08-01 01:15:00 692.0
2019-08-01 01:45:00 692.0
2019-08-01 02:00:00 672.0
Then you can resample, and take the first line from each. This would be the timestamp where it first dropped below the threshold. Here it is daily, I'm not sure how you'd do 6pm to 5am, but I'm sure it's possible.
out.resample('d').first()
# Output:
y
timestamp
2019-08-01 608.0
df.timestamp = pd.to_datetime(df.timestamp)
df = (df.set_index('timestamp')
.between_time('18:00', '05:00')
[lambda x: x.y < threshold]
.resample('d')
.first())