I would like to analyze a dataframe with hourly data for several days, e.g. df:
DATE TIME Threshold Value
2022-11-04 02:00:00 10 9
2022-11-04 03:00:00 11 10
2022-11-04 04:00:00 10 11
2022-11-04 06:00:00 12 11
2022-11-04 05:00:00 12 12
2022-11-04 07:00:00 10 11
2022-11-04 08:00:00 11 10
2022-11-04 09:00:00 11 9
2022-11-04 10:00:00 12 9
2022-11-04 11:00:00 10 10
2022-11-04 12:00:00 10 10
...
2022-11-05 01:00:00 10 9
2022-11-05 02:00:00 11 10
...
Now I would like to examine the data based on threshold/value and time. Let's say I am interested in the Value of time "08:00:00" if the threshold of the preceding time "04:00:00" was 10. To find possible patterns, I might also look at other combinations in the future.
My approach was:
- Create a new dataframe df_2 with all slices of 04:00:00 and value = 10
- Create a new dataframe df_3 with all slices of 08:00:00
- merge df_2 and df_3 and select only rows where a time = 04:00:00 of the same day precedes a time = 8:00:00 entry.
This seems to be a bit cumbersome and I was wondering if there was a more practical way to do this. Maybe someone could suggest a more efficient way?
CodePudding user response:
at first make DatetimeInex:
date_idx=df.iloc[:, :2].astype('str').apply(lambda x: pd.to_datetime(' '.join(x)), axis=1)
and make new
column that have Threshold before 4H
and make result to df1
df1 = (df.set_index(date_idx)
.drop(['DATE', 'TIME'], axis=1)
.sort_index()
.assign(new=df1.shift(freq='4H')['Threshold']))
output(df1
):
Threshold Value new
2022-11-04 02:00:00 10 9 NaN
2022-11-04 03:00:00 11 10 NaN
2022-11-04 04:00:00 10 11 NaN
2022-11-04 05:00:00 12 12 NaN
2022-11-04 06:00:00 12 11 10.0
2022-11-04 07:00:00 10 11 11.0
2022-11-04 08:00:00 11 10 10.0
2022-11-04 09:00:00 11 9 12.0
2022-11-04 10:00:00 12 9 12.0
2022-11-04 11:00:00 10 10 10.0
2022-11-04 12:00:00 10 10 11.0
filter data at 08:00:00:
df1.at_time('08:00')
output:
Threshold Value new
2022-11-04 08:00:00 11 10 10.0
check or filter Value
and new
column
CodePudding user response:
here is one way to do it
out=(df.loc[
(df['TIME'].isin(['04:00:00','08:00:00']) & # choose rows where time is 4:00 or 8:00
df['DATE'].isin( # and date where
df.loc[df['TIME'].eq('04:00:00') & # time is 04:00:00
df['Threshold'].eq(10)]['DATE']) # and Threshold is 10
)])
out
DATE TIME Threshold Value
2 2022-11-04 04:00:00 10 11
6 2022-11-04 08:00:00 11 10
Alternately, same as above just choose time eq to 08:00:00
out=(df.loc[
(df['TIME'].isin(['08:00:00']) &
df['DATE'].isin(
df.loc[df['TIME'].eq('04:00:00') &
df['Threshold'].eq(10)]['DATE'])
)])
out
DATE TIME Threshold Value
6 2022-11-04 08:00:00 11 10