Home > front end >  Analyzing dataframe with hourly data
Analyzing dataframe with hourly data

Time:11-09

I would like to analyze a dataframe with hourly data for several days, e.g. df:

      DATE      TIME  Threshold  Value
2022-11-04  02:00:00         10      9
2022-11-04  03:00:00         11     10  
2022-11-04  04:00:00         10     11  
2022-11-04  06:00:00         12     11  
2022-11-04  05:00:00         12     12  
2022-11-04  07:00:00         10     11  
2022-11-04  08:00:00         11     10  
2022-11-04  09:00:00         11      9  
2022-11-04  10:00:00         12      9  
2022-11-04  11:00:00         10     10  
2022-11-04  12:00:00         10     10
... 
2022-11-05  01:00:00         10      9
2022-11-05  02:00:00         11     10 
...

Now I would like to examine the data based on threshold/value and time. Let's say I am interested in the Value of time "08:00:00" if the threshold of the preceding time "04:00:00" was 10. To find possible patterns, I might also look at other combinations in the future.

My approach was:

  • Create a new dataframe df_2 with all slices of 04:00:00 and value = 10
  • Create a new dataframe df_3 with all slices of 08:00:00
  • merge df_2 and df_3 and select only rows where a time = 04:00:00 of the same day precedes a time = 8:00:00 entry.

This seems to be a bit cumbersome and I was wondering if there was a more practical way to do this. Maybe someone could suggest a more efficient way?

CodePudding user response:

at first make DatetimeInex:

date_idx=df.iloc[:, :2].astype('str').apply(lambda x: pd.to_datetime(' '.join(x)), axis=1)

and make new column that have Threshold before 4H

and make result to df1

df1 = (df.set_index(date_idx)
       .drop(['DATE', 'TIME'], axis=1)
       .sort_index()
       .assign(new=df1.shift(freq='4H')['Threshold']))

output(df1):

                    Threshold   Value   new
2022-11-04 02:00:00 10           9      NaN
2022-11-04 03:00:00 11          10      NaN
2022-11-04 04:00:00 10          11      NaN
2022-11-04 05:00:00 12          12      NaN
2022-11-04 06:00:00 12          11      10.0
2022-11-04 07:00:00 10          11      11.0
2022-11-04 08:00:00 11          10      10.0
2022-11-04 09:00:00 11           9      12.0
2022-11-04 10:00:00 12           9      12.0
2022-11-04 11:00:00 10          10      10.0
2022-11-04 12:00:00 10          10      11.0



filter data at 08:00:00:

df1.at_time('08:00')

output:

                    Threshold Value new
2022-11-04 08:00:00 11        10    10.0



check or filter Value and new column

CodePudding user response:

here is one way to do it

out=(df.loc[
    (df['TIME'].isin(['04:00:00','08:00:00']) &   # choose rows where time is 4:00 or 8:00
     df['DATE'].isin(                             # and date where 
         df.loc[df['TIME'].eq('04:00:00') &       # time is 04:00:00
                df['Threshold'].eq(10)]['DATE'])  # and Threshold is 10
)])
out
     DATE       TIME    Threshold   Value
2   2022-11-04  04:00:00    10      11
6   2022-11-04  08:00:00    11      10

Alternately, same as above just choose time eq to 08:00:00

out=(df.loc[
    (df['TIME'].isin(['08:00:00']) &
     df['DATE'].isin(
         df.loc[df['TIME'].eq('04:00:00') & 
                df['Threshold'].eq(10)]['DATE'])
)])
out
          DATE  TIME    Threshold   Value
6   2022-11-04  08:00:00    11      10
  • Related