Home > Software design >  How to filter by day the last 3 days values of a pandas dataframe considering the hours?
How to filter by day the last 3 days values of a pandas dataframe considering the hours?

Time:03-10

I have this dataframe:

my_df

I need to get the values of the single days between time 05:00:00 and 06:00:00 (so, in this example, ignore 07:00:00) And create a separate dataframe for each day considering the last 3 days.

This is the result i want to achive: (3 dataframes considering 3 days and Time between 05 and 06)

desidered result

I tried this: (without success)

df.sort_values(by = "Time", inplace=True)
df_of_yesterday = df[ (df.Time.dt.hour > 4) 
          &  (df.Time.dt.hour < 7)]

CodePudding user response:

You can use:

from datetime import date, time, timedelta

today = date.today()
m = df['Time'].dt.time.between(time(5), time(6))

df_yda = df.loc[m & (df['Time'].dt.date == today - timedelta(1))]
df_2da = df.loc[m & (df['Time'].dt.date == today - timedelta(2))]
df_3da = df.loc[m & (df['Time'].dt.date == today - timedelta(3))]

Output:

>>> df_yda
                  Time      Open
77 2022-03-09 05:00:00  0.880443
78 2022-03-09 06:00:00  0.401932

>> df_2da
                  Time      Open
53 2022-03-08 05:00:00  0.781377
54 2022-03-08 06:00:00  0.638676

>>> df_3da
                  Time      Open
29 2022-03-07 05:00:00  0.838719
30 2022-03-07 06:00:00  0.897211

Setup a MRE:

import pandas as pd
import numpy as np

rng = np.random.default_rng()
dti = pd.date_range('2022-03-06', '2022-03-10', freq='H')
df = pd.DataFrame({'Time': dti, 'Open': rng.random(len(dti))})

CodePudding user response:

Use Series.between with set offsets.DateOffset for datetimes between this times in list comprehension for list of DataFrames:

now = pd.to_datetime('now').normalize()

dfs = [df[df.Time.between(now - pd.DateOffset(days=i, hour=5), 
                          now - pd.DateOffset(days=i, hour=6))] for i in range(1,4)]

print (dfs[0])
print (dfs[1])
print (dfs[2])

CodePudding user response:

I've manually copied your data into a dictionary and then converted it to your desired output.

First you should probably edit your question to use the text version of the data instead of an image, here's a small example:

data = {
    'Time': [
        '2022-03-06 05:00:00',
        '2022-03-06 06:00:00',
        '2022-03-06 07:00:00',
        
        '2022-03-07 05:00:00',
        '2022-03-07 06:00:00',
        '2022-03-07 07:00:00',
        
        '2022-03-08 05:00:00',
        '2022-03-08 06:00:00',
        '2022-03-08 07:00:00',
        
        '2022-03-09 05:00:00',
        '2022-03-09 06:00:00',
        '2022-03-09 07:00:00'
    ],
    'Open': [
        '13823.6',
        '13786.6',
        '13823.6',
        
        '13823.6',
        '13786.6',
        '13823.6',
        
        '13823.6',
        '13786.6',
        '13823.6',
        
        '13823.6',
        '13786.6',
        '13823.6'
    ]
}

df = pd.DataFrame(data)

Then you can use this code to get all the dates that are on the same day and in between the hours 4 and 7 and then create your dataframes as follows:

import pandas as pd  
from datetime import datetime

dict = {}
for index, row in df.iterrows():
    found = False
    for item in dict:
        date = datetime.strptime(row['Time'], '%Y-%m-%d %H:%M:%S')
        date2 = datetime.strptime(item, '%Y-%m-%d')
        if(date.date() == date2.date() and date.hour > 4 and date.hour < 7):
            dict[item].append(row['Open'])
            found = True
    date = datetime.strptime(row['Time'], '%Y-%m-%d %H:%M:%S')
    if(not found and date.hour > 4 and date.hour < 7):
        dict[date.strftime('%Y-%m-%d')] = []
        dict[date.strftime('%Y-%m-%d')].append(row['Open'])
for key in dict:
    temp = {
        key: dict[key]
    }
    df = pd.DataFrame(temp)
    print(df)
  • Related