Counting each day in a dataframe (Not resetting on new year)

Time:01-04

I have two years' worth of data in a DataFrame called df, with an additional column called dayNo which labels each row with its day of the year. See below:

[Image: sample of df showing the dayNo column]

Code which handles dayNo:

df['dayNo'] = pd.to_datetime(df['TradeDate'], dayfirst=True).dt.day_of_year

I would like to amend dayNo so that when 2023 begins, dayNo doesn't reset to 1 but continues with 366, 367 and so on. Expected output below:

[Image: expected output, with dayNo continuing past 365 into 2023]

Maybe a completely different approach will have to be taken to what I've done above. Any help greatly appreciated, Thanks!

CodePudding user response:

You could define a start day to count from, and use the number of days elapsed since that point as your column. An example using self-generated data to illustrate the point:

import pandas as pd

df = pd.DataFrame({"dates": pd.date_range("2022-12-29", "2023-01-03", freq="8H")})

start = pd.Timestamp("2021-12-31")
df["dayNo"] = df["dates"].sub(start).dt.days

                 dates  dayNo
0  2022-12-29 00:00:00    363
1  2022-12-29 08:00:00    363
2  2022-12-29 16:00:00    363
3  2022-12-30 00:00:00    364
4  2022-12-30 08:00:00    364
5  2022-12-30 16:00:00    364
6  2022-12-31 00:00:00    365
7  2022-12-31 08:00:00    365
8  2022-12-31 16:00:00    365
9  2023-01-01 00:00:00    366
10 2023-01-01 08:00:00    366
11 2023-01-01 16:00:00    366
12 2023-01-02 00:00:00    367
13 2023-01-02 08:00:00    367
14 2023-01-02 16:00:00    367
15 2023-01-03 00:00:00    368
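
If you would rather not hard-code the start date, a minimal sketch could derive it from the data instead. This assumes the question's TradeDate column holds day-first date strings; the sample values below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample in the question's day-first format
df = pd.DataFrame({"TradeDate": ["30/12/2022", "31/12/2022", "01/01/2023", "02/01/2023"]})

dates = pd.to_datetime(df["TradeDate"], dayfirst=True)

# Day before 1 January of the earliest year, so that 1 January maps to dayNo 1
start = pd.Timestamp(year=dates.min().year, month=1, day=1) - pd.Timedelta(days=1)

# normalize() drops any time-of-day component before counting whole days
df["dayNo"] = (dates.dt.normalize() - start).dt.days
```

Because the count is a plain day difference, leap years in the range are handled without special cases.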

CodePudding user response:

You are nearly there with your solution. Just use apply to get the final result:

df['dayNo'] = df['dayNo'].apply(lambda x: x if x >= df.loc[0].dayNo else x + df.loc[0].dayNo)
df
Out[108]: 
                 dates   TradeDate  dayNo
0  2022-12-31 00:00:00  2022-12-31    365
1  2022-12-31 01:00:00  2022-12-31    365
2  2022-12-31 02:00:00  2022-12-31    365
3  2022-12-31 03:00:00  2022-12-31    365
4  2022-12-31 04:00:00  2022-12-31    365
..                 ...         ...    ...
68 2023-01-02 20:00:00  2023-01-02    367
69 2023-01-02 21:00:00  2023-01-02    367
70 2023-01-02 22:00:00  2023-01-02    367
71 2023-01-02 23:00:00  2023-01-02    367
72 2023-01-03 00:00:00  2023-01-03    368
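
For reference, here is a vectorised sketch of the same idea using Series.where instead of apply. The sample frame is reconstructed from the output above, which is an assumption, and the trick only lines up when the first row falls on the last day of its year, as it does here:

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the output above
df = pd.DataFrame({"dates": pd.date_range("2022-12-31", "2023-01-03", freq="h")})
df["TradeDate"] = df["dates"].dt.normalize()
df["dayNo"] = df["TradeDate"].dt.dayofyear

# dayNo of the first row (365 here); rows below it belong to the following year
first = df.loc[0, "dayNo"]

# Keep values at or above the first dayNo, shift the rest past the year end
df["dayNo"] = df["dayNo"].where(df["dayNo"] >= first, df["dayNo"] + first)
```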

CodePudding user response:

Let's suppose we have a pandas dataframe built with this script (inspired by Chrysophylaxs' dataframe):

import pandas as pd

df = pd.DataFrame({'TradeDate': pd.date_range("2022-12-29", "2030-01-03", freq="8H")})

The dataframe then has dates from 2022 to 2030:

               TradeDate
0    2022-12-29 00:00:00
1    2022-12-29 08:00:00
2    2022-12-29 16:00:00
3    2022-12-30 00:00:00
4    2022-12-30 08:00:00
...                  ...
7682 2030-01-01 16:00:00
7683 2030-01-02 00:00:00
7684 2030-01-02 08:00:00
7685 2030-01-02 16:00:00
7686 2030-01-03 00:00:00

[7687 rows x 1 columns]

I propose the following commented code to reach our goal:

import pandas as pd

df = pd.DataFrame({'TradeDate': pd.date_range("2022-12-29", "2030-01-03", freq="8H")})

# Initialize Days counter
dyc = df['TradeDate'].iloc[0].dayofyear
# Initialize Previous day of Year
prv_dof = dyc 

def func(row):
    global dyc, prv_dof
    # Get the day of the year
    dof = row.iloc[0].dayofyear
    # If New day then increment days counter
    if dof != prv_dof:
        dyc += 1
        prv_dof = dof
    return dyc

df['dayNo'] = df.apply(func, axis=1)

Resulting dataframe:

               TradeDate  dayNo
0    2022-12-29 00:00:00    363
1    2022-12-29 08:00:00    363
2    2022-12-29 16:00:00    363
3    2022-12-30 00:00:00    364
4    2022-12-30 08:00:00    364
...                  ...    ...
7682 2030-01-01 16:00:00   2923
7683 2030-01-02 00:00:00   2924
7684 2030-01-02 08:00:00   2924
7685 2030-01-02 16:00:00   2924
7686 2030-01-03 00:00:00   2925
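
The same counter can also be sketched without global state, assuming the rows are sorted by TradeDate. Like func above, it adds one each time the calendar date changes, so any gaps between dates are not counted. The names day, is_new_day and first_day_no are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"TradeDate": pd.date_range("2022-12-29", "2030-01-03", freq="8h")})

# Calendar date of each row, with the time of day stripped
day = df["TradeDate"].dt.normalize()

# True on the first row and wherever the date differs from the previous row
is_new_day = day.ne(day.shift())

# Day of the year of the first row, used as the starting value of the counter
first_day_no = df["TradeDate"].iloc[0].dayofyear

# The running count of date changes starts at 1, so subtract 1 before offsetting
df["dayNo"] = is_new_day.cumsum() + first_day_no - 1
```

On a frame of this size the cumulative sum avoids calling a Python function once per row, which is the main reason to prefer it over apply with axis=1.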