Pandas Time Series: Count weekdays with min value in annual data-CodePudding

this is my first question on Stackoverflow and I hope I describe my problem detailed enough. I'm starting to learn data analysis with Pandas and I've created a time series with daily data for gas prices of a certain station. I've already grouped the hourly data into daily data.

I've been successfull with a simple scatter plot over the year with plotly but in the next step I would like to analyze which weekday is the cheapest or most expensive in every week, count the daynames and then look if there is a pattern over the whole year.

            count      mean       std    min    25%    50%     75%    max  \
2022-01-01   35.0  1.685000  0.029124  1.649  1.659  1.689  1.6990  1.749   
2022-01-02   27.0  1.673444  0.024547  1.649  1.649  1.669  1.6890  1.729   
2022-01-03   28.0  1.664000  0.040597  1.599  1.639  1.654  1.6890  1.789   
2022-01-04   31.0  1.635129  0.045069  1.599  1.599  1.619  1.6490  1.779   
2022-01-05   33.0  1.658697  0.048637  1.599  1.619  1.649  1.6990  1.769   
2022-01-06   35.0  1.658429  0.050756  1.599  1.619  1.639  1.6940  1.779   
2022-01-07   30.0  1.637333  0.039136  1.599  1.609  1.629  1.6565  1.759   
2022-01-08   41.0  1.655829  0.041740  1.619  1.619  1.639  1.6790  1.769   
2022-01-09   35.0  1.647857  0.031602  1.619  1.619  1.639  1.6590  1.769   
2022-01-10   31.0  1.634806  0.041374  1.599  1.609  1.619  1.6490  1.769 
...  

            week    weekday  
2022-01-01    52   Saturday  
2022-01-02    52     Sunday  
2022-01-03     1     Monday  
2022-01-04     1    Tuesday  
2022-01-05     1  Wednesday  
2022-01-06     1   Thursday  
2022-01-07     1     Friday  
2022-01-08     1   Saturday  
2022-01-09     1     Sunday  
2022-01-10     2     Monday
...

I tried with grouping and resampling but unfortunately I didn't get the result I was hoping for.

Can someone suggest a way how to deal with this problem? Thanks!

CodePudding user response：

Here's a way to do what I believe your question asks:

import pandas as pd
import numpy as np
df = pd.DataFrame({
'count':[35,27,28,31,33,35,30,41,35,31]*40,
'mean':
[1.685,1.673444,1.664,1.635129,1.658697,1.658429,1.637333,1.655829,1.647857,1.634806]*40
},
index=pd.Series(pd.to_datetime(pd.date_range("2022-01-01", periods=400, freq="D"))))
print( '','input df:',df,sep='\n' )

df_date = df.reset_index()['index']
df['weekday'] = list(df_date.dt.day_name())
df['year'] = df_date.dt.year.to_numpy()
df['week'] = df_date.dt.isocalendar().week.to_numpy()
df['year_week_started'] = df.year - np.where((df.week>=52)&(df.week.shift(-7)==1),1,0)
print( '','input df with intermediate columns:',df,sep='\n' )

cols = ['year_week_started', 'week']
dfCheap = df.loc[df.groupby(cols)['mean'].idxmin(),:].set_index(cols)
dfCheap = ( dfCheap.groupby(['year_week_started', 'weekday'])['mean'].count()
    .rename('freq').to_frame().set_index('freq', append=True)
    .reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfCheap:',dfCheap,sep='\n' )

dfExpensive = df.loc[df.groupby(cols)['mean'].idxmax(),:].set_index(cols)
dfExpensive = ( dfExpensive.groupby(['year_week_started', 'weekday'])['mean'].count()
    .rename('freq').to_frame().set_index('freq', append=True)
    .reset_index(level='weekday').sort_index(ascending=[True,False]) )
print( '','dfExpensive:',dfExpensive,sep='\n' )

Sample input:

input df:
            count      mean
2022-01-01     35  1.685000
2022-01-02     27  1.673444
2022-01-03     28  1.664000
2022-01-04     31  1.635129
2022-01-05     33  1.658697
...           ...       ...
2023-01-31     35  1.658429
2023-02-01     30  1.637333
2023-02-02     41  1.655829
2023-02-03     35  1.647857
2023-02-04     31  1.634806

[400 rows x 2 columns]

input df with intermediate columns:
            count      mean    weekday  year week  year_week_started
2022-01-01     35  1.685000   Saturday  2022   52               2021
2022-01-02     27  1.673444     Sunday  2022   52               2021
2022-01-03     28  1.664000     Monday  2022    1               2022
2022-01-04     31  1.635129    Tuesday  2022    1               2022
2022-01-05     33  1.658697  Wednesday  2022    1               2022
...           ...       ...        ...   ...  ...                ...
2023-01-31     35  1.658429    Tuesday  2023    5               2023
2023-02-01     30  1.637333  Wednesday  2023    5               2023
2023-02-02     41  1.655829   Thursday  2023    5               2023
2023-02-03     35  1.647857     Friday  2023    5               2023
2023-02-04     31  1.634806   Saturday  2023    5               2023

[400 rows x 6 columns]

Sample output:

dfCheap:
                          weekday
year_week_started freq
2021              1        Monday
2022              11      Tuesday
                  10     Thursday
                  10    Wednesday
                  6        Sunday
                  5        Friday
                  5        Monday
                  5      Saturday
2023              2      Thursday
                  1      Saturday
                  1        Sunday
                  1     Wednesday

dfExpensive:
                          weekday
year_week_started freq
2021              1      Saturday
2022              16       Monday
                  10      Tuesday
                  6        Sunday
                  5        Friday
                  5      Saturday
                  5      Thursday
                  5     Wednesday
2023              2        Monday
                  1        Friday
                  1      Thursday
                  1       Tuesday