Home > database >  Python resample by day & get weekstart data
Python resample by day & get weekstart data

Time:12-15

I have monthly data. When I apply resampling by day and cubic interpolation, there is a function to find the month end data

import pandas as pd
df=pd.read_excel(input_file, sheet_name='mthly', usecols='A:D', na_values='ND', index_col=0, header=0)
df.index.names = ['Period']
df.index = pd.to_datetime(df.index)

q= pd.Series(df[series], index=df.index)
d = q.resample('D').interpolate(method='cubic')  
m=d[d.index.is_month_end]

Now I want the data for the start of each week, but I can't find such a function. How do I solve this?

INPUT MONTHLY DATA


Period  Values
Jan-22  1
Feb-22  3
Mar-22  7
Apr-22  5
May-22  4



After re-sampling by day and with cubic interpolation

Period
2022-01-01    1.000000
2022-01-02    0.884487
2022-01-03    0.785650
2022-01-04    0.703004
2022-01-05    0.636066
2022-01-06    0.584353
2022-01-07    0.547382
2022-01-08    0.524668
2022-01-09    0.515729
2022-01-10    0.520081
2022-01-11    0.537240
2022-01-12    0.566724
2022-01-13    0.608048
and so on

Desired output is Monday of each week

2022-01-03    0.785650
2022-01-10    0.520081
and so on

Thank you

CodePudding user response:

Example

data = {'Period': {0: 'Jan-22', 1: 'Feb-22', 2: 'Mar-22', 3: 'Apr-22', 4: 'May-22'},
        'Values': {0: 1, 1: 3, 2: 7, 3: 5, 4: 4}}
df = pd.DataFrame(data)

df

    Period      Values
0   2022-01-01  1
1   2022-02-01  3
2   2022-03-01  7
3   2022-04-01  5
4   2022-05-01  4

Code

use asfreq for upsampling

df['Period'] = pd.to_datetime(df['Period'], format='%b-%y') # change to datetime
out = (df.set_index('Period').asfreq(freq='D')
       .interpolate(method='cubic')[lambda x: x.index.weekday == 0])

out.head(5)

            Values
Period  
2022-01-03  0.785650
2022-01-10  0.520081
2022-01-17  0.882085
2022-01-24  1.705848
2022-01-31  2.825552

note

pandas.DatetimeIndex.weekday

.weekday return weekday by number(mon = 0, sum =6) in DatetimeIndex

example

idx1 = pd.DatetimeIndex(['2022-01-03', '2022-01-04'])

idx1

DatetimeIndex(['2022-01-03', '2022-01-04'], dtype='datetime64[ns]', freq=None)

idx1.weekday

Int64Index([0, 1], dtype='int64')

when applied to datetime series instead datetimeindex, The dt accessor is required (ex: dt.weekday)

  • Related