I have monthly data. When I apply resampling by day and cubic interpolation, there is a function to find the month end data
import pandas as pd
df=pd.read_excel(input_file, sheet_name='mthly', usecols='A:D', na_values='ND', index_col=0, header=0)
df.index.names = ['Period']
df.index = pd.to_datetime(df.index)
q= pd.Series(df[series], index=df.index)
d = q.resample('D').interpolate(method='cubic')
m=d[d.index.is_month_end]
Now I want the data for the start of each week, but I can't find such a function. How do I solve this?
INPUT MONTHLY DATA
Period Values
Jan-22 1
Feb-22 3
Mar-22 7
Apr-22 5
May-22 4
After re-sampling by day and with cubic interpolation
Period
2022-01-01 1.000000
2022-01-02 0.884487
2022-01-03 0.785650
2022-01-04 0.703004
2022-01-05 0.636066
2022-01-06 0.584353
2022-01-07 0.547382
2022-01-08 0.524668
2022-01-09 0.515729
2022-01-10 0.520081
2022-01-11 0.537240
2022-01-12 0.566724
2022-01-13 0.608048
and so on
Desired output is Monday of each week
2022-01-03 0.785650
2022-01-10 0.520081
and so on
Thank you
CodePudding user response:
Example
data = {'Period': {0: 'Jan-22', 1: 'Feb-22', 2: 'Mar-22', 3: 'Apr-22', 4: 'May-22'},
'Values': {0: 1, 1: 3, 2: 7, 3: 5, 4: 4}}
df = pd.DataFrame(data)
df
Period Values
0 2022-01-01 1
1 2022-02-01 3
2 2022-03-01 7
3 2022-04-01 5
4 2022-05-01 4
Code
use asfreq
for upsampling
df['Period'] = pd.to_datetime(df['Period'], format='%b-%y') # change to datetime
out = (df.set_index('Period').asfreq(freq='D')
.interpolate(method='cubic')[lambda x: x.index.weekday == 0])
out.head(5)
Values
Period
2022-01-03 0.785650
2022-01-10 0.520081
2022-01-17 0.882085
2022-01-24 1.705848
2022-01-31 2.825552
note
.weekday
return weekday by number(mon = 0, sum =6) in DatetimeIndex
example
idx1 = pd.DatetimeIndex(['2022-01-03', '2022-01-04'])
idx1
DatetimeIndex(['2022-01-03', '2022-01-04'], dtype='datetime64[ns]', freq=None)
idx1.weekday
Int64Index([0, 1], dtype='int64')
when applied to datetime series instead datetimeindex, The dt accessor is required (ex: dt.weekday
)