I have the following table for the whole year 2020:
and I would like to fit this into a dataframe time series to firstly plot and then forecast for the year 2021 using ML.
My problem is implementing the Calender weeks by means of conversion. Or maybe it doesn't have to be converted and I'm thinking too complicated.
Normally, I would use
df.index = pd.to_datetime(df[['year', 'month', 'day']])
but how do I do this using calender weeks?
CodePudding user response:
Calendar weeks can be parsed with pd.to_datetime
using %U
(https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes).
But with pd.to_datetime
you will also need to provide an information about the day of week (parsed with %w
), otherwise parsing will fail.
The code could look like:
import pandas as pd
from io import StringIO
data = """Year,Calendar Week,Month,Amount
2020,1,12,129.984
2020,2,1,263.136
2020,3,1,592.896
2020,4,1,470.016
2020,5,1,429.888
"""
df = pd.read_csv(StringIO(data), sep=",")
df['date'] = pd.to_datetime(
df[['Year', 'Calendar Week']].astype(str).apply(' '.join, 1) # create temp column with year and week info
' 0', # add day info
format='%Y %U %w'
)
print(df)
Year Calendar Week Month Amount date
0 2020 1 12 129.984 2020-01-05
1 2020 2 1 263.136 2020-01-12
2 2020 3 1 592.896 2020-01-19
3 2020 4 1 470.016 2020-01-26
4 2020 5 1 429.888 2020-02-02
CodePudding user response:
You can try:
df_2020 = pd.DataFrame(data={'date':pd.date_range(datetime(2020,1,1), datetime(2020, 12, 31), freq='W')})
df_2020['week'] = df_2020['date'].dt.isocalendar().week
your_df = your_df.merge(df_2020, how='left', left_on='Calendar Week', right_on='week')