I am using a pandas DataFrame with the columns Id, timestamp and Value. For each row (grouped by Id), I want to generate every date of the month that the timestamp falls in and put those dates in a new column.
If the month has 30 days, 30 rows should be created in the new column, in dd-mm-yyyy format.
If the month has 31 days, 31 rows should be created in the new column, in dd-mm-yyyy format.
If the month has 28 days, 28 rows should be created in the new column, in dd-mm-yyyy format.
Input Dataframe
Id timestamp Value
1002 18-01-2021 45
1003 08-03-2021 14
1004 03-07-2021 50
1005 18-06-2021 23
1005 12-09-2021 108
Output Dataframe
Id timestamp Value ext_dates
1002 18-01-2021 45 01-01-2021
02-01-2021
03-01-2021
04-01-2021
05-01-2021
.
.
31-01-2021
1003 08-03-2021 14 01-03-2021
02-03-2021
03-03-2021
04-03-2021
05-03-2021
.
.
31-03-2021
1004 03-07-2021 50 01-07-2021
02-07-2021
03-07-2021
04-07-2021
05-07-2021
.
.
31-07-2021
1005 18-06-2021 23 01-06-2021
02-06-2021
03-06-2021
04-06-2021
05-06-2021
.
.
30-06-2021
1005 12-09-2021 108 01-09-2021
02-09-2021
03-09-2021
04-09-2021
05-09-2021
.
.
30-09-2021
I have tried this
date_format = [datetime.strptime(i, '%d-%m-%Y') for i in df['timestamp']]
num_days = [calendar.monthrange(i.year,i.month,i.date)[2] for i in date_format]
num_days
df['ext_dates']= df.groupby(pd.PeriodIndex(data= num_days, freq='D'))
import pandas as pd
#month = '2018-08-01'
df_a['Timestamp'] = pd.to_datetime(df_a['Timestamp'])
month =df_a['Timestamp']
month
df = pd.DataFrame({
    'all_dates': pd.date_range(
        start = pd.Timestamp(month),
        end = pd.Timestamp(month) + pd.offsets.MonthEnd(0), # <-- 2018-08-31 with MonthEnd
        freq = 'D'
    )
})
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-98f03ab5c9bb> in <module>
7 df = pd.DataFrame({
8 'all_dates': pd.date_range(
----> 9 start = pd.Timestamp(month),
10 end = pd.Timestamp(month) + pd.offsets.MonthEnd(0), # <-- 2018-08-31 with MonthEnd
11 freq = 'D'
pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
TypeError: Cannot convert input [0 2022-01-01
1 2022-02-01
2 2022-03-01
Name: YearMonth, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
Answer:
import calendar
import datetime

import pandas as pd

df = pd.DataFrame({'Id': [1002, 1003], 'timestamp': ['18-01-2021', '08-03-2021'], 'Value': [45, 14]})
# The strings are day-first, so state the format explicitly;
# otherwise 08-03-2021 may be read as 3 August.
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%d-%m-%Y')

def givedays(f):
    """Return all the days in the month of the row's timestamp."""
    year = f['timestamp'].year
    month = f['timestamp'].month
    # monthrange returns (weekday of the 1st, number of days in the month)
    num_days = calendar.monthrange(year, month)[1]  # https://stackoverflow.com/questions/21231789/how-to-get-all-days-in-current-month
    return [datetime.date(year, month, day) for day in range(1, num_days + 1)]

# One list of dates per row, then one row per date
df['days'] = df.apply(givedays, axis=1)
df = df.explode(column='days')
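As a side note on the TypeError in the question: pd.Timestamp() converts a single value, so passing it the whole column (a Series) fails. Below is a minimal sketch of a pandas-only variant that builds one pd.date_range per row instead and formats the result as dd-mm-yyyy strings. The helper name month_days and the variable names ts and out are made up for illustration, the sample data is copied from the question's input table, and ignore_index in explode assumes pandas 1.1 or newer.

```python
import pandas as pd

# Sample data copied from the question's input table.
df = pd.DataFrame({
    "Id": [1002, 1003, 1004, 1005, 1005],
    "timestamp": ["18-01-2021", "08-03-2021", "03-07-2021",
                  "18-06-2021", "12-09-2021"],
    "Value": [45, 14, 50, 23, 108],
})

# Parse the day-first strings explicitly; the original string column is kept as-is.
ts = pd.to_datetime(df["timestamp"], format="%d-%m-%Y")


def month_days(t):
    """Hypothetical helper: all days of t's month as dd-mm-yyyy strings."""
    start = t.replace(day=1)           # first day of the month
    end = t + pd.offsets.MonthEnd(0)   # last day of the same month
    return pd.date_range(start, end, freq="D").strftime("%d-%m-%Y").tolist()


# One list of date strings per row, aligned on the original index.
df["ext_dates"] = pd.Series([month_days(t) for t in ts], index=df.index)

# One output row per generated date; Id, timestamp and Value are repeated.
out = df.explode("ext_dates", ignore_index=True)
```

Unlike the expected output in the question, Id, timestamp and Value are repeated on every generated row rather than shown only on the first row of each month; blanking the repeats would be a separate display step.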