id a b
01 1 2
01 3 4
02 5 6
03 7 8
I have a dataframe that I would like to add a date range to grouped by ID, the result should look like this
id a b date
01 1 2 31/03/2021
01 3 4 30/04/2021
02 5 6 31/03/2021
03 7 8 31/03/2021
but i actually just need the last day of each month for my calc like this
id a b day
01 1 2 31
01 3 4 30
02 5 6 31
03 7 8 31
CodePudding user response:
Use GroupBy.cumcount
for counter and add months by offsets.MonthOffset
with add to strating datetime, last use Series.dt.day
:
start = pd.to_datetime('2021-03-31')
df['day'] = (start df.groupby('id').cumcount()
.apply(lambda x: pd.offsets.DateOffset(months=x))).dt.day
print (df)
id a b day
0 1 1 2 31
1 1 3 4 30
2 2 5 6 31
3 3 7 8 31
Alternative is create moth periods in numpy with Series.dt.days_in_month
:
start = pd.to_datetime(['2021-03-31']).to_numpy().astype('datetime64[M]')
df['day'] = (start np.array(df.groupby('id').cumcount(), dtype='timedelta64[M]'))
df['day'] = df['day'].dt.days_in_month
print (df)
id a b day
0 1 1 2 31
1 1 3 4 30
2 2 5 6 31
3 3 7 8 31
CodePudding user response:
IIUC, you can apply a date_range
per group:
start = '2021-03-31'
df['date'] = (df
.groupby('id')
.apply(lambda g: pd.date_range(start=start, periods=len(g), freq='M').to_series())
.values)
output:
id a b date
0 1 1 2 2021-03-31
1 1 3 4 2021-04-30
2 2 5 6 2021-03-31
3 3 7 8 2021-03-31
or, for the day:
start = '2021-03-31'
df['date'] = (df
.groupby('id')
.apply(lambda g: pd.date_range(start=start, periods=len(g), freq='M').to_series())
.dt.day
.values
)
output:
id a b date
0 1 1 2 31
1 1 3 4 30
2 2 5 6 31
3 3 7 8 31