i have a dataframe like this:
import pandas as pd
data=[[1,'2021-10-01 00:00:00','2021-12-01 23:59:59'],[2,'2022-02-01 00:00:00','2022-05-10 23:59:59']]
df=pd.DataFrame(data,columns=['id','start_date','end_date'])
df
id start_date end_date
1 2021-10-01 00:00:00 2021-12-01 23:59:59
2 2022-02-01 00:00:00 2022-05-10 23:59:59
I want to get the months between start and end dates in a new column.
expected output:
data2=[[1,'2021-10-01 00:00:00','2021-12-01 23:59:59','2021-10'],
[1,'2021-10-01 00:00:00','2021-12-01 23:59:59','2021-11'],
[1,'2021-10-01 00:00:00','2021-12-01 23:59:59','2021-12'],
[2,'2022-02-01 00:00:00','2022-05-10 23:59:59','2022-02'],
[2,'2022-02-01 00:00:00','2022-05-10 23:59:59','2022-03'],
[2,'2022-02-01 00:00:00','2022-05-10 23:59:59','2022-04'],
[2,'2022-02-01 00:00:00','2022-05-10 23:59:59','2022-05']]
df2=pd.DataFrame(data2,columns=['id','start_date','end_date','months'])
df2
id start_date end_date months
1 2021-10-01 00:00:00 2021-12-01 23:59:59 2021-10
1 2021-10-01 00:00:00 2021-12-01 23:59:59 2021-11
1 2021-10-01 00:00:00 2021-12-01 23:59:59 2021-12
2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-02
2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-03
2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-04
2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-05
is there a way to do this?
CodePudding user response:
Create months periods by period_range
and then use DataFrame.explode
:
df['months'] = [pd.period_range(s, e, freq='m')
for s, e in zip(df['start_date'], df['end_date'])]
#alternative
#lambda x: pd.period_range(x['start_date'], x['end_date'], freq='m')
#df['months'] = df.apply(f, axis=1)
df = df.explode('months').reset_index(drop=True)
print (df)
id start_date end_date months
0 1 2021-10-01 00:00:00 2021-12-01 23:59:59 2021-10
1 1 2021-10-01 00:00:00 2021-12-01 23:59:59 2021-11
2 1 2021-10-01 00:00:00 2021-12-01 23:59:59 2021-12
3 2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-02
4 2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-03
5 2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-04
6 2 2022-02-01 00:00:00 2022-05-10 23:59:59 2022-05