Home > front end >  Using Start Date and End Date create a a new column : Pandas
Using Start Date and End Date create a a new column : Pandas

Time:05-23

I have a pandas dataframe shown below (sample) and I want to create a new table with an extra column 'NewDate' which will look at StartDate and show the last date of the month for start date and subsequently last date of every month till the end date for each ID and if my ID has End Date as Null the series will stop at the last date of current month which is May 2022.

ID     StartDate      EndDate            
100     1/01/2022     26/04/2022    
101    20/04/2022     Null
102     1/01/2022     27/02/2022          
....

My Expected Output:

ID  StartDate       EndDate          NewDate
100     1/01/2022    26/04/2022      31/01/2022
100     1/01/2022    26/04/2022      28/02/2022
100     1/01/2022    26/04/2022      31/03/2022
100     1/01/2022    26/04/2022      30/04/2022
101     20/04/2022    Null           30/04/2022
101     20/04/2022    Null           31/05/2022
102     1/01/2022    27/02/2022      31/01/2022
102     1/01/2022    27/02/2022      28/02/2022
...

CodePudding user response:

You need trim your date first , then we create the date range with pd.date_range and explode the column

s1 = pd.to_datetime(df.StartDate, format = '%d/%m/%Y')
s2 = pd.to_datetime(df.EndDate, format = '%d/%m/%Y', errors = 'coerce')   pd.offsets.MonthEnd(0)
     
s2 = s2.fillna(s1   pd.offsets.MonthEnd(2))

df['new'] = [pd.date_range(x, y , freq= 'M',closed = 'left') for x , y in zip(df.StartDate, s pd.offsets.MonthEnd(1))]
out = df.explode('new')

out
Out[206]: 
    ID   StartDate     EndDate        new
0  100   1/01/2022  26/04/2022 2022-01-31
0  100   1/01/2022  26/04/2022 2022-02-28
0  100   1/01/2022  26/04/2022 2022-03-31
0  100   1/01/2022  26/04/2022 2022-04-30
1  101  20/04/2022        Null 2022-04-30
1  101  20/04/2022        Null 2022-05-31
2  102   1/01/2022  27/02/2022 2022-01-31
2  102   1/01/2022  27/02/2022 2022-02-28

Updated

s2 = s2.fillna(s1   pd.offsets.MonthEnd(1))  
df['new_date'] = [pd.date_range(x, y , freq= 'M',closed = 'left') for x , y in zip(df.start_date, s2 pd.offsets.MonthEnd(1))] 
output = df.explode('new_date') 

CodePudding user response:

Try this

# convert each date column to datetime
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# create date ranges for each row
f = lambda s,e: pd.date_range(s, e pd.DateOffset(months=1), freq='M')
df['NewDate'] = [f(s,e) if e==e else f(s,s) for s, e in zip(df['StartDate'], df['EndDate'])]
# explode the new column
df = df.explode('NewDate')
print(df)
    ID  StartDate    EndDate    NewDate
0  100 2022-01-01 2022-04-26 2022-01-31
0  100 2022-01-01 2022-04-26 2022-02-28
0  100 2022-01-01 2022-04-26 2022-03-31
0  100 2022-01-01 2022-04-26 2022-04-30
1  101 2022-04-20        NaT 2022-04-30
2  102 2022-01-01 2022-02-27 2022-01-31
2  102 2022-01-01 2022-02-27 2022-02-28
  • Related