Home > Blockchain >  Split period into two dates when the date has the same delimiter
Split period into two dates when the date has the same delimiter

Time:10-07

Goal: derive period start and period end from the column period, in the form of dd.mm.yyyy - dd.mm.yyyy

period

28-02-2022 - 30.09.2022    
31.01.2022 - 31.12.2022
28.02.2019 - 30-04-2020
20.01.2019-22.02.2020
19.03.2020- 24.05.2021
13.09.2022-12-10.2022

df[['period_start,'period_end]]= df['period'].str.split('-',expand=True) 

will not work.

Expected output

period_start    period_end
31.02.2022      30.09.2022    
31.01.2022      31.12.2022
28.02.2019      30.04.2020
20.01.2019      22.02.2020
19.03.2020      24.05.2021
13.09.2022      12.10.2022

CodePudding user response:

We can use str.extract here for one option:

df[["period_start", "period_end"]] = df["period"].str.extract(r'(\S )\s*-\s*(\S )')
                                                 .str.replace(r'-', '.')

CodePudding user response:

Use a regex to split on the dash with surrounding spaces:

out = (df['period'].str.split(r'\s -\s ',expand=True)
         .set_axis(['period_start', 'period_end'], axis=1)
       )

or to remove the column and create new ones:

df[['period_start', 'period_end']] = df.pop('period').str.split(r'\s -\s ',expand=True) 

output:

  period_start  period_end
0   31-02-2022  30.09.2022
1   31.01.2022  31.12.2022
2   28.02.2019  30-04-2020

CodePudding user response:

the problem is you were trying to split on dash, and there's many dashes in the one row, this work :

df[['period_start','period_end']]= df['period'].str.split(' - ',expand=True) 

because we split on space dash

  • Related