Goal: derive period start and period end from the column period, in the form of dd.mm.yyyy - dd.mm.yyyy
period
28-02-2022 - 30.09.2022
31.01.2022 - 31.12.2022
28.02.2019 - 30-04-2020
20.01.2019-22.02.2020
19.03.2020- 24.05.2021
13.09.2022-12-10.2022
df[['period_start,'period_end]]= df['period'].str.split('-',expand=True)
will not work.
Expected output
period_start period_end
31.02.2022 30.09.2022
31.01.2022 31.12.2022
28.02.2019 30.04.2020
20.01.2019 22.02.2020
19.03.2020 24.05.2021
13.09.2022 12.10.2022
CodePudding user response:
We can use str.extract
here for one option:
df[["period_start", "period_end"]] = df["period"].str.extract(r'(\S )\s*-\s*(\S )')
.str.replace(r'-', '.')
CodePudding user response:
Use a regex to split on the dash with surrounding spaces:
out = (df['period'].str.split(r'\s -\s ',expand=True)
.set_axis(['period_start', 'period_end'], axis=1)
)
or to remove the column and create new ones:
df[['period_start', 'period_end']] = df.pop('period').str.split(r'\s -\s ',expand=True)
output:
period_start period_end
0 31-02-2022 30.09.2022
1 31.01.2022 31.12.2022
2 28.02.2019 30-04-2020
CodePudding user response:
the problem is you were trying to split on dash, and there's many dashes in the one row, this work :
df[['period_start','period_end']]= df['period'].str.split(' - ',expand=True)
because we split on space dash