I know a bit about pandas.Period and pandas.PeriodIndex, but I can't make them fit my use case.
I have a list of quarter strings in the format YYYYQ:
import pandas
df = pandas.DataFrame({'quarter': ['20214', '20222']})
How can I create a PeriodIndex from it, or a Period for each element?
I can't simply do this:
pandas.PeriodIndex(df.quarter, freq='Q')
because my strings don't contain a q or Q at the right position. I could do some string manipulation first to insert a q at the right position, but I wonder whether PeriodIndex lets me specify a format string like YYYYQ.
Answer:
You can rework the strings into the 'YYYYQX' form with str.replace, which pandas.to_datetime detects automatically, and then convert to periods:
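import pandas as pd
df = pd.DataFrame({'quarter': ['20214', '20222']})
# Insert 'Q' before the last digit ('20214' -> '2021Q4'), parse, then convert to quarterly periods
(pd.to_datetime(df['quarter'].str.replace(r'(?=\d$)', 'Q', regex=True))
   .dt.to_period('Q')
)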
output:
0 2021Q4
1 2022Q2
Name: quarter, dtype: period[Q-DEC]
Or, for a PeriodIndex:
pd.PeriodIndex(pd.to_datetime(df['quarter'].str.replace(r'(?=\d$)', 'Q', regex=True)),
               freq='Q')
output:
PeriodIndex(['2021Q4', '2022Q2'], dtype='period[Q-DEC]', name='quarter')
Regex:
The pattern uses a lookahead to insert a Q before the last digit:
(?=\d$)  # zero-width: matches the position just before the last digit, consuming nothing
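To see the lookahead on its own, here is a small sketch using Python's standard re module, which accepts the same pattern syntax that str.replace(..., regex=True) uses. The sample strings come from the question; the re.fullmatch check is an added assumption that every value is exactly four year digits followed by a quarter digit 1-4.

```python
import re

# (?=\d$) is zero-width: it matches the empty position just before the final
# digit, so re.sub inserts 'Q' there without consuming any characters.
LAST_DIGIT = re.compile(r'(?=\d$)')

# Assumed input shape: four year digits followed by a quarter digit 1-4.
YYYYQ = re.compile(r'\d{4}[1-4]')

samples = ['20214', '20222']
for s in samples:
    if not YYYYQ.fullmatch(s):
        raise ValueError(f'not a YYYYQ string: {s!r}')

converted = [LAST_DIGIT.sub('Q', s) for s in samples]
print(converted)  # ['2021Q4', '2022Q2']
```

Because the lookahead consumes nothing, the original digits are all kept and only the 'Q' is added.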
Answer:
Add a q before the last digit with Series.str.replace; after that, your original PeriodIndex call works:
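import pandas as pd
df = pd.DataFrame({'quarter': ['20214', '20222']})
# Capture the last digit and put a 'q' in front of it: '20214' -> '2021q4'
df['quarter'] = pd.PeriodIndex(df['quarter'].str.replace(r'(\d{1})$', r'q\1', regex=True),
                               freq='Q')
print(df)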
quarter
0 2021Q4
1 2022Q2
Details (the intermediate str.replace result, computed on the original string column):
print(df['quarter'].str.replace(r'(\d{1})$', r'q\1', regex=True))
0 2021q4
1 2022q2
Name: quarter, dtype: object
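If you would rather not rewrite the strings at all, one alternative (a different technique from the answers above, not a format-string option of PeriodIndex) is to split each value into its year and quarter fields and build the periods with the year and quarter parameters of the pd.Period constructor. This is a sketch: the helper yyyyq_to_period is hypothetical and assumes every value is exactly four year digits followed by one quarter digit.

```python
import pandas as pd

df = pd.DataFrame({'quarter': ['20214', '20222']})


def yyyyq_to_period(s: str) -> pd.Period:
    """Hypothetical helper: turn a 'YYYYQ' string such as '20214' into a quarterly Period.

    Assumes four year digits followed by a single quarter digit (1-4).
    """
    return pd.Period(year=int(s[:4]), quarter=int(s[4]), freq='Q')


# Build the PeriodIndex element by element and name it after the source column.
pi = pd.PeriodIndex([yyyyq_to_period(s) for s in df['quarter']], freq='Q', name='quarter')
print(pi)
```

This loops in Python, so on large columns it will likely be slower than the vectorized str.replace answers; its advantage is that the input layout is spelled out explicitly instead of being encoded in a regex.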