Create a pandas.PeriodIndex from a list of quarter strings in format YYYYQ

Time:05-21

I know a bit about pandas.Period and pandas.PeriodIndex, but I am not able to make them fit my use case.

I have a list of quarter strings in format YYYYQ:

df = pandas.DataFrame({'quarter': ['20214', '20222']})

How can I create a PeriodIndex, or a Period for each element, from it?

I am not able to do this

pandas.PeriodIndex(df.quarter, freq='Q')

because my strings don't contain a q or Q at the right position. I could insert a q at the right position with some string manipulation first, but I wonder whether PeriodIndex lets me specify a format string like YYYYQ.

CodePudding user response:

You can rework the strings into the 'YYYYQX' form (for example '2021Q4') with str.replace so that pandas.to_datetime detects them automatically, then convert to periods:

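import pandas as pd

# Insert 'Q' before the last digit, parse as datetime, then convert to quarterly periods
(pd.to_datetime(df['quarter'].str.replace(r'(?=\d$)', 'Q', regex=True))
   .dt.to_period('Q')
)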

output:

0    2021Q4
1    2022Q2
Name: quarter, dtype: period[Q-DEC]

or for a PeriodIndex:

pd.PeriodIndex(pd.to_datetime(df['quarter'].str.replace(r'(?=\d$)',
                                                        'Q', regex=True)),
               freq='Q')

output:

PeriodIndex(['2021Q4', '2022Q2'], dtype='period[Q-DEC]', name='quarter')

regex:

This uses a lookahead to insert a Q before the last digit:

(?=\d$)    # zero-width match at the position just before the last digit
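
To see what the lookahead does in isolation, here is a minimal sketch using Python's standard re module rather than pandas. The variable names (pattern, quarters, converted) are only illustrative and not part of the original answer.

```python
import re

# Zero-width lookahead: matches the empty position just before the final digit,
# so the substitution inserts text there without consuming any characters.
pattern = re.compile(r'(?=\d$)')

quarters = ['20214', '20222']
converted = [pattern.sub('Q', q) for q in quarters]
print(converted)  # ['2021Q4', '2022Q2']
```

Because the match is empty, nothing is replaced in the original string; the Q is simply inserted between the year and the quarter digit.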


CodePudding user response:

Insert a q before the last digit with Series.str.replace; the result can then be converted to quarters with your original approach:

df['quarter'] = pd.PeriodIndex(df['quarter'].str.replace(r'(\d{1})$', r'q\1', regex=True), 
                               freq='Q')
print (df)
  quarter
0  2021Q4
1  2022Q2

Details:

print (df['quarter'].str.replace(r'(\d{1})$', r'q\1', regex=True))
0    2021q4
1    2022q2
Name: quarter, dtype: object
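
If you would rather avoid a regex, plain string slicing should give the same result. This is a sketch that assumes every value is exactly a four-digit year followed by a single quarter digit; the names labels and periods are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'quarter': ['20214', '20222']})

# Take the first four characters as the year and the last character as the
# quarter, joined by 'Q', so pandas can parse each label as a quarterly period.
labels = df['quarter'].str[:4] + 'Q' + df['quarter'].str[-1]

periods = pd.PeriodIndex(labels, freq='Q')
print(periods)
```

Iterating over periods yields individual pandas.Period objects, which covers the per-element case from the question.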