Rename all columns starting from a certain column with special character increasing number-CodePudding

Suppose my data frame df has the following column names: ['date', 'value', '20211010', '20211017', '20211024', ...]

I want to rename the column names of '20211010', '20211017', '20211024', ... (that is, all the columns starting from 20211010) to t1, t2, t3, ... continue to increase.

The expected new column names will be ['date', 'value', 't1', 't2', 't3', ...].

How to achieve this in Pandas? Thanks.

Reference:

how do i rename the columns from unnamed:0 to columns with increment number in python

CodePudding user response：

IIUC, a robust method could be to use pandas.to_datetime and pandas.factorize:

idx, _ = pd.factorize(pd.to_datetime(df.columns, format='%Y%m%d',
                                       errors='coerce'),
                        sort=True)
df.columns = ('t' pd.Series(idx 1, dtype='str')).mask(idx<0, df.columns)

Example output:

Index(['date', 'value', 't1', 't2', 't4', 'other', 't3'], dtype='object')

Input columns:

Index(['date', 'value', '20211010', '20211017', '20211024', 'other',
       '20211018'],
      dtype='object')

robustness

to_datetime ensures that valid dates are used, and sort=True in factorize enables to keep the dates sorted.

Example on this input:

['X', '20211010', '20229999', '20211018', '20211024', 'Y', '20211001']

The output would be:

['X', 't2', '20229999', 't3', 't4', 'Y', 't1']

The invalid date is ignored and the tn are in order.

CodePudding user response：

If always the first two columns will be skipped and just rename the rest a simple way would be

for i, col in enumerate(df.columns[2:]):
    df.rename(columns={col:f't{i 1}'}, inplace = True)

But this will not consider the name of any column, so if somehow you have a column in the middle that doesn't need to be renamed it will do it anyway

If you need to do it bulletproof I would go with the answer from mozway