Suppose my data frame df
has the following column names: ['date', 'value', '20211010', '20211017', '20211024', ...]
I want to rename the column names of '20211010', '20211017', '20211024', ...
(that is, all the columns starting from 20211010
) to t1, t2, t3, ...
continue to increase.
The expected new column names will be ['date', 'value', 't1', 't2', 't3', ...]
.
How to achieve this in Pandas? Thanks.
Reference:
how do i rename the columns from unnamed:0 to columns with increment number in python
CodePudding user response:
IIUC, a robust method could be to use pandas.to_datetime
and pandas.factorize
:
idx, _ = pd.factorize(pd.to_datetime(df.columns, format='%Y%m%d',
errors='coerce'),
sort=True)
df.columns = ('t' pd.Series(idx 1, dtype='str')).mask(idx<0, df.columns)
Example output:
Index(['date', 'value', 't1', 't2', 't4', 'other', 't3'], dtype='object')
Input columns:
Index(['date', 'value', '20211010', '20211017', '20211024', 'other',
'20211018'],
dtype='object')
robustness
to_datetime
ensures that valid dates are used, and sort=True
in factorize
enables to keep the dates sorted.
Example on this input:
['X', '20211010', '20229999', '20211018', '20211024', 'Y', '20211001']
The output would be:
['X', 't2', '20229999', 't3', 't4', 'Y', 't1']
The invalid date is ignored and the tn
are in order.
CodePudding user response:
If always the first two columns will be skipped and just rename the rest a simple way would be
for i, col in enumerate(df.columns[2:]):
df.rename(columns={col:f't{i 1}'}, inplace = True)
But this will not consider the name of any column, so if somehow you have a column in the middle that doesn't need to be renamed it will do it anyway
If you need to do it bulletproof I would go with the answer from mozway