Changing column name from string to datetime

I have a table like below:

Sector	Apr 2022	May 2022	Jun 2022	1Y Min	1Y Max
A	10	05	12	05	05
B	18	20	09	09	09
C	02	09	12	02	02

The last two column names are constant. However, the names of the month columns (Apr 2022, May 2022, Jun 2022 above) change depending on the latest month. These column names are produced with x.strftime('%b %Y').

I want to change only the name of the third-from-last column (df.columns[-3]), i.e. the latest month, so that it is shown in m/d/y format using the last day of that month.

Wanted solution:

Sector	Apr 2022	May 2022	6/30/2022	1Y Min	1Y Max
A	10	05	12	05	05
B	18	20	09	09	09
C	02	09	12	02	02

Piece of existing code to provide context:

df.columns = pd.Series(df.columns[:-2]).apply(lambda x: x.strftime('%b %Y')).to_list() + ["1Y Min", "1Y Max"]

I want to execute the new piece of code after the above line.

I tried using df[df[-3]] = pd.to_datetime[df[df[-3]].apply(lambda x: x.strftime('%b %Y') but this does not seem to work. Is there a simple solution?

CodePudding user response：

Now that you have fixed 'June', you can use pd.to_datetime and MonthEnd to convert the month-year label to month/last_day/year format:

# Convert Jun 2022 to 6/30/2022
new_date = (pd.to_datetime(df.columns[-3], format='%b %Y') + pd.offsets.MonthEnd(0)).strftime('%-m/%-d/%Y')

df.columns = [*df.columns[:-3], new_date, *df.columns[-2:]]
print(df)

# Output
  Sector  Apr 2022  May 2022  6/30/2022  1Y Min  1Y Max
0      A        10         5         12       5       5
1      B        18        20          9       9       9
2      C         2         9         12       2       2
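
One caveat about the format string above: the `%-m` and `%-d` directives (no zero padding) are platform-specific extensions and typically fail on Windows. The following is a minimal sketch of a portable alternative that uses only the standard library. `month_end_label` is a hypothetical helper name, not a pandas function, and the sketch assumes English month abbreviations in the active locale.

```python
import calendar
from datetime import datetime


def month_end_label(name, fmt="%b %Y"):
    """Turn a label such as 'Jun 2022' into 'm/d/Y' for the last day of that month."""
    parsed = datetime.strptime(name, fmt)
    # monthrange returns (weekday of the 1st, number of days in the month)
    last_day = calendar.monthrange(parsed.year, parsed.month)[1]
    # Build the string from integers to avoid platform-specific strftime flags
    return f"{parsed.month}/{last_day}/{parsed.year}"


columns = ["Sector", "Apr 2022", "May 2022", "Jun 2022", "1Y Min", "1Y Max"]
columns[-3] = month_end_label(columns[-3])
print(columns)
```

With a DataFrame, the same list could be built from `list(df.columns)` and assigned back with `df.columns = columns`.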

CodePudding user response：

Here's another way to do it, without having to manually specify which column represents the latest month:

import pandas as pd
from dateutil.relativedelta import relativedelta
# NOTE: to install dateutil, execute the following command: `pip install python-dateutil`

df = pd.DataFrame(
    [['A', '10', '05', '12', '05', '05'],
    ['B', '18', '20', '09', '09', '09'],
    ['C', '02', '09', '12', '02', '02']],
    columns=['Sector', 'Apr 2022', 'May 2022', 'June 2022', '1Y Min', '1Y Max']
)

# 1. Convert the column names to datetime objects
dates = pd.to_datetime(df.columns, errors='coerce')
# DatetimeIndex(['NaT', '2022-04-01', '2022-05-01', '2022-06-01', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

# 2. Find the last date
last_date = dates.dropna().max()
# 2022-06-01 00:00:00

# 3. Create a mapping of the original column names to the datetime objects
colmap = {date: original_col for date, original_col in zip(dates, df.columns)}
# {
#     NaT: '1Y Max',
#     Timestamp('2022-04-01 00:00:00'): 'Apr 2022', 
#     Timestamp('2022-05-01 00:00:00'): 'May 2022',
#     Timestamp('2022-06-01 00:00:00'): 'June 2022'
# }

# 4. Find the column name of the last date
last_date_col = colmap[last_date]
# June 2022

# 5. Rename the column name of the last date to the last date of the month using
#    format "m/d/Y", Where `m` represents the month, `d` represents the day,
#    and `Y` represents the year
#    Note: rename returns a new DataFrame, so assign the result back
df = df.rename(columns={last_date_col: (last_date + relativedelta(day=31)).strftime('%-m/%-d/%Y')})
print(df)
# Output (rename keeps the column in its original position):
#                             ------- -- Changed column name
#                            |       |   (from 'June 2022' to '6/30/2022')
#                            v       v
#   Sector Apr 2022 May 2022 6/30/2022 1Y Min 1Y Max
# 0      A       10       05        12     05     05
# 1      B       18       20        09     09     09
# 2      C       02       09        12     02     02
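
The approach above relies on `errors='coerce'` to drop labels that are not dates. A stricter variant could try explicit formats instead. The sketch below is one possible way to do that with only the standard library. `parse_month`, `rename_latest_month` and `MONTH_FORMATS` are hypothetical names, and the two formats listed are an assumption about how the month columns are labelled.

```python
import calendar
from datetime import datetime

# Assumed label formats, e.g. 'Jun 2022' or 'June 2022'
MONTH_FORMATS = ("%b %Y", "%B %Y")


def parse_month(name):
    """Return a datetime for a month label, or None if it matches no format."""
    for fmt in MONTH_FORMATS:
        try:
            return datetime.strptime(name, fmt)
        except ValueError:
            continue
    return None


def rename_latest_month(columns):
    """Return a copy of columns with the latest month label rewritten as m/d/Y."""
    dated = []
    for position, name in enumerate(columns):
        parsed = parse_month(name)
        if parsed is not None:
            dated.append((parsed, position))
    renamed = list(columns)
    if not dated:
        return renamed  # nothing looks like a month label
    latest, position = max(dated)
    last_day = calendar.monthrange(latest.year, latest.month)[1]
    renamed[position] = f"{latest.month}/{last_day}/{latest.year}"
    return renamed


result = rename_latest_month(
    ["Sector", "Apr 2022", "May 2022", "June 2022", "1Y Min", "1Y Max"]
)
print(result)
```

Because only the matching position in the list is replaced, the column order is preserved, and the result could be assigned back with `df.columns = result`.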