I have a table like below:
Sector | Apr 2022 | May 2022 | Jun 2022 | 1Y Min | 1Y Max |
---|---|---|---|---|---|
A | 10 | 05 | 12 | 05 | 05 |
B | 18 | 20 | 09 | 09 | 09 |
C | 02 | 09 | 12 | 02 | 02 |
The last two column names are constant.
However, the names of the month columns (Apr 2022, May 2022, Jun 2022) change depending on the latest month. These column names are produced with `x.strftime('%b %Y')`.
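For reference, here is a minimal standard-library round trip (an illustration, not part of the original question). `%b` is the abbreviated month name, so a label produced by `strftime('%b %Y')` parses back with the same format, while a full name such as `June` needs `%B`. This assumes the default C/English locale.

```python
from datetime import datetime

# '%b' = abbreviated month name, '%Y' = four-digit year (C/English locale assumed)
label = datetime(2022, 6, 1).strftime('%b %Y')
print(label)  # Jun 2022

# The same format string parses the label back to the first day of the month
parsed = datetime.strptime(label, '%b %Y')
print(parsed)  # 2022-06-01 00:00:00

# A full month name does not match '%b'; it needs '%B' instead
try:
    datetime.strptime('June 2022', '%b %Y')
except ValueError:
    print("'June 2022' does not match '%b %Y'")
print(datetime.strptime('June 2022', '%B %Y'))  # 2022-06-01 00:00:00
```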
I want to change the name of the `df.columns[-3]` column so that it is shown in m/d/Y format (the last day of that month), i.e. change the format of the column name only for the latest month.
Desired output:
Sector | Apr 2022 | May 2022 | 6/30/2022 | 1Y Min | 1Y Max |
---|---|---|---|---|---|
A | 10 | 05 | 12 | 05 | 05 |
B | 18 | 20 | 09 | 09 | 09 |
C | 02 | 09 | 12 | 02 | 02 |
Piece of existing code to provide context:
df.columns = pd.Series(df.columns[:-2]).apply(lambda x: x.strftime('%b %Y')).to_list() + ["1Y Min", "1Y Max"]
I want to execute the new piece of code after the above line.
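Since the month columns are still `Timestamp` objects when that line runs, one option is to format the latest month differently in the same step instead of parsing the string back afterwards. This is a minimal sketch, assuming (as the existing line implies) that `Sector` is the index and every column except the last two is a `Timestamp`. The sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical frame: Sector as index, month columns as Timestamps (assumption)
df = pd.DataFrame(
    [[10, 5, 12, 5, 5], [18, 20, 9, 9, 9], [2, 9, 12, 2, 2]],
    index=pd.Index(['A', 'B', 'C'], name='Sector'),
    columns=[pd.Timestamp('2022-04-01'), pd.Timestamp('2022-05-01'),
             pd.Timestamp('2022-06-01'), '1Y Min', '1Y Max'],
)

months = list(df.columns[:-2])
# Every month except the latest keeps the 'Apr 2022' style
labels = [m.strftime('%b %Y') for m in months[:-1]]
# The latest month becomes its month-end date in m/d/Y form without zero padding
month_end = months[-1] + pd.offsets.MonthEnd(0)
labels.append(f'{month_end.month}/{month_end.day}/{month_end.year}')

df.columns = labels + ['1Y Min', '1Y Max']
print(df.columns.tolist())
# ['Apr 2022', 'May 2022', '6/30/2022', '1Y Min', '1Y Max']
```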
I tried using df[df[-3]] = pd.to_datetime[df[df[-3]].apply(lambda x: x.strftime('%b %Y')
but this does not seem to work.
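A likely explanation for why that attempt fails, sketched on a small made-up frame: `df[-3]` looks up a column whose *label* is `-3` rather than the third-from-last column, and `pd.to_datetime[...]` uses square brackets instead of a function call. The column names live in `df.columns`, an immutable `Index`, so the usual approach is to rebuild the full list of labels and assign it back.

```python
import pandas as pd

# Hypothetical one-row frame with string column labels
df = pd.DataFrame({'Sector': ['A'], 'Jun 2022': [12], '1Y Min': [5], '1Y Max': [5]})

# df[-3] looks up a column labelled -3, which does not exist
try:
    df[-3]
except KeyError:
    print('KeyError: no column labelled -3')

# pd.to_datetime is a function, so it must be called with (), not indexed with []
try:
    pd.to_datetime['Jun 2022']
except TypeError as exc:
    print('TypeError:', exc)

# The labels live in df.columns, but an Index cannot be modified in place
try:
    df.columns[-3] = '6/30/2022'
except TypeError as exc:
    print('TypeError:', exc)

# Rebuild the whole list of labels instead and assign it back
labels = list(df.columns)
labels[-3] = '6/30/2022'
df.columns = labels
print(df.columns.tolist())
```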
Is there a simple solution?
Answer:
Now that you have fixed 'June', you can use `pd.to_datetime` and `MonthEnd` to convert the month-year label to month/last-day/year format:
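import pandas as pd
# Convert 'Jun 2022' to '6/30/2022': parse the label, roll forward to the month end,
# then format without zero padding ('%-m'/'%-d' work on Linux/macOS but not on Windows)
new_date = (pd.to_datetime(df.columns[-3], format='%b %Y') + pd.offsets.MonthEnd(0)).strftime('%-m/%-d/%Y')
# Rebuild the column list with the new label in the third-from-last position
df.columns = [*df.columns[:-3], new_date, *df.columns[-2:]]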
print(df)
# Output
Sector Apr 2022 May 2022 6/30/2022 1Y Min 1Y Max
0 A 10 5 12 5 5
1 B 18 20 9 9 9
2 C 2 9 12 2 2
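`MonthEnd(0)` rolls a date forward to the end of its month and leaves a date that is already a month end unchanged, whereas `MonthEnd(1)` always moves to the next month end. The `%-m`/`%-d` directives come from the platform C library and are not supported on Windows, so formatting the fields directly is a portable alternative. A small check of both points, using illustrative dates:

```python
import pandas as pd

start = pd.Timestamp('2022-06-01')  # what 'Jun 2022' parses to
end = pd.Timestamp('2022-06-30')    # already a month end

print(start + pd.offsets.MonthEnd(0))  # 2022-06-30 00:00:00
print(end + pd.offsets.MonthEnd(0))    # 2022-06-30 00:00:00 (unchanged)
print(end + pd.offsets.MonthEnd(1))    # 2022-07-31 00:00:00 (next month end)

# Portable m/d/Y without zero padding (avoids the platform-specific '%-m')
month_end = start + pd.offsets.MonthEnd(0)
new_date = f'{month_end.month}/{month_end.day}/{month_end.year}'
print(new_date)  # 6/30/2022
```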
Answer:
Here's another way to do it, without having to manually specify which column represents the latest month:
import pandas as pd
from dateutil.relativedelta import relativedelta
# NOTE: to install dateutil, execute the following command: `pip install python-dateutil`
df = pd.DataFrame(
[['A', '10', '05', '12', '05', '05'],
['B', '18', '20', '09', '09', '09'],
['C', '02', '09', '12', '02', '02']],
columns=['Sector', 'Apr 2022', 'May 2022', 'June 2022', '1Y Min', '1Y Max']
)
# 1. Convert the column names to datetime objects
dates = pd.to_datetime(df.columns, errors='coerce')
# DatetimeIndex(['NaT', '2022-04-01', '2022-05-01', '2022-06-01', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)
# 2. Find the last date
last_date = dates.dropna().max()
# 2022-06-01 00:00:00
# 3. Create a mapping of the original column names to the datetime objects
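# 3. Create a mapping from the datetime objects to the original column names
#    (all NaT keys collapse into one entry, so only the last non-date column is kept)
colmap = {date: original_col for date, original_col in zip(dates, df.columns)}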
# {
# NaT: '1Y Max',
# Timestamp('2022-04-01 00:00:00'): 'Apr 2022',
# Timestamp('2022-05-01 00:00:00'): 'May 2022',
# Timestamp('2022-06-01 00:00:00'): 'June 2022'
# }
# 4. Find the column name of the last date
last_date_col = colmap[last_date]
# June 2022
# 5. Rename the column of the last date to the last day of that month using
#    format "m/d/Y", where `m` represents the month, `d` represents the day,
#    and `Y` represents the year (relativedelta(day=31) clips to the month's last day)
df = df.rename(columns={last_date_col: (last_date + relativedelta(day=31)).strftime('%-m/%-d/%Y')})
print(df)
# Output ('June 2022' is renamed to '6/30/2022'; the column order is unchanged):
#   Sector Apr 2022 May 2022 6/30/2022 1Y Min 1Y Max
# 0      A       10       05        12     05     05
# 1      B       18       20        09     09     09
# 2      C       02       09        12     02     02
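A more compact variant of the same idea, sketched under the assumption that the month labels use the abbreviated `%b %Y` form (e.g. `Jun 2022`). Parsing with an explicit `format` turns every non-date label into `NaT` without format guessing, and `Series.idxmax` (which skips `NaT` by default) returns the label of the latest month directly, so no date-to-label dictionary is needed.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', '10', '05', '12', '05', '05'],
     ['B', '18', '20', '09', '09', '09'],
     ['C', '02', '09', '12', '02', '02']],
    columns=['Sector', 'Apr 2022', 'May 2022', 'Jun 2022', '1Y Min', '1Y Max'],
)

# Labels that do not match '%b %Y' ('Sector', '1Y Min', '1Y Max') become NaT
dates = pd.Series(pd.to_datetime(df.columns, format='%b %Y', errors='coerce'),
                  index=df.columns)

# idxmax ignores NaT and returns the label of the latest month
last_col = dates.idxmax()
month_end = dates[last_col] + pd.offsets.MonthEnd(0)

# rename returns a new frame, so assign it back; column order is preserved
df = df.rename(columns={last_col: f'{month_end.month}/{month_end.day}/{month_end.year}'})
print(df.columns.tolist())
# ['Sector', 'Apr 2022', 'May 2022', '6/30/2022', '1Y Min', '1Y Max']
```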