Python3 Pandas how do I select a column with a specific date (2022-10-01) as the header?-CodePudding

Imported an excel spreadsheet that needs a lot of cleanup. The problem is that there are date fields mixed in with standard string headers. So I have a column with 'Name', 'Team', and then datetime dates. When I want to rename a date field I get the error message

KeyError: "['2022-10-01', '11/1/2022', '12/1/2022'] not in index"

I'm trying to select the specific column:

df = df[['2022-10-01', '11/1/2022', '12/1/2022']]

But since the key error I can't seem to rename it. I tried list(df.columns.values) and see that the result is: datetime.datetime(2022, 10, 1, 0, 0), datetime.datetime(2022, 11, 1, 0, 0), datetime.datetime(2022, 12, 1, 0, 0)

Is there a way I can convert these datetime headers into string format so I can rename them? I tried various date formats but there might be a pythonic way to handle these headers.

CodePudding user response：

You can convert the datetime columns to string using strftime:

df.columns = df.columns.strftime("%Y-%m-%d")
print(df)

Prints:

   2022-10-01  2022-11-01  2022-12-01
0           1           2           3
1           4           5           6

Then print(df.columns) will print:

Index(['2022-10-01', '2022-11-01', '2022-12-01'], dtype='object')

EDIT:

If you have many mixed type columns, you can do:

df = df.rename(lambda c: c if isinstance(c, str) else str(c).rsplit(maxsplit=1)[0], axis=1)

This will check if column type is string, and if not convert it to string, split it from the right side (.rsplit) once (maxsplit=1). This will convert the datetime column to only date (discarding the seconds).