Imported an excel spreadsheet that needs a lot of cleanup. The problem is that there are date fields mixed in with standard string headers. So I have a column with 'Name', 'Team', and then datetime dates. When I want to rename a date field I get the error message
KeyError: "['2022-10-01', '11/1/2022', '12/1/2022'] not in index"
I'm trying to select the specific column:
df = df[['2022-10-01', '11/1/2022', '12/1/2022']]
But since the key error I can't seem to rename it. I tried list(df.columns.values) and see that the result is:
datetime.datetime(2022, 10, 1, 0, 0), datetime.datetime(2022, 11, 1, 0, 0), datetime.datetime(2022, 12, 1, 0, 0)
Is there a way I can convert these datetime headers into string format so I can rename them? I tried various date formats but there might be a pythonic way to handle these headers.
CodePudding user response:
You can convert the datetime
columns to string using strftime
:
df.columns = df.columns.strftime("%Y-%m-%d")
print(df)
Prints:
2022-10-01 2022-11-01 2022-12-01
0 1 2 3
1 4 5 6
Then print(df.columns)
will print:
Index(['2022-10-01', '2022-11-01', '2022-12-01'], dtype='object')
EDIT:
If you have many mixed type columns, you can do:
df = df.rename(lambda c: c if isinstance(c, str) else str(c).rsplit(maxsplit=1)[0], axis=1)
This will check if column type is string, and if not convert it to string, split it from the right side (.rsplit
) once (maxsplit=1
). This will convert the datetime column to only date (discarding the seconds).