I have a dataframe where all the column names are dates. When I use read_csv, pandas reads these column names as strings. Is there a way to specify that I want the column names to be datetime objects. I need this to be done as part of the read_csv call, rather than an additional line afterwards.
For example, the csv file looks something like
df = pd.DataFrame({'2022-10-25': [0.2, 1.5, 1.4], '2022-10-26': [1.1, 2.3, 0.2]})
and when I call read_csv, I want the column headers to be datetime objects, rather than strings.
CodePudding user response:
either you define the whole column as a datetime or as an integer... there is no way you can define the header as a datetime and the values as a different thing.
On the other hand you cannot define your columns as datetime, because the values are just integers... I think you are approaching your problem in a wrong way.
CodePudding user response:
Create the dummy dataset:
df = pd.DataFrame({'2022-10-25': [0, 1, 1], '2022-10-26': [1, 1, 0]})
Check the initial columns index:
df.columns
Output:
You can run the following code just after the .read_csv()
from datetime import datetime
datetime_columns = [datetime.strptime(col, '%Y-%m-%d') for col in df.columns]
df.columns = pd.Index(datetime_columns, dtype='datetime64[ns]')
Finally we check the new index:
df.columns
Output: