Why does pd.to_datetime not take the year into account?

Time:01-13

I've searched for 2 hours but can't find an answer that works. I'm trying to find the latest date in a dataset, but my code doesn't seem to take the year into account. Here are some of the dates in the dataset:

Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022

Here's a snippet from my code:

import pandas as pd
import streamlit as st

df = pd.read_csv('test.csv')

df['Date'] = pd.to_datetime(df['Date'])

st.write(df['Date'].max())

st.write outputs 12/21/2022 instead of the expected 01/09/2023, so the code seems to ignore the year and compare only the month and day.
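One plausible explanation (an assumption; the question doesn't confirm it) is that the value being compared is still a string rather than a datetime. The sketch below uses the four sample dates to show that comparing MM/DD/YYYY strings reproduces exactly the reported result, while comparing parsed timestamps gives the expected one. The variable names are illustrative only.

```python
import pandas as pd

# The sample dates from the question, still stored as plain strings
raw = pd.Series(["01/09/2023", "12/21/2022", "12/09/2022", "11/19/2022"])

# Strings compare character by character, so "12/..." beats "01/..."
# and the year at the end never gets a chance to matter
string_max = raw.max()
print(string_max)  # 12/21/2022

# Parsing with an explicit month/day/year format yields real timestamps,
# which compare chronologically
parsed = pd.to_datetime(raw, format="%m/%d/%Y")
date_max = parsed.max()
print(date_max)  # 2023-01-09 00:00:00
```

Passing `format` explicitly also removes any ambiguity about whether `01/09` means January 9 or September 1.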

I also tried converting the dates with df['Date'] = df['Date'].dt.strftime('%Y%m%d').astype(int), but that didn't change anything.
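For reference, on a correctly parsed column the `%Y%m%d` integer approach does order the sample dates by year first. So if it changed nothing, the value being displayed may not come from the converted column (a guess, since the full code isn't shown). A minimal check, assuming an in-memory DataFrame built from the sample dates instead of test.csv:

```python
import pandas as pd

# In-memory stand-in for the CSV data from the question
df = pd.DataFrame({"Date": ["01/09/2023", "12/21/2022", "12/09/2022", "11/19/2022"]})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# YYYYMMDD puts the year first, so integer order matches date order
as_int = df["Date"].dt.strftime("%Y%m%d").astype(int)
print(as_int.max())  # 20230109
```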

Answer:

pandas.read_csv lets you designate columns to be parsed as dates. Let test.csv contain

Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022

then

import pandas as pd
df = pd.read_csv('test.csv', parse_dates=["Date"])
print(df['Date'].max())

gives the output

2023-01-09 00:00:00

Explanation: parse_dates takes a list of the names of the columns that hold dates, and read_csv parses those columns into datetime values while reading the file.
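The same answer can be tried without a file on disk. This sketch assumes `io.StringIO` as a stand-in for test.csv and checks that the column really ends up with a datetime dtype (the exact resolution, such as `datetime64[ns]`, can vary between pandas versions, so it only checks for "some datetime dtype").

```python
import io

import pandas as pd

# Stand-in for test.csv so the example runs without a file
csv_text = """Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# The column is parsed into datetimes rather than left as strings
print(pd.api.types.is_datetime64_any_dtype(df["Date"]))  # True
print(df["Date"].max())  # 2023-01-09 00:00:00
```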

(tested in pandas 1.5.2)
