to_datetime - max function returns the wrong max date

Time:10-01

I have data that comes from a CSV file, and I am trying to get the max date.

Data:

0    01/01/1994
1    01/01/1994
2    01/01/1994
3    01/01/1994
4    01/01/1994
.
.
.
970075    31/08/2021
970076    31/08/2021
970077    31/08/2021
970078    31/08/2021
970079    31/08/2021

However, I get the wrong max value. It seems that my code leaves the date column as strings instead of datetimes, even though I call to_datetime. Because of that, I use re on the string to extract the year.
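A minimal reproduction, assuming a small made-up sample in place of the real CSV column: to_datetime does produce a datetime column, but chaining .dt.strftime formats every value back into a string, so any later min/max works on text rather than dates.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical sample standing in for the CSV's Date column
raw = pd.Series(['01/01/1994', '31/12/2020', '31/08/2021'], name='Date')

# Parsing gives a real datetime column...
parsed = pd.to_datetime(raw, format='%d/%m/%Y')
# ...but strftime formats every value back into a plain string
as_text = parsed.dt.strftime('%d/%m/%Y')

print(is_datetime64_any_dtype(parsed))   # True
print(is_datetime64_any_dtype(as_text))  # False
print(type(as_text.iloc[0]))             # <class 'str'>
```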

My code:

file['Date'] = pd.to_datetime(file['Date'], errors = 'coerce',
                dayfirst = True, format = '%d.%m.%Y'
                ).dt.strftime('%d/%m/%Y')


print(file['Date'].min(), file['Date'].max(),
      range(int(re.search(r'(\d{4})', file['Date'].min()).group()),
            int(re.search(r'(\d{4})', file['Date'].max()).group())))

Returns:

01/01/1994 31/12/2020 range(1994, 2020)

I would like to get the max 31/08/2021 and not 31/12/2020.

Answer:

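Remove .dt.strftime: it converts the datetimes back to strings, so min and max compare text character by character instead of dates, and '31/12/2020' sorts after '31/08/2021'.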

.dt.strftime('%d/%m/%Y')

You can convert to the custom format after calling min and max.
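A small sketch on three hypothetical values in the question's dd/mm/yyyy layout shows the difference between taking the max of the formatted strings and taking the max of the parsed datetimes (formatting only the single result):

```python
import pandas as pd

# Hypothetical sample values in the question's dd/mm/yyyy layout
raw = pd.Series(['01/01/1994', '31/12/2020', '31/08/2021'])

# Comparing strings: '31/12/2020' > '31/08/2021' because '1' > '0' at index 3
print(raw.max())  # 31/12/2020

# Comparing datetimes: the chronologically latest date is returned
dates = pd.to_datetime(raw, format='%d/%m/%Y')
print(dates.max().strftime('%d/%m/%Y'))  # 31/08/2021
```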

Putting it all together, with a simpler way to get the minimal and maximal years:

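import pandas as pd

# Parse to datetime and keep that dtype; do not format back to strings here
file['Date'] = pd.to_datetime(file['Date'], errors = 'coerce', dayfirst = True)
years = file['Date'].dt.year

# Format only the scalar min/max; int() guards against float years if NaT is present
print(file['Date'].min().strftime('%d/%m/%Y'), 
      file['Date'].max().strftime('%d/%m/%Y'), 
      range(int(years.min()), int(years.max())))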

01/01/1994 31/08/2021 range(1994, 2021)
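One detail worth checking: Python's range excludes its stop value, so range(1994, 2021) yields 1994 through 2020 and leaves out 2021 itself. If the goal is to cover every year in the data including the last one (an assumption about the intent), add 1 to the stop. The sketch below uses a small hypothetical DataFrame named file to mirror the question, and drops the rows that errors='coerce' turned into NaT before taking the years.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from the CSV
file = pd.DataFrame({'Date': ['01/01/1994', '31/12/2020', '31/08/2021', 'not a date']})

# Unparseable entries become NaT because of errors='coerce'
file['Date'] = pd.to_datetime(file['Date'], errors='coerce', format='%d/%m/%Y')

# dropna() removes NaT so the year values stay whole numbers
years = file['Date'].dropna().dt.year

# range() stops before its second argument, so add 1 to include the last year
all_years = range(int(years.min()), int(years.max()) + 1)
print(all_years)            # range(1994, 2022)
print(list(all_years)[-1])  # 2021
```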