Home > database >  Pandas datetime inconsistent format
Pandas datetime inconsistent format

Time:08-07

I hope someone can help me with the following: I'm trying to convert my data to daily averages using:

df['timestamp'] = pd.to_datetime(df['Datum WSM-09'])
df_daily_avg = df.groupby(pd.Grouper(freq='D', key='timestamp')).mean()

df['Datum WSM-09'] looks like this:

0          6-3-2020 12:30
1          6-3-2020 12:40
2          6-3-2020 12:50
3          6-3-2020 13:00
4          6-3-2020 13:10
               ...       
106785    18-3-2022 02:00
106786    18-3-2022 02:10
106787    18-3-2022 02:20
106788    18-3-2022 02:30
106789    18-3-2022 02:40
Name: Datum WSM-09, Length: 106790, dtype: object

However, when executing the first line the data under "timestamp" is inconsistent. The last rows displayed in the picture are correct. For the first ones, it should be 2020-03-06 12:30. The month and the day are switched around.

Many thanks

enter image description here

CodePudding user response:

In iso8601

Your source column is apparently object / text. The March 18th timestamps are unambiguous, as there's fewer than 18 months in the year. The ambiguous March 6th timestamps make the hair on the back of the black cat stand on end.

You neglected to specify a timestamp format, given that the source column is ambiguously formatted.

Please RTFM:

format : str, default None

The strftime to parse time, e.g. "%d/%m/%Y". Note that "%f" will parse all the way up to nanoseconds. See strftime documentation for more information on choices.

You tried offering a value of None, which is not a good match to your business needs. I don't know what all of your input data looks like, but perhaps %d-%m-%Y %H:%M would better match your needs.

CodePudding user response:

Try using the "dayfirst" option:

df['timestamp'] = pd.to_datetime(df['Datum WSM-09'], dayfirst=True)
  • Related