I have a dataset with dates encoded as strings formatted as %B %d, %Y, e.g. September 10, 2021.
Using:
df['sale_date'] = pd.to_datetime(df.sale_date, format='%B %d, %Y')
produces this error:
ValueError: time data 'September 10, 2021' does not match format '%B %d, %Y' (match)
Manually checking with strptime:
from datetime import datetime
datetime.strptime('September 10, 2021', '%B %d, %Y')
produces the correct datetime object.
Is there something I missed in the pd.to_datetime call?
Thanks.
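A quick way to see which values are failing, and why, is to let pandas coerce unparseable entries to NaT and then print the offenders with repr(), which makes invisible characters visible. This is a minimal sketch; the sample series below is hypothetical and stands in for df['sale_date'].

```python
import pandas as pd

# Hypothetical sample; in practice this would be df['sale_date'].
sale_date = pd.Series(['\ufeffSeptember 10, 2021', 'September 11, 2021'])

# errors='coerce' turns values that don't match the format into NaT instead of raising.
parsed = pd.to_datetime(sale_date, format='%B %d, %Y', errors='coerce')

# repr() exposes hidden characters such as '\ufeff' in the values that failed.
for value in sale_date[parsed.isna()]:
    print(repr(value))
```
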
Answer:
Upon further investigation, I found that the error only happens on the first element of the series: that string has a '\ufeff' character (a byte order mark, BOM) prepended to it. Removing it with series.str.replace() fixed the problem. Sorry for the bother. The remaining question is: how did that BOM end up there?
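One plausible origin (an assumption, since the original file isn't shown) is a text file saved as "UTF-8 with BOM" and then decoded as plain UTF-8: the codec keeps the leading bytes EF BB BF as the character '\ufeff' on the first value. Decoding with Python's 'utf-8-sig' codec drops a leading BOM instead; pd.read_csv accepts the same codec name through its encoding argument. The sketch below uses hypothetical raw bytes and shows both fixes.

```python
import pandas as pd

# Hypothetical raw file contents: UTF-8 text that starts with a BOM (bytes EF BB BF).
raw = b'\xef\xbb\xbfSeptember 10, 2021\nSeptember 11, 2021\n'

# Decoding as plain 'utf-8' keeps the BOM as '\ufeff' at the start of the first value.
with_bom = pd.Series(raw.decode('utf-8').splitlines())
print(repr(with_bom[0]))

# Fix 1: remove the stray character before parsing (what the asker did).
cleaned = with_bom.str.replace('\ufeff', '', regex=False)

# Fix 2: decode with 'utf-8-sig', which strips a leading BOM if one is present.
no_bom = pd.Series(raw.decode('utf-8-sig').splitlines())

fmt = '%B %d, %Y'
# Both approaches yield the same parsed dates.
print(pd.to_datetime(cleaned, format=fmt).equals(pd.to_datetime(no_bom, format=fmt)))
```
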
Answer:
Very likely you have to strip some whitespace first!
If I add whitespace at the beginning, the end, or both:
import datetime
datestring = ' September 10, 2021 '
datetime.datetime.strptime(datestring, '%B %d, %Y')
it raises the same error you got:
ValueError: time data ' September 10, 2021 ' does not match format '%B %d, %Y'
As a solution for a single value, use:
datestring = ' September 10, 2021 '
datestring = datestring.strip()  # strip() returns a new string, so reassign it
For a column in a DataFrame, use:
import pandas as pd
dummy = pd.DataFrame(columns=['Date'], data=[' September 10, 2021 ', ' September 11, 2021 ', ' September 12, 2021 '])
dummy['Date'] = dummy['Date'].str.strip()
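One caveat when applying this to the asker's data: str.strip() removes only whitespace, and U+FEFF is a Unicode format character rather than whitespace, so stripping alone leaves a BOM in place. A minimal sketch, using hypothetical sample values, that removes the BOM explicitly before stripping and parsing:

```python
import pandas as pd

# Hypothetical sample mixing a BOM-prefixed value and a whitespace-padded value.
dates = pd.Series(['\ufeffSeptember 10, 2021', ' September 11, 2021 ', 'September 12, 2021'])

# str.strip() does not treat U+FEFF as whitespace, so the BOM survives.
print(repr(dates.str.strip()[0]))

# Remove the BOM explicitly, then strip surrounding whitespace, then parse.
cleaned = dates.str.replace('\ufeff', '', regex=False).str.strip()
parsed = pd.to_datetime(cleaned, format='%B %d, %Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```
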