I have the following datetime values stored as strings in a Dask DataFrame:
ddf = dd.from_pandas(pd.DataFrame({'date': ['15JAN1955:13:15:27.369', np.nan, '25DEC1990:23:18:17.200', '06MAY1962:02:55:27.360', np.nan, '20SEP1975:12:02:26.357']}), npartitions=1)
I used ddf['date'].apply(lambda x: datetime.strptime(x,"%d%b%Y:%H:%M:%S.%f"), meta=datetime)
but I get the error TypeError: strptime() argument 1 must be a str, not float.
I am following the way dates are parsed in the book Data Science with Python and Dask.
Is the .%f expecting a float? Or does it have something to do with the NaN values?
Answer:
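The .%f is not the problem: %f parses a decimal fraction of a second with up to 6 digits. The TypeError comes from the NaN values, which are floats, so strptime() rejects them.
Also, 20SEPT1975 should be 20SEP1975 (no T in the month), because %b expects the three-letter month abbreviation.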
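To make the cause of the TypeError concrete, here is a minimal sketch using only the standard library. The helper name parse_or_none and the short sample list are hypothetical, chosen for illustration; it assumes an English locale so that %b matches JAN and DEC.

```python
from datetime import datetime

FMT = "%d%b%Y:%H:%M:%S.%f"

def parse_or_none(value):
    """Parse a date string; return None for missing (non-string) values.

    Hypothetical helper, not part of any library.
    """
    if not isinstance(value, str):
        return None
    return datetime.strptime(value, FMT)

# NaN is a float, which is what a missing value looks like in an object column.
values = ["15JAN1955:13:15:27.369", float("nan"), "25DEC1990:23:18:17.200"]

# Passing the NaN straight to strptime reproduces the error from the question.
try:
    datetime.strptime(values[1], FMT)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Skipping non-strings avoids the error; the .%f part parses without trouble.
parsed = [parse_or_none(v) for v in values]
print(error_message)
print(parsed)
```

With pandas, no such helper is needed: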
import pandas as pd
import numpy as np
df = pd.DataFrame({'date': ['15JAN1955:13:15:27.369', np.nan,
                            '25DEC1990:23:18:17.200', np.nan,
                            '06MAY1962:02:55:27.360', '20SEP1975:12:02:26.357']})
df['date'] = pd.to_datetime(df['date'], format="%d%b%Y:%H:%M:%S.%f")
print(df)
                     date
0 1955-01-15 13:15:27.369
1                     NaT
2 1990-12-25 23:18:17.200
3                     NaT
4 1962-05-06 02:55:27.360
5 1975-09-20 12:02:26.357