The following code converts any kind of timestamp in a DataFrame column into a given format:
pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')
How can I do this with Dask? I tried the code below, but it did not work (df is a Dask DataFrame):
a=dd.to_datetime(df["time:timestamp"],format='%Y-%m-%d %X')
a.compute()
Error: ValueError: unconverted data remains: .304000 00:00
This is how the timestamp looks: "2016-01-01 09:51:15.304000 00:00". Expected output: "2016-01-01 09:51:15"
I found "Converting a Dask column into new Dask column of type datetime", but it is not working for me.
Can someone tell me how to do this with Dask?
CodePudding user response:
You can truncate the string to whole seconds before parsing it:
# Solution 1
>>> dd.to_datetime(df['time:timestamp'].str[:19]).compute()
0 2016-01-01 09:51:15
dtype: datetime64[ns]
# Solution 2
>>> dd.to_datetime(df['time:timestamp'].str.split('.').str[0]).compute()
0 2016-01-01 09:51:15
dtype: datetime64[ns]
# Solution 3 (@mozway)
>>> dd.to_datetime(df['time:timestamp'].str.replace(r'\..*', '', regex=True)).compute()
0 2016-01-01 09:51:15
dtype: datetime64[ns]
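To see why the original format string fails and why truncation helps, here is a minimal sketch using only the standard library's datetime.strptime, which reports the same kind of "unconverted data remains" error. The variable names are illustrative, and %H:%M:%S is used in place of %X because %X depends on the locale.

```python
from datetime import datetime

raw = "2016-01-01 09:51:15.304000 00:00"  # sample value from the question
fmt = "%Y-%m-%d %H:%M:%S"                 # format that stops at whole seconds

# Parsing the full string fails: the format ends at the seconds,
# so the fractional part and the trailing offset are left over.
try:
    datetime.strptime(raw, fmt)
    error = None
except ValueError as exc:
    error = str(exc)

# Keep only the first 19 characters ("YYYY-MM-DD HH:MM:SS"), then parse.
by_slice = datetime.strptime(raw[:19], fmt)

# Alternatively, cut the string at the first dot, then parse.
by_split = datetime.strptime(raw.split(".")[0], fmt)

print(error)
print(by_slice, by_split)
```

Both truncation variants yield the same datetime, which is what the three Dask solutions above do column-wise through the .str accessor.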
CodePudding user response:
As you already have the string in almost the correct format, maybe just work with the strings:
df_pd['timestamp'] = df_pd['timestamp'].str.replace(r'\..*', '', regex=True)
Alternatively, if you need to use to_datetime:
pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')
Or:
pd.to_datetime(df_pd["timestamp"],format='%Y-%m-%d %H:%M:%S.%f%z').dt.strftime('%Y-%m-%d %X')
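Since the expected output in the question is a string rather than a datetime, the truncate, parse, and format steps can be combined into one function. The following is a sketch in plain pandas; the helper name to_seconds_string is hypothetical, and %H:%M:%S is used instead of %X to avoid depending on the locale.

```python
import pandas as pd


def to_seconds_string(s: pd.Series) -> pd.Series:
    """Trim each timestamp string to whole seconds and re-format it."""
    # Keep "YYYY-MM-DD HH:MM:SS" and drop fractional seconds and offset.
    trimmed = s.str.slice(0, 19)
    # Parsing validates the values; an explicit format avoids guessing.
    parsed = pd.to_datetime(trimmed, format="%Y-%m-%d %H:%M:%S")
    # Convert back to strings in the requested layout.
    return parsed.dt.strftime("%Y-%m-%d %H:%M:%S")


# Sample value taken from the question.
sample = pd.Series(["2016-01-01 09:51:15.304000 00:00"], name="time:timestamp")
result = to_seconds_string(sample)
print(result.tolist())
```

With Dask, a function like this could presumably be applied per partition, for example df["time:timestamp"].map_partitions(to_seconds_string, meta=("time:timestamp", "object")); that call is not exercised in the sketch above.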