Converting a timestamp into the proper format with Dask in Python

Time:07-08

The following pandas code converts the timestamp column of a DataFrame, whatever its input format, into a given format:

pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')

How can I do this with Dask? I tried the code below, but it did not work.

(df is a Dask DataFrame.)

a=dd.to_datetime(df["time:timestamp"],format='%Y-%m-%d %X')
a.compute()

Error: ValueError: unconverted data remains: .304000 00:00

This is what a timestamp looks like: "2016-01-01 09:51:15.304000 00:00". Expected output: "2016-01-01 09:51:15"
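The error is raised by the parser, not by Dask itself (dd.to_datetime calls pd.to_datetime on each partition): the format '%Y-%m-%d %X' describes only the first 19 characters, so ".304000 00:00" is left over. Python's standard-library datetime.strptime fails in the same way, so the problem can be reproduced without a DataFrame. A minimal sketch with the sample value above, assuming the default C locale, where %X means %H:%M:%S:

```python
from datetime import datetime

raw = "2016-01-01 09:51:15.304000 00:00"

# The question's format only covers "YYYY-MM-DD HH:MM:SS", so parsing the
# full string leaves ".304000 00:00" unconsumed and raises ValueError.
error_message = ""
try:
    datetime.strptime(raw, "%Y-%m-%d %X")
except ValueError as exc:
    error_message = str(exc)

# Keeping only the first 19 characters gives the parser exactly what the
# format describes; strftime then produces the expected output string.
parsed = datetime.strptime(raw[:19], "%Y-%m-%d %X")
formatted = parsed.strftime("%Y-%m-%d %X")

print(error_message)
print(formatted)
```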

I found "Converting a Dask column into new Dask column of type datetime", but the approach there did not work for me.

Can someone tell me how to do this with Dask?

Answer:

You can truncate the timestamp string, dropping the fractional seconds and the offset, before parsing it:

# Solution 1
>>> dd.to_datetime(df['time:timestamp'].str[:19]).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]


# Solution 2
>>> dd.to_datetime(df['time:timestamp'].str.split('.').str[0]).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]


# Solution 3 (@mozway)
>>> dd.to_datetime(df['time:timestamp'].str.replace(r'\..*', '', regex=True)).compute()
0   2016-01-01 09:51:15
dtype: datetime64[ns]

Answer:

As you already have the string in almost the correct format, you could work with the strings directly:

df_pd['timestamp'] = df_pd['timestamp'].str.replace(r'\..*', '', regex=True)
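A quick check of the string-only approach, using the sample value from the question. Nothing is parsed here, so the column stays a string and no format string is involved:

```python
import pandas as pd

df_pd = pd.DataFrame({"timestamp": ["2016-01-01 09:51:15.304000 00:00"]})

# Remove everything from the first "." onward; the result is still a string.
df_pd["timestamp"] = df_pd["timestamp"].str.replace(r"\..*", "", regex=True)

print(df_pd["timestamp"].tolist())
```

Note that the pattern only removes text starting at a ".", so a value without fractional seconds would keep its trailing offset unchanged.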

Alternatively, if you need to use to_datetime:

pd.to_datetime(df_pd["timestamp"]).dt.strftime('%Y-%m-%d %X')

Or:

pd.to_datetime(df_pd["timestamp"],format='%Y-%m-%d %H:%M:%S.%f%z').dt.strftime('%Y-%m-%d %X')
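The explicit format only works if the offset carries a sign, because %z expects something like "+00:00". The sketch below assumes the real data looks like that (the question's sample shows " 00:00", which may have lost its "+" somewhere along the way; that is a guess, not something the question confirms), and it also shows that the unsigned form is rejected:

```python
import pandas as pd

FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"

# Assumption: the offset is really "+00:00"; %z needs a leading sign.
signed = pd.Series(["2016-01-01 09:51:15.304000+00:00"])
parsed = pd.to_datetime(signed, format=FORMAT)        # timezone-aware result
formatted = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")   # "%X" in the C locale
print(formatted.tolist())

# The unsigned form from the question does not match %z and raises ValueError.
unsigned = pd.Series(["2016-01-01 09:51:15.304000 00:00"])
try:
    pd.to_datetime(unsigned, format=FORMAT)
    unsigned_rejected = False
except ValueError:
    unsigned_rejected = True
print(unsigned_rejected)
```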