I have a dataframe in vaex that I'm having trouble with the timestamp format.
I can't seem to correct format the timestamp column. After researching the problem, I have come to the conclusion that I need to remove the colon in the UTC offset (00:00). But I don't know how to do this.
- note: I didn't know how to make a sample vaex dataframe, so I made a pandas one and converted it
from datetime import datetime
# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
'timestamp' : ['2018-05-01 03:05:00 00:00', '2018-05-01 04:15:00 00:00',
'2018-06-02 03:15:00 00:00', '2018-06-02 04:25:00 00:00',
'2018-07-03 03:25:00 00:00', '2018-07-03 04:35:00 00:00',
'2018-08-04 03:35:00 00:00', '2018-08-04 04:45:00 00:00'],
'id' : [1,2,3,4,5,6,7,8]
})
# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)
# converting string timestamp to datetime
date_format = "%Y-%m-%d %H:%M:%S %z"
def column_to_datetime(datetime_str):
return np.datetime64(datetime.strptime(datetime_str, date_format))
df['timestamp_parsed'] = df['timestamp'].apply(column_to_datetime)
but later on, I get the error ValueError: time data '2018-05-01 03:05:00 00:00' does not match format '%Y-%m-%d %H:%M:%S %z'
CodePudding user response:
you can use:
from datetime import datetime
# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
'timestamp' : ['2018-05-01 03:05:00 00:00', '2018-05-01 04:15:00 00:00',
'2018-06-02 03:15:00 00:00', '2018-06-02 04:25:00 00:00',
'2018-07-03 03:25:00 00:00', '2018-07-03 04:35:00 00:00',
'2018-08-04 03:35:00 00:00', '2018-08-04 04:45:00 00:00'],
'id' : [1,2,3,4,5,6,7,8]
})
df['timestamp']=pd.to_datetime(df['timestamp']).dt.tz_localize(None)
# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)
Output:
timestamp
0 2018-05-01 03:05:00
1 2018-05-01 04:15:00
2 2018-06-02 03:15:00
3 2018-06-02 04:25:00
4 2018-07-03 03:25:00
5 2018-07-03 04:35:00
6 2018-08-04 03:35:00
7 2018-08-04 04:45:00
CodePudding user response:
Using your code, try this before converting the string to datetime:
# Remove the colon from the UTC offset
df['timestamp'] = df['timestamp'].str.replace(':', '')