Home > Back-end >  Correctly format timestamp in Vaex (Remove colon from UTC offset)
Correctly format timestamp in Vaex (Remove colon from UTC offset)

Time:12-09

I have a dataframe in vaex that I'm having trouble with the timestamp format.

I can't seem to correct format the timestamp column. After researching the problem, I have come to the conclusion that I need to remove the colon in the UTC offset (00:00). But I don't know how to do this.

  • note: I didn't know how to make a sample vaex dataframe, so I made a pandas one and converted it

from datetime import datetime

# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
                   'timestamp' : ['2018-05-01 03:05:00 00:00', '2018-05-01 04:15:00 00:00',
                                '2018-06-02 03:15:00 00:00', '2018-06-02 04:25:00 00:00',
                                '2018-07-03 03:25:00 00:00', '2018-07-03 04:35:00 00:00',
                                '2018-08-04 03:35:00 00:00', '2018-08-04 04:45:00 00:00'],
                   'id' : [1,2,3,4,5,6,7,8] 
                   })
# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)

# converting string timestamp to datetime
date_format = "%Y-%m-%d %H:%M:%S %z" 

def column_to_datetime(datetime_str):
    return np.datetime64(datetime.strptime(datetime_str, date_format))

df['timestamp_parsed'] = df['timestamp'].apply(column_to_datetime)

but later on, I get the error ValueError: time data '2018-05-01 03:05:00 00:00' does not match format '%Y-%m-%d %H:%M:%S %z'

CodePudding user response:

you can use:

from datetime import datetime

# Sample pandas dataframe
df = pd.DataFrame({'quantity' : [0,6,4,3,7,8,3,2],
                   'timestamp' : ['2018-05-01 03:05:00 00:00', '2018-05-01 04:15:00 00:00',
                                '2018-06-02 03:15:00 00:00', '2018-06-02 04:25:00 00:00',
                                '2018-07-03 03:25:00 00:00', '2018-07-03 04:35:00 00:00',
                                '2018-08-04 03:35:00 00:00', '2018-08-04 04:45:00 00:00'],
                   'id' : [1,2,3,4,5,6,7,8] 
                   })
df['timestamp']=pd.to_datetime(df['timestamp']).dt.tz_localize(None)

# Pandas to vaex
df = vx.from_pandas(df=df, copy_index=True)

Output:

    timestamp
0   2018-05-01 03:05:00
1   2018-05-01 04:15:00
2   2018-06-02 03:15:00
3   2018-06-02 04:25:00
4   2018-07-03 03:25:00
5   2018-07-03 04:35:00
6   2018-08-04 03:35:00
7   2018-08-04 04:45:00


CodePudding user response:

Using your code, try this before converting the string to datetime:

# Remove the colon from the UTC offset
df['timestamp'] = df['timestamp'].str.replace(':', '')
  • Related