Home > Blockchain >  Preserve one column in timestamp format when write pd.to_csv
Preserve one column in timestamp format when write pd.to_csv

Time:11-12

I need to write the following dataframe to csv:

prodid    value           timestamp  
322         4           2021-11-03 17:32:12.252   
123        34           2021-11-03 18:42:12.602
...

or

import datetime

df = pd.DataFrame([(0, 4, datetime.datetime.strptime("2021-11-03 14:32:13.4)02", "%Y-%m-%d %H:%M:%S.%f"), 
                   (1, 34, datetime.datetime.strptime("2021-11-03 18:42:12.602", "%Y-%m-%d %H:%M:%S.%f"))],
                   columns=['prodid', 'value', 'timestamp'])

When I write this to csv in a StringIO buffer:

import io

io_buff = io.StringIO()
df.to_csv(io_buff, sep='\t', header=False, index=False)

column timestamp is no longer preserved as datetime. When I further process this io_buff, such as writing it to a Postgres table, column timestamp is written as numeric.

How do I preserve the timestamp format when writing it with pd.to_csv()?

CodePudding user response:

The csv file format can't preserve any types, as it is just a bunch of comma separated strings. It's up to the consumer of the csv file to parse it to desired types. Avoid it.

If you just want to store data with pandas and read with pandas again, use parquet or feather, this way you will keep your types (and save 90% of disk space, and get 10x faster write / read).

If you want to write data to postgres use df.to_sql(..., method='multi'). If that's not fast enough and you are doing some crazy COPY magic, then follow this answer https://stackoverflow.com/a/55495065/6110160 to the letter and add more details to your question if it still fails.

  • Related