I need to write the following dataframe to csv:
prodid value timestamp
322 4 2021-11-03 17:32:12.252
123 34 2021-11-03 18:42:12.602
...
or
import datetime
df = pd.DataFrame([(0, 4, datetime.datetime.strptime("2021-11-03 14:32:13.4)02", "%Y-%m-%d %H:%M:%S.%f"),
(1, 34, datetime.datetime.strptime("2021-11-03 18:42:12.602", "%Y-%m-%d %H:%M:%S.%f"))],
columns=['prodid', 'value', 'timestamp'])
When I write this to csv in a StringIO
buffer:
import io
io_buff = io.StringIO()
df.to_csv(io_buff, sep='\t', header=False, index=False)
column timestamp
is no longer preserved as datetime
. When I further process this io_buff
, such as writing it to a Postgres table, column timestamp
is written as numeric
.
How do I preserve the timestamp format when writing it with pd.to_csv()
?
CodePudding user response:
The csv file format can't preserve any types, as it is just a bunch of comma separated strings. It's up to the consumer of the csv file to parse it to desired types. Avoid it.
If you just want to store data with pandas and read with pandas again, use parquet or feather, this way you will keep your types (and save 90% of disk space, and get 10x faster write / read).
If you want to write data to postgres use df.to_sql(..., method='multi')
. If that's not fast enough and you are doing some crazy COPY
magic, then follow this answer https://stackoverflow.com/a/55495065/6110160 to the letter and add more details to your question if it still fails.