I'm a beginner with pyarrow and I'm trying to cast timestamps that have an AM/PM suffix.
I have a column 'Datetime' with values like these:
"2021/07/25 12:00:00 AM",
"2022/06/28 11:58:00 PM",
"2022/03/11 10:30:00 AM",
and I'm trying to get these:
2021-07-25 00:00:00,
2022-06-28 23:58:00,
2022-03-11 10:30:00,
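Ideally, I want to do this conversion inside pyarrow.csv.read_csv, something like this:
import pyarrow as pa
import pyarrow.csv as csv

table = csv.read_csv('my_data.csv',
    convert_options=csv.ConvertOptions(
        column_types={
            'Datetime': pa.timestamp('s'),  # pa.timestamp is a function, so the unit goes in parentheses
        }
    )
)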
and then write the table to Parquet.
I do know how to convert the column separately from the table:
import pyarrow.compute as pc
pc.strptime(table.column("Incident Datetime"), format='%Y/%m/%d %I:%M:%S %p', unit='s')  # %I, not %H, so that %p is applied
but I don't understand how to put the converted column back into my table.
Answer:
Right now you can do that with the Table.set_column method. Tables are immutable, so set_column returns a new table that you need to assign. See the cookbook recipe: https://arrow.apache.org/cookbook/py/data.html#replacing-a-column-in-an-existing-table
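import pyarrow.compute as pc

# Parse the 12-hour strings; %I (not %H) is needed for %p to take effect
new_incident_datetime = pc.strptime(table.column("Incident Datetime"), format='%Y/%m/%d %I:%M:%S %p', unit='s')
# Look up the column's position instead of hard-coding it
column_idx = table.schema.get_field_index("Incident Datetime")
# set_column returns a new table, so reassign it
table = table.set_column(
    column_idx,
    "Incident Datetime",
    new_incident_datetime
)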