I have a JSON file with this field:
"BirthDate":"2022-09-05T08:08:46.000 00:00"
I want to create a Parquet file from that file. I prepared a fixed schema for pyarrow in which BirthDate is a pa.timestamp('s'). When I try to convert the file, I get this error:
ERROR:root:Failed of conversion of JSON to timestamp[s], couldn't parse:2022-09-05T08:08:46.000 00:00
My pyarrow code:
parquet_file = pyarrow_json.read_json(
    json_file,
    parse_options=pyarrow_json.ParseOptions(
        explicit_schema=prepared_schema,
        unexpected_field_behavior='ignore'))
I also have some files with other timestamp formats (for example, without that " "), and those convert fine.
How can I convert this file, and what is the problem with this specific format?
CodePudding user response:
It works for me using pa.field("BirthDate", pa.timestamp('ms')).
I think it's because your timestamps have millisecond precision (even though the milliseconds are set to zero), so they can't be parsed as timestamp[s]:
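import pyarrow as pa
import pyarrow.json as pyarrow_json

# Declare BirthDate with millisecond precision so the ".000" part can be parsed.
prepared_schema = pa.schema([pa.field("BirthDate", pa.timestamp('ms'))])

# json_file is the path to the input JSON file, as in the question.
parquet_file = pyarrow_json.read_json(
    json_file,
    parse_options=pyarrow_json.ParseOptions(
        explicit_schema=prepared_schema,
        unexpected_field_behavior='ignore')
)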