Why can't I parse a timestamp in pyarrow?

Time:10-21

I have a JSON file with this field:

"BirthDate":"2022-09-05T08:08:46.000 00:00"

I want to create a Parquet file from that JSON file. I prepared a fixed schema for pyarrow in which BirthDate is a pa.timestamp('s'). When I try to convert the file, I get this error:

ERROR:root:Failed of conversion of JSON to timestamp[s], couldn't parse:2022-09-05T08:08:46.000 00:00

My pyarrow code:

parquet_file = pyarrow_json.read_json(
    json_file,
    parse_options=pyarrow_json.ParseOptions(
        explicit_schema=prepared_schema,
        unexpected_field_behavior='ignore'))

I also have some files with other timestamp formats (for example, without that " "), and those work fine.

How can I convert this file, and what is the problem with this specific format?

CodePudding user response:

It works for me using pa.field("BirthDate", pa.timestamp('ms')).

I think it's because your timestamps have millisecond precision (even though the milliseconds are all zero), so they cannot be parsed into a type with only second resolution.


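import pyarrow as pa
import pyarrow.json as pyarrow_json

# Use millisecond resolution so values ending in ".000" can be parsed.
prepared_schema = pa.schema([pa.field("BirthDate", pa.timestamp('ms'))])

# json_file is the path to the JSON file from the question.
# read_json returns a pyarrow.Table.
parquet_file = pyarrow_json.read_json(
    json_file,
    parse_options=pyarrow_json.ParseOptions(
        explicit_schema=prepared_schema,
        unexpected_field_behavior='ignore',
    ),
)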
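
The precision mismatch suspected above can be illustrated without pyarrow. The sketch below is only an analogy using the standard library's datetime.strptime, not pyarrow's actual parser: a format with whole-second resolution rejects a value that carries a fractional part, even when that fraction is zero, while a format that allows fractions accepts it. The sample string is just the date-time part of the value from the question (the trailing " 00:00" is left out), and the helper name parses_with is hypothetical.

```python
from datetime import datetime

# Date-time part of the value from the question (trailing " 00:00" left out).
value = "2022-09-05T08:08:46.000"


def parses_with(fmt: str, text: str) -> bool:
    """Return True if `text` matches the strptime format `fmt` exactly."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# Whole-second format: the trailing ".000" is left unconverted, so parsing fails.
seconds_ok = parses_with("%Y-%m-%dT%H:%M:%S", value)

# Format with a fractional-second field: the same string parses.
fraction_ok = parses_with("%Y-%m-%dT%H:%M:%S.%f", value)

print(seconds_ok, fraction_ok)
```

If the same kind of check applies inside pyarrow, that would explain why pa.timestamp('ms') works where pa.timestamp('s') does not.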