How to append data to a Parquet file when saving a dataframe from Polars

Time:12-27

I have a Polars df and I want to save it to a Parquet file, then append the next df to the same file, and the next one after that. The call df.write_parquet("path.parquet") only overwrites the file each time. How can I append in Polars?

CodePudding user response:

Polars does not support appending to Parquet files, and neither do most other tools; see, for example, this SO post.

Your best bet would be to convert the dataframe to an Arrow table using .to_arrow(), and then use pyarrow.dataset.write_dataset. In particular, see the documentation of the existing_data_behavior parameter. Still, that requires organizing your data in partitions, which effectively means you have a separate Parquet file per partition, all stored in the same directory. So each df you have becomes its own Parquet file, and you abstract away from that when reading. Polars does not support writing partitions as far as I'm aware. There is support for reading them though; see the source argument of pl.read_parquet.
