With Python, is there a way to write a polars DataFrame directly to an S3 bucket as Parquet?

looking for something like the approach in this question:

Save Dataframe to csv directly to s3 Python

the api shows these arguments: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_parquet.html

but i'm not sure how to convert the df into a stream...

Answer:

Untested, since I don't have an AWS account

You could use s3fs.S3File like this:

import polars as pl
import s3fs

fs = s3fs.S3FileSystem(anon=False)  # anon=False picks up default credentials
df = pl.DataFrame(
    {
        "foo": [1, 2, 3, 4, 5],
        "bar": [6, 7, 8, 9, 10],
        "ham": ["a", "b", "c", "d", "e"],
    }
)
with fs.open('my-bucket/dataframe-dump.parquet', mode='wb') as f:  # open for binary writing
    df.write_parquet(f)

Basically, s3fs gives you an fsspec-compliant file object, which polars can write to because write_parquet accepts either a path or any writable binary file-like object.

If you want to manage your S3 connection more granularly, you can construct an S3File object from the botocore connection (see the docs linked above).
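
A different route from the S3File construction, in case you would rather hold the client yourself: serialize to an in-memory buffer and hand it to a boto3 S3 client's upload_fileobj. This is a sketch, also untested against real S3. upload_parquet is a hypothetical helper name, and the bucket and key are placeholders.

```python
import io


def upload_parquet(df, client, bucket, key):
    """Serialize df to Parquet in memory and upload it through client.

    df is expected to be a polars DataFrame; client is expected to be a
    boto3 S3 client (anything with an upload_fileobj method works).
    Returns the number of bytes written to the buffer.
    """
    buffer = io.BytesIO()
    df.write_parquet(buffer)
    size = buffer.tell()  # position after writing equals the payload size
    buffer.seek(0)  # upload_fileobj reads from the current position
    client.upload_fileobj(buffer, bucket, key)
    return size
```

You would call it as upload_parquet(df, boto3.client("s3"), "my-bucket", "dataframe-dump.parquet"). The whole file sits in memory before upload, so this fits small to medium frames better than very large ones.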