I'm looking for something like this question:
Save Dataframe to csv directly to s3 Python
The API docs show these arguments: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_parquet.html
but I'm not sure how to convert the df into a stream...
CodePudding user response:
Untested, since I don't have an AWS account.
You could use s3fs.S3File like this:
import polars as pl
import s3fs

# anon=False picks up default credentials; anon=True would connect without any
fs = s3fs.S3FileSystem(anon=False)

df = pl.DataFrame(
    {
        "foo": [1, 2, 3, 4, 5],
        "bar": [6, 7, 8, 9, 10],
        "ham": ["a", "b", "c", "d", "e"],
    }
)

# open in binary write mode; the default mode is read-only
with fs.open('my-bucket/dataframe-dump.parquet', mode='wb') as f:
    df.write_parquet(f)
Basically, s3fs gives you an fsspec-conformant file object, which polars knows how to use because write_parquet accepts any regular file or stream.
If you want to manage your S3 connection more granularly, you can construct an S3File object from the botocore connection (see the docs linked above).