Rust Polars from AWS S3?

Time:10-17

The Polars guide shows an example of loading a file from S3. Unfortunately, it uses the Python library pyarrow and a function, from_arrow, which also seems to be Python-specific.

I wonder if it would be possible to do the same in pure Rust? Or is my best shot to use Python FFI?

Update: It seems this is not possible at the moment, but it is a work in progress.

CodePudding user response:

I haven't had time to look into it, but would this help? Both DataFusion and Polars are built on Arrow.

https://crates.io/crates/datafusion-objectstore-s3

CodePudding user response:

While not directly supported, it is achievable by using the AWS SDK to read the S3 object into memory and then passing that buffer to the file reader.

use std::borrow::Cow;

use aws_config::meta::region::RegionProviderChain;
use aws_sdk_s3::{Client, Region};
use polars::prelude::*;

#[tokio::main]
async fn main() {
    // Placeholders: substitute your own region, bucket, and object key.
    let region = "region";
    let bucket = "bucket";
    let key = "key";
    let region = Region::new(Cow::Borrowed(region));

    // Prefer the region from the default provider chain; fall back to the one above.
    let region_provider = RegionProviderChain::default_provider().or_else(region);
    let config = aws_config::from_env().region(region_provider).load().await;
    let client = Client::new(&config);

    // Download the whole object into memory.
    let res = client.get_object().bucket(bucket).key(key).send().await.unwrap();
    let bytes = res.body.collect().await.unwrap().into_bytes();

    // A Cursor makes the in-memory bytes readable and seekable like a file.
    let cursor = std::io::Cursor::new(bytes);

    // Parse the buffer as CSV into a DataFrame.
    let df = CsvReader::new(cursor).finish().unwrap();

    println!("{:?}", df);
}

Cargo.toml

[dependencies]
aws-config = "0.49.0"
aws-sdk-s3 = "0.19.0"
polars = { version = "0.24.3", features = ["lazy", "parquet"] }
tokio = { version = "1.21.2", features = ["full"] }
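To see why the in-memory buffer can stand in for a file, here is a minimal, std-only sketch of the same idea. It is not the Polars or AWS API: `count_data_rows` and `rows_in_buffer` are hypothetical helpers, and the byte vector is assumed to be a stand-in for the bytes collected from the S3 response body. The point is only that `std::io::Cursor` over a byte buffer implements both `Read` and `Seek`, which is what a file-oriented reader typically requires.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical stand-in for a file-oriented reader: counts the data rows
/// (every non-empty line after the header) in CSV text from a seekable source.
fn count_data_rows<R: Read + Seek>(mut reader: R) -> io::Result<usize> {
    // Rewind to the start, as a reader that expects a file might do.
    reader.seek(SeekFrom::Start(0))?;
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text
        .lines()
        .skip(1) // skip the header line
        .filter(|line| !line.trim().is_empty())
        .count())
}

/// Wraps an owned byte buffer (assumed to come from the downloaded object)
/// in a Cursor so it can be handed to a reader that needs Read + Seek.
fn rows_in_buffer(bytes: Vec<u8>) -> io::Result<usize> {
    count_data_rows(Cursor::new(bytes))
}
```

Calling `rows_in_buffer(b"id,name\n1,a\n2,b\n".to_vec())` returns `Ok(2)`. The trade-off of this approach is that the whole object has to fit in memory before parsing starts.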