Rust Polars: Is it possible to explode a list column into multiple columns?

Time:06-04

I have a function which returns a list type column. Hence, one of my columns is a list. I'd like to turn this list column into multiple columns. For example:

use polars::prelude::*;
use polars::df;

fn main() {
    let s0 = Series::new("a", &[1i64, 2, 3]);
    let s1 = Series::new("b", &[1i64, 1, 1]);
    let s2 = Series::new("c", &[Some(2i64), None, None]);
    // construct a new ListChunked for a slice of Series.
    let list = Series::new("foo", &[s0, s1, s2]);

    // construct a few more Series.
    let s0 = Series::new("Group", ["A", "B", "A"]);
    let s1 = Series::new("Cost", [1, 1, 1]);
    let df = DataFrame::new(vec![s0, s1, list]).unwrap();

    dbg!(df);
}

At this stage, df looks like this:

┌───────┬──────┬─────────────────┐
│ Group ┆ Cost ┆ foo             │
│ ---   ┆ ---  ┆ ---             │
│ str   ┆ i32  ┆ list [i64]      │
╞═══════╪══════╪═════════════════╡
│ A     ┆ 1    ┆ [1, 2, 3]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ [1, 1, 1]       │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ [2, null, null] │
└───────┴──────┴─────────────────┘

From here, I'd like to get:

┌───────┬──────┬─────┬──────┬──────┐
│ Group ┆ Cost ┆ a   ┆ b    ┆ c    │
│ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  │
│ str   ┆ i32  ┆ i64 ┆ i64  ┆ i64  │
╞═══════╪══════╪═════╪══════╪══════╡
│ A     ┆ 1    ┆ 1   ┆ 2    ┆ 3    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ 1   ┆ 1    ┆ 1    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ 2   ┆ null ┆ null │
└───────┴──────┴─────┴──────┴──────┘

So I need something like .explode(), but column-wise. Is there an existing function for this, or potentially a workaround?

Many thanks

CodePudding user response:

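Yes, you can. Via the Polars lazy API we get access to the expression API, and we can use the arr() namespace to get list elements by index. (In more recent Polars releases the list namespace is reached through list() instead of arr().)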

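// Continues from the `df` built in the question; the trailing `?` needs an
// enclosing function that returns a Result.
let out = df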
    .lazy()
    .select([
        all().exclude(["foo"]),
        col("foo").arr().get(0).alias("a"),
        col("foo").arr().get(1).alias("b"),
        col("foo").arr().get(2).alias("c"),
    ])
    .collect()?;
dbg!(out);

┌───────┬──────┬─────┬──────┬──────┐
│ Group ┆ Cost ┆ a   ┆ b    ┆ c    │
│ ---   ┆ ---  ┆ --- ┆ ---  ┆ ---  │
│ str   ┆ i32  ┆ i64 ┆ i64  ┆ i64  │
╞═══════╪══════╪═════╪══════╪══════╡
│ A     ┆ 1    ┆ 1   ┆ 2    ┆ 3    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ B     ┆ 1    ┆ 1   ┆ 1    ┆ 1    │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ A     ┆ 1    ┆ 2   ┆ null ┆ null │
└───────┴──────┴─────┴──────┴──────┘
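To make clear what the index-wise get does to each row, here is a minimal plain-Rust sketch with no Polars dependency. The names get_by_index and explode_columnwise are made up for this illustration and are not part of the Polars API. It assumes that an index past the end of a row's list should become a null (None here), which matches the output shown above.

```rust
/// Take the element at `index` from every row of a list column.
/// Rows that are too short yield None (assumed null-on-out-of-bounds behaviour).
fn get_by_index(rows: &[Vec<Option<i64>>], index: usize) -> Vec<Option<i64>> {
    rows.iter()
        // `get` gives Option<&Option<i64>>; copy it and merge the two Option layers.
        .map(|row| row.get(index).copied().flatten())
        .collect()
}

/// Split a list column into `width` flat columns, one per list position.
/// This is the same idea as repeating `get(0)`, `get(1)`, ... in the select.
fn explode_columnwise(rows: &[Vec<Option<i64>>], width: usize) -> Vec<Vec<Option<i64>>> {
    (0..width).map(|i| get_by_index(rows, i)).collect()
}

fn main() {
    // The "foo" column from the question, one inner Vec per row.
    let foo = vec![
        vec![Some(1), Some(2), Some(3)],
        vec![Some(1), Some(1), Some(1)],
        vec![Some(2), None, None],
    ];

    let cols = explode_columnwise(&foo, 3);
    for (name, col) in ["a", "b", "c"].iter().zip(&cols) {
        println!("{name}: {col:?}");
    }
}
```

With Polars itself, the same pattern means generating one get(i).alias(name) expression per list position. The number of positions has to be known up front, because the schema of the result must be fixed before the query runs.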
