(Polars) How to get element from a column with list by index specified in another column

Time:10-26

I have a dataframe with two columns: the first contains lists and the second contains integer indexes. How can I get the element of each list at the index given in the second column? Or, even better, put that element in a third column. For example, is there a way so that from this

a = pl.DataFrame([{'lst': [1, 2, 3], 'ind': 1}, {'lst': [4, 5, 6], 'ind': 2}])
┌───────────┬─────┐
│ lst       ┆ ind │
│ ---       ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 1   │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   │
└───────────┴─────┘

you can get this

b = pl.DataFrame([{'lst': [1, 2, 3], 'ind': 1, 'list[ind]': 2}, {'lst': [4, 5, 6], 'ind': 2, 'list[ind]': 6}])
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘

Thanks.

CodePudding user response:

You can use with_row_count() to add a row-number column for grouping, then explode() the list so that each list element gets its own row. Then call take() with the index column, windowed with over() on the row-number column, so that the element is selected within each row's own group.

import polars as pl

df = pl.DataFrame({"lst": [[1, 2, 3], [4, 5, 6]], "ind": [1, 2]})

df = (
    df.with_row_count()
    .with_columns(
        pl.col("lst").explode().take(pl.col("ind")).over(pl.col("row_nr")).alias("list[ind]")
    )
    .drop("row_nr")
)
shape: (2, 3)
┌───────────┬─────┬───────────┐
│ lst       ┆ ind ┆ list[ind] │
│ ---       ┆ --- ┆ ---       │
│ list[i64] ┆ i64 ┆ i64       │
╞═══════════╪═════╪═══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2         │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   ┆ 6         │
└───────────┴─────┴───────────┘
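
To make the windowed expression above easier to follow, here is a rough plain-Python model of what it computes. It does not use Polars at all; the function name take_per_row is hypothetical and only serves as a sketch of the idea that every row number forms its own group, the group's list is exploded, and the row's index picks one element.

```python
lst = [[1, 2, 3], [4, 5, 6]]
ind = [1, 2]


def take_per_row(lists, indexes):
    """Plain-Python model of explode + take evaluated per row group."""
    result = []
    for row_nr, (values, index) in enumerate(zip(lists, indexes)):
        # Each row number forms its own window group holding a single list;
        # exploding that group yields just this row's elements.
        exploded = [elem for group_list in [values] for elem in group_list]
        # Taking by the row's own index selects one element from the group.
        result.append(exploded[index])
    return result


print(take_per_row(lst, ind))  # [2, 6]
```

Because each group contains exactly one row, the take step always sees exactly one index and one exploded list, which is why a single value comes back per row.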

CodePudding user response:

Here is my approach:

Create a custom function that returns the list element at the required index.

def get_elem(d):
    # d is [idx, lista]: the index comes first, the list second
    sel_idx = d[0]
    return d[1][sel_idx]

Here is some test data:

df = pl.DataFrame({'lista':[[1,2,3],[4,5,6]],'idx':[1,2]})

Now let's create a struct from these two columns (each row is passed to the function as a dict) and apply the function above:

df.with_columns([
    pl.struct(['idx','lista']).apply(lambda x: get_elem(list(x.values()))).alias('req_elem')])
shape: (2, 3)
┌───────────┬─────┬──────────┐
│ lista     ┆ idx ┆ req_elem │
│ ---       ┆ --- ┆ ---      │
│ list[i64] ┆ i64 ┆ i64      │
╞═══════════╪═════╪══════════╡
│ [1, 2, 3] ┆ 1   ┆ 2        │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ [4, 5, 6] ┆ 2   ┆ 6        │
└───────────┴─────┴──────────┘
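
Since the struct arrives in the function as a dict per row, the lookup can also be written against the field names instead of relying on the order of the dict values. The following is a minimal plain-Python sketch of that row-wise logic, without Polars; the helper name get_elem_by_name and the choice to return None for an out-of-range index are assumptions made for illustration, not part of the answer above.

```python
def get_elem_by_name(row, list_key="lista", idx_key="idx"):
    """Return row[list_key][row[idx_key]], or None if the index is out of range."""
    values = row[list_key]
    index = row[idx_key]
    # Assumption for this sketch: an out-of-range index yields None
    # instead of raising IndexError.
    if -len(values) <= index < len(values):
        return values[index]
    return None


# Each dict stands in for one struct row built from the 'idx' and 'lista' columns.
rows = [{"idx": 1, "lista": [1, 2, 3]}, {"idx": 2, "lista": [4, 5, 6]}]
req_elem = [get_elem_by_name(row) for row in rows]
print(req_elem)  # [2, 6]
```

Keep in mind that a row-wise Python function is called once per row, so it is usually slower than a native expression on large frames. Newer Polars releases also appear to accept an expression as the index, as in pl.col("lista").list.get(pl.col("idx")); check that against the documentation for your installed version.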