I have a DataFrame with a column b whose values are lists. I need to create a column c that counts the number of elements in the list for every row. Here is a toy example in Pandas:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3], 'b':[[1,2,3], [2], [5,0]]})
   a          b
0  1  [1, 2, 3]
1  2        [2]
2  3     [5, 0]
df.assign(c=df['b'].str.len())
   a          b  c
0  1  [1, 2, 3]  3
1  2        [2]  1
2  3     [5, 0]  2
Here is my equivalent in Polars:
import polars as pl
dfp = pl.DataFrame({'a': [1,2,3], 'b':[[1,2,3], [2], [5,0]]})
dfp.with_columns(pl.col('b').apply(lambda x: len(x)).alias('c'))
I have a feeling that .apply(lambda x: len(x)) is not optimal, since it calls a Python function for every row.
Is there a better way to do it in Polars?
CodePudding user response:
You can use the .arr namespace to access the list functions, in this case .lengths():
>>> dfp.with_columns(pl.col("b").arr.lengths().alias("c"))
shape: (3, 3)
┌─────┬───────────┬─────┐
│ a   ┆ b         ┆ c   │
│ --- ┆ ---       ┆ --- │
│ i64 ┆ list[i64] ┆ u32 │
╞═════╪═══════════╪═════╡
│ 1   ┆ [1, 2, 3] ┆ 3   │
├─────┼───────────┼─────┤
│ 2   ┆ [2]       ┆ 1   │
├─────┼───────────┼─────┤
│ 3   ┆ [5, 0]    ┆ 2   │
└─────┴───────────┴─────┘