How to transform a series of a Polars dataframe?

Time:07-16

I am dealing with a large dataframe (198,619 rows x 19,110 columns), so I am using the Polars package to read in the TSV file. Pandas just takes too long.

However, I now face an issue: I want to transform each cell's value x by raising 2 to that power, i.e. 2^x.

I run the following line as an example:

df_copy = df
df_copy[:,1] = 2**df[:,1]

But I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/tmp/pbs.98503.hn-10-03/ipykernel_196334/3484346087.py in <module>
      1 df_copy = df
----> 2 df_copy[:,1] = 2**df[:,1]

~/.local/lib/python3.9/site-packages/polars/internals/frame.py in __setitem__(self, key, value)
   1845 
   1846             # dispatch to __setitem__ of Series to do modification
-> 1847             s[row_selection] = value
   1848 
   1849             # now find the location to place series

~/.local/lib/python3.9/site-packages/polars/internals/series.py in __setitem__(self, key, value)
    512             self.__setitem__([key], value)
    513         else:
--> 514             raise ValueError(f'cannot use "{key}" for indexing')
    515 
    516     def estimated_size(self) -> int:

ValueError: cannot use "slice(None, None, None)" for indexing

This should be simple but I can't figure it out as I'm new to Polars.

CodePudding user response:

The secret to harnessing the speed and flexibility of Polars is to learn to use Expressions. As such, you'll want to avoid Pandas-style indexing methods.

Let's start with this data:

import polars as pl

nbr_rows = 4
nbr_cols = 5
df = pl.DataFrame({
    "col_" + str(col_nbr): pl.arange(col_nbr, nbr_rows + col_nbr, eager=True)
    for col_nbr in range(0, nbr_cols)
})
df
shape: (4, 5)
┌───────┬───────┬───────┬───────┬───────┐
│ col_0 ┆ col_1 ┆ col_2 ┆ col_3 ┆ col_4 │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ i64   ┆ i64   ┆ i64   ┆ i64   ┆ i64   │
╞═══════╪═══════╪═══════╪═══════╪═══════╡
│ 0     ┆ 1     ┆ 2     ┆ 3     ┆ 4     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1     ┆ 2     ┆ 3     ┆ 4     ┆ 5     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2     ┆ 3     ┆ 4     ┆ 5     ┆ 6     │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3     ┆ 4     ┆ 5     ┆ 6     ┆ 7     │
└───────┴───────┴───────┴───────┴───────┘

In Polars we would express your calculations as:

df_copy = df.select(pl.lit(2).pow(pl.all()).keep_name())
print(df_copy)
shape: (4, 5)
┌───────┬───────┬───────┬───────┬───────┐
│ col_0 ┆ col_1 ┆ col_2 ┆ col_3 ┆ col_4 │
│ ---   ┆ ---   ┆ ---   ┆ ---   ┆ ---   │
│ f64   ┆ f64   ┆ f64   ┆ f64   ┆ f64   │
╞═══════╪═══════╪═══════╪═══════╪═══════╡
│ 1.0   ┆ 2.0   ┆ 4.0   ┆ 8.0   ┆ 16.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2.0   ┆ 4.0   ┆ 8.0   ┆ 16.0  ┆ 32.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 4.0   ┆ 8.0   ┆ 16.0  ┆ 32.0  ┆ 64.0  │
├╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 8.0   ┆ 16.0  ┆ 32.0  ┆ 64.0  ┆ 128.0 │
└───────┴───────┴───────┴───────┴───────┘